DevOps | ECM Architect

I love virtual machines and containers because they make it easy to isolate the applications and dependencies I’m using for a particular project. Tools like Docker, Virtualbox, and vagrant are indispensable for most of my projects and I’m still using those, but in this post I’ll describe a product called Antsle which has given me additional flexibility and has freed up some local resources.

My daily developer workstation is a MacBook Pro with 16 GB of RAM and a 500 GB SSD. From a memory and CPU perspective, it can handle running a handful of virtual machines simultaneously without a problem. But disk space is starting to be an issue.

I use vagrant and Ansible to make virtual machine provisioning repeatable–I can delete any VM at any time without remorse because I can always recreate it easily. But I get tired of continually cleaning up machines and pruning back base boxes just to reclaim space.

I decided to do something about it. My options were:

When Apple releases a MacBook Pro that can take 32 GB of RAM, buy that with at least 1 TB SSD, then continue with my current toolset.
Buy a Mac Pro or some other desktop to use exclusively for virtual machines.
Buy or build an actual server and set it up with virtualization. Something like this, for example.
Use AWS for my development virtual machines.

Then I came across a little company based out of San Diego called Antsle. Antsle builds virtualization appliances. What makes their product attractive to me versus buying a workstation or server or building my own is that:

The machines have no fan or other moving parts–they are completely silent. The case acts as a heat sink.
The machines are energy-efficient. The docs say mine will run at 45 watts.
They are built on Linux with standard virtualization technology (LXC and KVM) plus some additional optimizations from Antsle.
They are ready-to-go out-of-the-box, saving me the time and effort of building my own solution.

I really like using AWS, and I think for production workloads, no one, not even your own internal IT data center, can do it cheaper or more securely. Plus the breadth of their service offering is nuts. But for my modest developer needs, I’m pretty sure I’ll break even within a year, and that’s not counting the productivity gain of not having to wait for instances to spin up or having to fool with the complexity of the AWS console.

So, after that analysis, I was ready to buy. The biggest struggle was to decide which model to buy and whether or not to do any upgrades. I went for an Ultra, which has an 8-core 2.4 GHz Intel processor, 32 GB of ECC RAM and two Samsung EVO 850 1 TB SSD drives. The drives are mirrored so that’s 1 TB of space. I could have expanded the RAM to 64 GB and increased the storage up to 16 TB, but it was hard to justify the added expense based on my needs.

My Antsle arrived last week and I’ve been pretty happy with it so far. I’ve got a set of “base” images created so that I can easily instantiate new machines based on typical components and configuration. For example, I have an image for every recent Alfresco release. When I need to work on one for a client project or to help someone in the forums, I can just clone one of my base images and start it up. I can let it run as long as I want without worrying about cost, and then kill it or keep it around as needed.

Here is a summary of my experience, thus far:

No setup necessary. I plugged it in, started it up, and was starting up machines in minutes.
Creating machines from templates, cloning machines, taking snapshots, and startup/shutdown happens very quickly.
Templates and instantiated machines take up less space than I would have thought, which is great. So far, I’m glad I stuck with the base storage option.
I haven’t pegged the CPU yet, but I have seen it spike briefly to as high as 50%, and that was when I was only running a single VM. I continue to see brief spikes here and there, but as I won’t have too many machines under load at any given time so I’m not that worried about it yet.
Documentation seems thorough and helpful. The company has been really responsive and helpful so far as well. They responded to a minor billing issue quickly and resolved it without a fuss.
I noticed when you clone a machine that has a bridged network adapter, the MAC address doesn’t change. You have to drop and re-add the NIC if you want a new MAC address, otherwise DHCP will assign it the same IP address as the original machine. This isn’t a big deal once you know the behavior.
I had to change the vm.max_mem_map setting to make Elasticsearch happy, which is a typical setup task for Elastic. It took me a minute to realize that needs to be done on the Antsle host and applies to all guests–it cannot be done on the individual VM, at least for LXC.
There does not appear to be a way to tag or comment on virtual machines. Additionally, the name you assign to each image is fixed-length and fairly short. So I’m somewhat concerned that, as my library grows, I’ll start to lose track of what’s installed on which machine. AntMan, the management console, seems to be evolving fairly rapidly so maybe this will change in a future release.

I’ve also created a few videos if you want to see it in action.

This video is the unboxing.

This video shows an Ubuntu and a CentOS image being created and then configured for bridged networking.

This video shows how image templates work and gives you a little bit of a feel for the performance using a real-world app (in this case, Alfresco running on CentOS) while other machines are running simultaneously (one Mail/LDAP machine and a four-node Elastic cluster).

Almost all of my client work is remote. For many projects, that means chat is an essential communication tool. When you and your team essentially live in a chat window it’s nice when your tools can participate in the conversation. Luckily, it’s pretty easy to wire this up. Let me show you how I did it for a recent Elasticsearch project.

Openfire: An open source chat server

Today, hosted chat services like Slack and HipChat get all of the attention. The approach I outline in this blog post will work with those tools too, but on this particular project we’re running an open source chat server on-prem called Openfire. Openfire has been around for a long time. I like it because it is open source, easy to install, and will run anywhere you can run Java.

Because it implements an open protocol called XMPP (aka Jabber) there are a variety of chat clients that will work with it. Openfire ships a web-based client called Spark and some of my teammates use that, but most of the time I use Adium on my Mac.

If you need help installing Openfire, take a look at the docs.

Inbound and outbound integration with Elasticsearch

Once your chat solution is working, it’s time to integrate it with Elasticsearch. For my requirements I needed two “directions” for this integration. First, I wanted to be able interrogate one or more of my Elasticsearch clusters from within chat. This “outbound” integration requires a “bot”. There are many open source bots to choose from and examples of bot scripts working with Elasticsearch. I’ll cover both shortly.

The other direction I needed was “inbound”–I wanted my Elasticsearch cluster to be able to tell the chat server when something is wrong with the cluster. This requires something to monitor the health of the cluster (we use Watcher, a paid add-on from Elastic) and a web hook that can use the chat server API to send messages.

Let me cover the outbound implementation–the bot–first. Then I’ll talk about Watcher and the web hook which make up the inbound implementation.

Hubot: An open source chat bot from Github

There are a number of chat bots out there. I went with Hubot from Github. Hubot is based on Node.js. Hubot scripts are written in Coffeescript. However, if you are new to Node or Coffeescript there are plenty of examples out there so don’t let that stop you from using Hubot.

I used this blog post to get Hubot working. However, there were a few gotchas I should point out:

I had to use an old version of node.js (0.10.23). The newer version was having a lot of trouble with one of its dependencies and I got tired of fooling with it.
The blog post lists some Linux dependencies you need to install, but it leaves one out that’s critical: libicu. On Centos this is libicu-devel and on Ubuntu it is libicu-dev.
The blog post specifies some environment variables that need to be set. If you are running Hubot with Openfire, the HUBOT_XMPP_ROOMS variable needs to be set to the fully-qualified conference room name. For example, if the Hubot username is “hubot” running on a host named “grumpy” the variable should be set to hubot@conference.grumpy.
You may have to set HUBOT_XMPP_HOST to the hostname of your Openfire server.

Other than that, you should be able to use that blog post to get Hubot and Openfire working.

Hubot and Elasticsearch

There are Hubot scripts that do all sorts of stuff. One of the fun things about adding a bot to your chatroom is to have it do something silly. Maybe every time someone uses the word “Dude” the bot throws out a quote from the Big Lebowski, for example. So you’ll see lots of stuff like that. But there are also more useful examples out there. Here is the one I started with. The hubot-elasticsearch script knows how to use the Elasticsearch API to spit out information about nodes, indices, allocation, and settings. And it allows you to alias your clusters so you don’t have to constantly tell the bot what your URL endpoints are.

Out-of-the-box, the hubot-elasticsearch project is not compatible with Shield, but it’s a decent start. I made a small tweak to get it to work with Watcher, which I’ll cover next.

Watcher: Monitoring and Alerting for Elasticsearch

This particular client is a paying customer of Elastic, which means they are entitled to paid-only add-ons such as Shield (secures the cluster) and Watcher (for monitoring and alerting).

Watcher is pretty handy and we’re glad to have it, but if you aren’t able to use it for some reason, writing your own tool for running tasks on a schedule isn’t too tough. I wrote something similar using Spring MVC and Quartz, for example. You just need something that will periodically interrogate the cluster and then take some action based on some condition. But if you are an Elastic customer there’s no need to build it. The rest of the post assumes that’s the case.

I’ll let you read the Watcher docs to learn more, but at a high level, a watch consists of a trigger, an input, a condition, and an action. The trigger is the schedule. The input might be an Elasticsearch query or the response from some random HTTP endpoint. The condition looks at the input and then decides whether or not action is needed. The action taken might be to send an email, create some data in Elasticsearch, or invoke a web hook.

For my needs, the web hook action is perfect–if one of my watch conditions is met, like maybe something goes wrong with my cluster and the cluster state goes to red, Watcher will invoke my web hook which will post a message in the chat room. Here’s what the action part of my watch definition looks like:

"actions": {
    "notify_chat": {
        "webhook": {
            "method": "POST",
            "host": "localhost",
            "port": 8008,
            "path": "/chat",
            "headers": {
                "Content-Type": "application/json"
            },
            "body": "cluster_health alert: Someone needs to look at the DEV cluster. It appears to be in a RED state."
        }
    }
}

Watcher can have any number of actions listed for a given watch. In this case, I’m using a single “webhook” action called “notify_chat” that does a POST to a URL running on port 8008. That URL could be anything, and it can include basic authentication.

Web Hook: Spring Boot, Spring MVC, and Smack

I’ve been using Spring Boot lately when I need to knock out a quick RESTful API. In this case, I just needed something to listen to the “/chat” end point. When it is called, the code grabs the message posted to it and uses the Smack API to connect to the chat room and post the message. This webapp is probably less than 10 lines of code and Spring Boot packages it up nicely for me.

If you need help with this part take a look at the Smack API Multi User Chat docs.

Tweaking the bot to allow watch acknowledgement

Watcher can throttle or suppress actions based on a time period (“Don’t tell me about this condition again for 5 minutes,” for example) or explicit acknowledgement. If a watch is triggered that uses explicit acknowledgement, I want to be able to acknowledge that from within chat. You already saw that the Elasticsearch Hubot script can talk to the cluster. It’s pretty easy to tweak the script to allow Watcher acknowledgement.

First, I added a function called “ackWatch” that actually does the work of acknowledging the watch:

ackWatch = (msg, watch_id, alias) ->
  cluster_url = _esAliases[alias]

  if cluster_url == "" || cluster_url == undefined
    msg.send("Do not recognize the cluster alias: #{alias}")
  else
    msg.send("Acknowledging watch: #{watch_id}")

  msg.http("#{cluster_url}/_watcher/watch/#{watch_id}/_ack")
    .put() (err, res, body) ->
      msg.send("Acknowledged")

Then, I added the regular expression that the bot should be listening for:

robot.hear /elasticsearch ack (.*) (.*)/i, (msg) ->
  if msg.message.user.id is robot.name
    return

  ackWatch msg, msg.match[1], msg.match[2], (text) ->
    msg.send text

With that in place, any user in the chat room can acknowledge a watch by typing, “hubot: elasticsearch ack some_watch some_alias” where some_watch is the ID of a watch and some_alias is the nickname for the cluster you’re talking about (like “dev”, “qa”, or “prod”, for example).

Putting it all together: A short demo

With all of this in place, my Elasticsearch clusters can tell the team when something interesting is going on and the team can acknowledge that alert and do preliminary investigation by interrogating the cluster, all from the comfort of their chat window.

The video below shows this working. In it, I create a simple watch that invokes a web hook to post a message to the chat room when a watch condition is met.

The demo uses a simple example where the alert is triggered when the test index is a certain size. But you could easily wire up any watch to the same action, such as when your cluster state goes red or when CPU or RAM reach a certain threshold.

This was relatively simple to put together, but hopefully you can see how you could build on this to automate all kinds of things related to monitoring, alerting, and administration of your Elasticsearch cluster from chat.

ECM Architect

Tag: DevOps

My initial experience with Antsle, a virtual machine appliance

Using Hubot and Watcher to automate Elasticsearch admin tasks via chat