Month: February 2016

Say goodbye to your Google Search Appliance and hello to Elasticsearch

Credit: Barabas/cc-by-sa-3.0

Earlier this month, Google announced that it is getting out of the search appliance business. According to Fortune, Google told its partners that they could renew existing Google Search Appliance (GSA) customers through 2017, but that no new hardware would be sold.

I have multiple clients running GSA for Enterprise Search and their experiences have been mixed. Clearly, the plug-and-play nature of a turnkey appliance was attractive. But, of course, the other side of that coin is the potential set of limitations that an appliance places on you, whether that’s in terms of cost/license, capacity, or features.

GSA customers have time to figure out their migration path. Google says it is working on a cloud-based alternative. But maybe it's time to take a step back and consider your options.

Something big has happened since the last time you looked at Enterprise Search: Elasticsearch. This commercially supported open source software builds on the rock-solid foundation of the well-known Apache Lucene library and adds clustering and a comprehensive API out of the box.

Adoption has been swift. At last year's Elasticon conference, the company reported 20 million downloads. At this year's conference, it announced that it had hit 50 million downloads across all of its products.

Deployment options

If you want to self-support, you can set up a cluster on-prem and scale it as big as you need it for the cost of your time and some hardware. If you need commercial support you can get it from Elastic.

If a cloud-based solution is attractive to you there are several options:

  • Elastic has its own cloud offering called Elastic Cloud (formerly called “Found”).
  • QBox offers Elasticsearch hosting.
  • Amazon has its own hosted offering called Amazon Elasticsearch Service (Amazon ES).
  • And you can always just grab some virtual machines on your cloud provider of choice and install and run your own cluster.

The Elastic Stack provides the core search platform and a host of other tools, but it does not provide a web crawler. You’ll probably want to use Scrapy, StormCrawler, or Nutch for this, all of which are freely available as open source software.

Beyond crawlers, there are many ways to get content indexed into Elasticsearch. Beats and Logstash are two Elastic products that can pump data into the cluster. If you have to write your own integration, the API is fairly straightforward. Client libraries are available for a number of languages, and anything that can speak HTTP can use the REST API directly.
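To make "anything that can speak REST" concrete, here is a minimal Python sketch that prepares a request for the Elasticsearch `_bulk` endpoint using only the standard library. The index name, document fields, and `localhost:9200` address are illustrative assumptions. The action lines also omit `_type`, which assumes a 7.x or later cluster. The 2.x clusters that were current when this was written also expect a `_type` in each action line.

```python
import json
import urllib.request


def build_bulk_body(index, docs):
    """Build an NDJSON body for the _bulk endpoint.

    docs is a list of (doc_id, source) pairs. Each document contributes an
    action line and a source line; the body must end with a newline.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


def bulk_request(base_url, index, docs):
    """Prepare, but do not send, a POST to <base_url>/_bulk."""
    body = build_bulk_body(index, docs).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/_bulk",
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


# Hypothetical documents, e.g. pages a crawler has already fetched.
docs = [
    ("1", {"title": "Expense policy", "url": "http://intranet/expenses"}),
    ("2", {"title": "Travel guide", "url": "http://intranet/travel"}),
]
req = bulk_request("http://localhost:9200", "intranet", docs)
# urllib.request.urlopen(req) would send it to a running cluster.
```

A bulk response can come back with HTTP 200 even when individual documents failed. Check its `errors` flag and the per-document entries under `items` rather than relying on the status code alone.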

You'll be shocked at how quickly you can stand up an Elasticsearch cluster. You'll likely spend more time making your setup production-ready and tuning for relevancy (take a look at the Relevant Search book from Manning).
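One common first step in relevancy tuning is weighting fields differently, for example favoring matches in a document's title over matches in its body. The sketch below builds a `multi_match` query with field boosts such as `title^3`. The field names and the boost value are hypothetical starting points to tune against real queries, not recommendations.

```python
import json


def keyword_query(text, field_boosts, size=10):
    """Build a search body that runs one multi_match query across several fields.

    field_boosts maps a field name to its boost; a boost of 1 is left implicit.
    """
    fields = [name if boost == 1 else f"{name}^{boost}"
              for name, boost in field_boosts.items()]
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                "fields": fields,
                # best_fields scores each document by its best-matching field
                "type": "best_fields",
            }
        },
    }


# Hypothetical field names: weight title matches three times as heavily as body matches.
query = keyword_query("expense policy", {"title": 3, "body": 1})
print(json.dumps(query, indent=2))
```

The resulting body is sent to `/<index>/_search`. Comparing the result lists for a set of representative queries before and after each weight change keeps the tuning grounded.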

Your GSA was only ever going to be good at one thing: providing keyword search for your internal documents. Elastic gives you that and so much more. You might start out using it to replace your GSA-based Enterprise Search, but you'll soon discover that it can be used for all kinds of interesting things.

A simple one-way calendar integration for Alfresco Share

Photo credit: Dafne Cholet

A common request is to integrate the Alfresco Share calendar with an external calendaring system such as Outlook, Google Calendar, or Zimbra. Without an integration, people end up doing double-entry. You’ve already got a calendar that works pretty well. Why make people re-enter events in Alfresco Share?

Most people use Alfresco Share for team collaboration. The calendar doesn’t need to show everything on everyone’s calendar–that job is better left to the existing calendar server. What makes more sense is to show a few team-related events or milestones on the team’s Alfresco Share site calendar or maybe in a dashlet on the site’s dashboard.

When thinking about the problem, I realized that the calendar in Share is just another interested party in an event. Just as some calendaring systems allow you to “invite” a conference room to a meeting which effectively reserves that room for the meeting, you ought to be able to “invite” a Share site and have the Share site add that event to its calendar and update it when the event changes.

Treating the Share site as just another invitee is a non-invasive way to integrate with the calendaring system. It has the added benefit that only events to which the Share site was specifically invited show up on the Share site calendar.

As luck would have it, the pieces to make this work already exist and they don’t require any changes to the source calendaring system. Check it out:

  • When you invite someone to a calendar event the calendaring system sends an iCalendar (.ICS) file as an email attachment to the invitee. The invitee’s email or calendaring client recognizes that attachment and updates the calendar accordingly.
  • There's a Java library called iCal4j that knows how to parse iCalendar files. Yay for standards!
  • Alfresco supports inbound email, and you can easily bind custom logic to the creation of nodes. Alfresco creates one document for the email body and one for the ICS file attachment.
  • Events that show up on the Alfresco Share calendar are just content-less objects–they are instances of ia:calendarEvent.

Put those pieces together and a simple one-way calendar integration is born. The integration watches for incoming email with ICS attachments, parses the attachment, then creates, updates, or deletes the corresponding Alfresco Share site calendar object.
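The integration itself uses iCal4j in Java. Purely to illustrate what "parses the attachment" involves, here is a simplified stand-in written with the Python standard library. It does three things:

  • It unfolds continuation lines (RFC 5545 line folding).
  • It reads the calendar's `METHOD`.
  • It collects the first `VEVENT`'s properties while skipping nested components such as `VALARM`.

It does not handle quoted parameter values that contain colons, text unescaping, or recurrence. A real library like iCal4j takes care of all of these.

```python
def unfold(ics_text):
    """Undo RFC 5545 line folding: a line starting with a space or tab continues the previous one."""
    lines = []
    for raw in ics_text.replace("\r\n", "\n").split("\n"):
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]  # drop the single folding whitespace character
        elif raw:
            lines.append(raw)
    return lines


def parse_invite(ics_text):
    """Return the calendar METHOD and the first VEVENT's properties (parameters dropped)."""
    result = {"METHOD": None, "event": {}}
    depth = 0  # 0: outside VEVENT, 1: inside VEVENT, >1: inside a nested component like VALARM
    for line in unfold(ics_text):
        name_part, _, value = line.partition(":")
        name = name_part.split(";", 1)[0].upper()  # "DTSTART;TZID=..." -> "DTSTART"
        if name == "BEGIN":
            if depth or value.upper() == "VEVENT":
                depth += 1
        elif name == "END":
            if depth == 1:
                break  # finished the first VEVENT
            if depth:
                depth -= 1
        elif name == "METHOD":
            result["METHOD"] = value.upper()
        elif depth == 1:
            result["event"].setdefault(name, value)  # keep the first occurrence
    return result


sample = (
    "BEGIN:VCALENDAR\r\n"
    "METHOD:REQUEST\r\n"
    "BEGIN:VEVENT\r\n"
    "UID:abc-123@example.com\r\n"
    "SUMMARY:Sprint review for the\r\n"
    "  site team\r\n"
    "DTSTART;TZID=America/New_York:20160301T140000\r\n"
    "BEGIN:VALARM\r\n"
    "DESCRIPTION:Reminder\r\n"
    "END:VALARM\r\n"
    "END:VEVENT\r\n"
    "END:VCALENDAR\r\n"
)
print(parse_invite(sample))
```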

With this in place, all you have to do to add an event to the Alfresco Share site calendar is invite the Share site to the event from your favorite calendaring system.

But what's the invitee name of a Share site? Great question! In Alfresco, there's an aspect called email alias. You can add it to any folder and give it an arbitrary value. Then, when sending email to Alfresco, you can address it to that alias.

My integration includes code that makes sure all Share sites have a folder that can be used to store inbound email and it gives that folder an alias equal to the Share site’s short name (which is used as part of the Share URL). So if your Share site is called “test-site-1” and you normally send email to Alfresco via alfresco.someco.com, your Share site’s email address becomes test-site-1@alfresco.someco.com.
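The address scheme above can be sketched in a few lines. This is a hypothetical pair of helpers, not code from the integration. The first builds a site's address from its short name and the Alfresco mail host. The second maps an inbound recipient back to a site short name. It compares the domain case-insensitively, since domain names are not case-sensitive.

```python
def site_email_address(short_name, mail_host):
    """Address for a site whose inbound-email folder carries an alias equal to its short name."""
    return f"{short_name}@{mail_host}"


def site_for_recipient(recipient, mail_host):
    """Map an inbound recipient back to a site short name, or None if it is not for mail_host."""
    local, sep, domain = recipient.strip().rpartition("@")
    if not sep or not local:
        return None
    # Domain names are case-insensitive; the local part (the alias) is compared as-is.
    if domain.lower() != mail_host.lower():
        return None
    return local


print(site_email_address("test-site-1", "alfresco.someco.com"))
```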

What about updates? Calendaring systems assign a unique identifier to every event. When calendar entries are updated or deleted, the calendaring system sends an iCalendar file just as it does for new events. Included in that file are the event's unique ID and a flag that indicates whether the event is being created, updated, or cancelled. When the integration creates the event in the Alfresco Share calendar, it stores the unique ID in the Alfresco object's metadata. It can later use that ID to match up subsequent update and delete requests.
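The matching logic described above can be sketched as a small dispatcher keyed on the event UID. This is a hypothetical Python stand-in in which a dictionary plays the role of the site's `ia:calendarEvent` objects. The `METHOD` values (`REQUEST`, `PUBLISH`, `CANCEL`) are the iTIP (RFC 5546) ones a calendaring system typically sends. The sketch also assumes the event's `SEQUENCE` number can be used to ignore an older revision that arrives late. The original integration may or may not do this.

```python
class SiteCalendar:
    """Hypothetical in-memory stand-in for a site's ia:calendarEvent objects, keyed by UID."""

    def __init__(self):
        self.events = {}  # UID -> event properties as parsed from the .ics file

    def apply(self, method, event):
        """Create, update, or delete the event identified by event["UID"]."""
        uid = event.get("UID")
        if not uid:
            raise ValueError("event has no UID, so later updates could not be matched")
        if method == "CANCEL" or event.get("STATUS", "").upper() == "CANCELLED":
            self.events.pop(uid, None)
            return "deleted"
        if method in ("REQUEST", "PUBLISH"):
            current = self.events.get(uid)
            if current is not None:
                new_seq = int(event.get("SEQUENCE", 0))
                old_seq = int(current.get("SEQUENCE", 0))
                if new_seq < old_seq:
                    return "stale"  # an older revision arrived after a newer one
            self.events[uid] = dict(event)
            return "created" if current is None else "updated"
        return "ignored"  # other iTIP methods such as REPLY or COUNTER


cal = SiteCalendar()
print(cal.apply("REQUEST", {"UID": "u1", "SUMMARY": "Kickoff"}))
```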

How about a demo?

This video shows the integration in action. Be sure to make it full screen and select “HD”.

(If you can't see the video, you can watch it on YouTube.)

What’s left to do?

This is a simple, one-way integration. It does not tell the corporate calendaring system which sites are available, and it does not do a free-busy lookup. It also does not acknowledge the invitation back to the source calendaring system. I don't consider these critical gaps, but those features might make the integration tighter.

As a side-note, the automatic creation of an email alias for a Share site and a corresponding folder to hold inbound email (which users could then configure rules for) might be useful as a separate add-on even if you don’t need calendar integration. If you agree, let me know. Maybe the integration ought to be split into two separate AMPs.

Pull requests welcome

As usual, I welcome your participation in this project. If you find problems, fix problems, or want to make improvements, use the GitHub project to create issues and pull requests.

Elasticon 2016 is only two weeks away

Elasticon 2016 is just around the corner. The annual conference covering all things Elastic is happening February 17-19 in San Francisco.

Last year, the buzz was all about Elasticsearch 2.0, and attendees learned a lot about what to expect from that release. But my favorites were the sessions that covered real-world implementations. Some of these included:

  • How the U.S. Geological Survey uses Elasticsearch to be notified of earthquakes as they happen by monitoring and analyzing social media.
  • Verizon's best practices around scalability: they have 128 nodes indexing 10 billion documents per day.
  • Goldman Sachs was another big one; at that time they were running 700 nodes.
  • Interesting case studies from Wikimedia, Quizlet, and Zendesk.
  • A focus on analysis challenges from the team that runs Elasticsearch to provide web search for 1,500 .gov websites, such as those of the NIH and the U.S. Army.

Beyond the informative sessions, you can learn a lot in the hallway track. At last year's conference there were 1,300 attendees from 32 countries. I met people from both ends of the business spectrum doing all sorts of different things with Elasticsearch and the rest of the ELK stack.

This year’s agenda looks pretty interesting. I’m looking forward to the roadmap sessions, of course, but it’s the sessions from folks like Thomson Reuters, Yammer, HotelTonight, Eventbrite, Etsy, The New York Times, and Adobe that will probably give me the most bang for my buck. It only takes a few key insights here and there to pay for the entire trip.

Amazingly, this year’s conference has not sold out yet. Grab a spot and join us. Today is the last day for the discounted rate.

Register now for BeeCon, the Alfresco Community Conference

Registration for BeeCon 2016 is now open. What the heck is BeeCon? BeeCon is the first-ever independently organized conference focused entirely on Alfresco. The BeeCon web site says it best:

Alfresco professionals and enthusiasts come to BeeCon to sharpen their technical skills and collaborate with other experts…Whether you are a developer, information professional, student, or Alfresco employee, BeeCon is the place to dive deep into Alfresco and develop the relationships which you will need to be successful in the coming year.

The conference is organized by the Order of the Bee, an independent community focused on Alfresco.

Who Will Attend?

BeeCon is an event organized by and targeted towards the Alfresco community. It is built around the idea that what makes our community great is its open, collaborative spirit, and that, from time to time, it is important to meet face-to-face to learn from each other, hash out ideas, strengthen personal relationships, and just have fun.

If Alfresco is just a piece of software to you, then this is a conference with a lot of technical how-to's that will help you get your project done, and you should come for that reason. When you arrive, though, you're going to find out that a lot of people have crossed oceans and continents to be in Brussels. They came not only because the software is important, but also because, as a community, we have a lot of work to do. And the people who care about the Alfresco community are using this event to get organized and to map the way forward.

If you love sales pitches and marketing fluff you should sit this one out. But if you…

  • want to learn more about the technical details from experts;
  • are already running Alfresco in your organization, whether that’s Enterprise or Community Edition; or
  • want to help shape the future of the community and the platform

…then you need to attend BeeCon 2016.

More than a Meetup

This is more than a meetup. It's a real two-day conference with keynotes, tracks, and a hackathon. The goal is to make it similar to past events like DevCon, with really great content and outstanding people, but without the big budget (or price tag).

You can register now for about 60 euros. If you wait, the price goes up to about 90 euros.

Support from Alfresco and Other Sponsors

The BeeCon team has focused on keeping things practical and inexpensive. But events like this simply cannot succeed without help from sponsors. This year, CIRB-CIBG is providing the venue, A/V equipment, and WiFi, which is amazing because those three items are the biggest costs for any event. What's even more amazing is that we enjoy additional support from a number of sponsors, including Alfresco, Contezza, ITD Systems, keensoft, VDEL, and Xenit. You should thank these folks when you see them.

Stay Tuned for the Detailed Agenda

The program team received a number of speaking submissions from Alfresco engineers and community members from all over the world. They are busy reviewing those and will get the conference web site updated as things solidify. The team is picky–they want sessions to be high quality and packed with information you can use on your Alfresco projects right away. I’m looking forward to seeing the finished agenda, but I’m not going to wait to register.

Space is Limited, Do Not Wait to Register!

While you're thinking about it, complete your registration. It's only 60 euros. I'll bet you can slip that into an expense report without much fuss. And when you bring the things you learn back to the office, you'll win respect and adoration from your boss and coworkers. Not bad for 60 euros.

When making your travel plans for Brussels, remember that we'll be getting together Wednesday night, April 27, for a welcome reception. The conference runs two days, April 28-29. Then, whoever is interested can come with us to the medieval city of Bruges on Saturday, April 30, for a day of sightseeing. I've been to Bruges, and it's gorgeous. You won't want to miss it. Plus, it will be nice to hang out with your favorite community members, Belgian-style.

I look forward to seeing you in Brussels in April!