Tag: Intranet

Say goodbye to your Google Search Appliance and hello to Elasticsearch

Credit: Barabas/cc-by-sa-3.0
Photo: Barabas/cc-by-sa-3.0

Earlier this month, Google announced that it is getting out of the search appliance business. According to this article by Fortune, Google told its partners they could renew existing Google Search Appliance (GSA) customers through 2017 but no new hardware would be sold.

I have multiple clients running GSA for Enterprise Search and their experiences have been mixed. Clearly, the plug-and-play nature of a turnkey appliance was attractive. But, of course, the other side of that coin is the potential set of limitations that an appliance places on you, whether that’s in terms of cost/license, capacity, or features.

GSA customers have time to figure out their migration path. Google says they are working on a cloud-based alternative. But maybe it’s time to take a step back and consider your options.

Something big has happened since the last time you looked at Enterprise Search: It’s called Elasticsearch. The commercially-supported open source software builds on the rock solid foundation of the well-known Apache Lucene by baking in clustering and a comprehensive API out-of-the-box.

Adoption has been swift. At last year’s Elasticon conference, the company reported 20 million downloads. At this year’s conference the company announced they had hit 50 million downloads across all of their products.

Deployment options

If you want to self-support, you can set up a cluster on-prem and scale it as big as you need it for the cost of your time and some hardware. If you need commercial support you can get it from Elastic.

If a cloud-based solution is attractive to you there are several options:

  • Elastic has its own cloud offering called Elastic Cloud (formerly called “Found”).
  • QBox offers Elasticsearch hosting.
  • Amazon offers its own hosted Elasticsearch offering called Amazon ES.
  • And you can always just grab some virtual machines on your cloud provider of choice and install and run your own cluster.

The Elastic Stack provides the core search platform and a host of other tools, but it does not provide a web crawler. You’ll probably want to use Scrapy, StormCrawler, or Nutch for this, all of which are freely available as open source software.

Beyond crawlers there are a ton of different ways to get content indexed into Elasticsearch. Beats and Logstash are two Elastic products that can be used to pump data into the cluster. If you have to write your own integration, the API is fairly straightforward and is available for a number of languages as well as anything that can speak REST.

You’ll be shocked at how quickly you can stand up an Elasticsearch cluster. Where you’ll likely spend more time is on production-izing your setup and tuning for relevancy (take a look at the Relevant Search book from Manning).

Your GSA was only ever going to be good at one thing–providing keyword search for your internal documents. Elastic gives you that and so much more. You might start out using it to replace your GSA-based Enterprise Search but you’ll soon figure out that it can be used for all kinds of interesting things.

Django + Alfresco was a winning combination for retailer’s intranet

Last week I spent some time with one of our clients talking about what it’s been like to live with their Intranet platform based on Django and Alfresco. The conversation got me really excited about what they’ve been able to do since the original implementation and where they are heading.

The client is a well-known, high-end retailer based in Dallas. About a year ago they engaged Optaros to replatform their intranet from a legacy Java portal product to something more agile. They had seen Alfresco and liked it as a core repository, but needed something for the presentation tier (See “Alfresco User Interface: What are my options?“).

The Optaros team worked with the client to consider many options, including open source Java portal servers. The client felt like they needed something lighter and more flexible than a portal server. They were willing to do a lot of the presentation work themselves in exchange for complete design freedom and yet still be enough of a framework to be highly productive. The winning solution turned out to be Django.

Python? No problem.

I was initially worried that introducing a Python-based framework into a Java shop was going to be a problem but they weren’t married to Java. Our team got them up-to-speed quickly and they never looked back. It also helped that the client’s intranet sites were very communication-centric which matched up well with Django’s newspaper heritage.

Here’s how they use the solution in a nutshell:

  • Content owners use Alfresco Explorer to upload HTML chunks, office documents, and images, set metadata, and submit content for review. This triggers any number of rules that automatically process the changed content (e.g., creating thumbnails, extracting metadata, converting images to a consistent type, creating PDFs from office documents).
  • Content owners and reviewers can use Alfresco’s “custom views” to preview the content chunk in the context of the front-end site.
  • Site designers lay out site pages and create components using the Django template system, CSS, JQuery, and other front-end libraries.
  • Content publishers use the Django administration UI to map areas on the site to categories, folders, and objects in the Alfresco repository–Alfresco has no idea where or how the chunks are being used. This means the repository tier is truly decoupled from the presentation tier, allowing the client to reuse content across multiple areas of the site and across multiple sites within the enterprise.
  • Designers leverage a Django tag library to create dynamic areas of a page (e.g., when the page is rendered, retrieve all of the content chunks in this particular category from the repository). Django calls Alfresco web scripts to get and post data. The web scripts respond with serialized Django XML which Django caches and then deserializes into Django objects that the front-end can work with.

Separate concerns, play to strengths

The thing to notice about the Alfresco piece is how it sticks to core Alfresco capabilities: Metadata, rules, search, basic workflows, transformers/extractors, presentation templates, web scripts, DM repository. This is straight out of the Alfresco best practices playbook and aligns the client well with Alfresco product direction. A nice enhancement would be to refactor the Django-Alfresco integration to use CMIS which is something we are considering for the open source version of the integration (Screencast, Code).

Agile intranet, happy team

Since the initial rollout, the client has been able to make changes and roll out new sites quickly and easily thanks to the productivity inherent in the Django framework and the clean separation between the front-end app and the repository. Unexpected benefits the client mentioned were how fast they can add new features to the administrative UI (a core admin UI gets built for you automatically by Django) and the ease with which the development team can stand up a new environment.

The language the client team used to describe their work since the rollout summed it up best. They were using words like “beautiful” and “a real pleasure to work with”. When was the last time you heard those sentiments expressed about a WCM implementation?

Drupal + Alfresco webinar slides available

People want intranets that are fun and easy to use, full of compelling content relevant to their job, and enabled with social and community features to help them discover connections with other teams, projects, and colleagues. IT wants something that’s lightweight and flexible enough to respond to the needs of the business that won’t cost a fortune.

That’s why Drupal + Alfresco is a great combination for things like intranets like the one Optaros built for Activision and why we had a record-breaking turnout for the Drupal + Alfresco webinar Chris Fuller and I did today. Thanks to everyone who came and asked good questions. I’ve posted the slides. Alfresco recorded the webinar so they’ll make it available soon, I’m sure. When that happens, I’ll update the post with a link. Until then, enjoy the slides.

[UPDATE: Fixed the slideshare link (thanks, David!) and added the links to the webinar recording below]

1. Streaming recording link:
https://alfresco.webex.com/alfresco/lsr.php?AT=pb&SP=TC&rID=42774837&act=pb&rKey=b44130d69cc9ec5f

2. Download recording link:
https://alfresco.webex.com/alfresco/ldr.php?AT=dw&SP=TC&rID=42774837&act=pf&rKey=c50049ac82e1220a