Solr-powered Alfresco

Have you checked out the Apache Solr project yet? It’s pretty cool. It’s essentially a search server (deployed as a web app into a servlet container) that sits on top of Lucene. Solr makes it super easy to get content into and out of Lucene via its HTTP and JSON APIs.

Recently, for a prospective Optaros client, we put together a little demo to show how Alfresco WCM could integrate with Solr to provide search and personalization for a web site managed within Alfresco. Here’s what we did at a high level:

  • Created an Alfresco web form and XSLT for the web content as usual.
  • Created an additional XSLT (or FreeMarker) template to convert the XML content to the Solr format. This gets configured as an additional presentation template associated with the web form.
  • Wrote a JSP to aggregate the Solr XML for all of the published content.
  • Wrote a servlet to call the JSP every X seconds. It takes the response and posts it to Solr. That’s how the Alfresco content gets into the index.
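The last step, posting the aggregated XML to Solr, can be sketched roughly as follows. This is a minimal Python illustration rather than the servlet we actually wrote. The endpoint URL, the field names, and the helper names `build_add_xml` and `post_to_solr` are all assumptions made for the example. In the demo the JSP produced the XML, so `build_add_xml` only stands in for that output.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical endpoint; adjust host, port, and path for your Solr install.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(docs):
    """Build a Solr <add> message from a list of {field_name: value} dicts."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, <, > on output
    return ET.tostring(add, encoding="unicode")


def post_to_solr(xml_body, url=SOLR_UPDATE_URL):
    """POST an XML update message to Solr and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A caller would pass the result of `build_add_xml(...)` to `post_to_solr`, then post `<commit/>` so the new documents become searchable.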

This setup allowed web content to be indexed by Solr shortly after its creation (within one polling interval of the servlet). Web site users (either using the web site in the virtualized sandbox or on the production web site) could then query the content.
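For a sense of what a query looks like on the wire, here is a small Python sketch that builds a Solr search URL. The base URL and the helper name `build_query_url` are assumptions for illustration. `q`, `rows`, and `wt` are standard Solr query parameters, but check that your Solr version supports the response format you request.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust for your Solr install.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_query_url(terms, rows=10, base_url=SOLR_SELECT_URL):
    """Return a Solr select URL for the given search terms."""
    params = {"q": terms, "rows": rows, "wt": "json"}
    return base_url + "?" + urlencode(params)
```

A page component would fetch that URL and render the returned documents.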

The web site was a mix of static HTML and JSPs. The JSPs used custom taglibs to call “Solr Search” widgets in the right spot on the page. This was the first time I had used Alfresco’s virtualization to run a real web application (as opposed to static content). The preview release of 2.0 I was using seemed to have some significant caching issues. Hopefully those are resolved in the production release. Other than that, it was easy to see how technical and non-technical content managers could leverage Alfresco virtualization to collaborate on developing and managing a dynamic web site.

Before using this approach in production, I would need to think about the best way to handle deletes. In the demo, once content got into the index, it stayed there even if the associated content was removed from Alfresco. As far as Solr goes, it is easy to get the content deleted from the index: it’s a simple HTTP POST. The trick is where in Alfresco to put that call.
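The Solr side of a delete might look something like the following Python sketch. The helper name `build_delete_xml` is made up for the example, and it assumes the document's unique key in the index is known to whatever Alfresco hook ends up making the call.

```python
import xml.etree.ElementTree as ET


def build_delete_xml(doc_id):
    """Build a Solr delete-by-id message for a single document."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")
```

The resulting string would be POSTed to the same update URL used for adds, followed by a `<commit/>` message.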

5 comments

  1. Maruti says:

    Excellent article. I want to become a J2EE architect, and your web site is giving me cool stuff.
    Thanks A LOT 🙂

  2. Jason Duke says:

    Jeff,

    Was googling a bit for “Alfresco and Solr” and this article came up. Have you had any more experience integrating Solr with Alfresco?

    I’m especially interested in understanding whether it is feasible for the existing Alfresco Share search components to leverage Solr, or if that would likely require a fully custom Search UI written from scratch.

    The existing Share Search UI seems pretty bare, so my current assumption is that it will be necessary to add additional Search UI for any advanced search features.

    Any guidance?

    Thanks again for all the great articles.

    -Jason

  3. jpotts says:

    Not me personally but we’ve definitely done Solr integrations with Alfresco for clients.

    You definitely have to write your own search UI in Share to call a different search engine, but because Solr has a REST interface and Share is built on Surf, which is purpose-built for interacting with other HTTP endpoints via REST, the UI component of the integration is not a huge deal. The bigger effort is making sure that you’ve hooked into document CRUD functions on the repository tier so that your Solr index stays up-to-date. Depending on your requirements, you might do that in behaviors, or in rules, or in a workflow.

    As a side note, I’ve heard some murmurs about incorporating Solr into the Alfresco product somehow but I have neither details nor timeframe.

    Jeff
