Improving Alfresco Share performance by using getChildren

I had a client that was seeing response time in the neighborhood of several seconds for the Alfresco Share document library and data list pages across all of his sites. The client’s Share install had just over 1,000 Share sites. The volume of the data lists in each site was insignificant. This is the story of how we resolved the issue, but note that the resolution may not be appropriate for everyone in all cases.

The Symptoms

The client was seeing slow performance of the document library and data list pages in Share. They noticed that the folder tree in the document library view responded quickly but the actual document list itself took a long time to render. This was happening for all users in all sites.

Looking for a quick resolution, even if that meant solving the symptom but not necessarily the underlying cause, we decided to see if we could optimize the repository tier web script that returns the document library contents to see if we could get it to perform a little closer to what we were seeing with the folder tree. I made a copy of the repository tier’s doclist.get web script into our project’s extension directory and started tweaking.

The Bunny Trail

First, I’ll fess up to a mistaken assumption I had: I thought that everything in Alfresco always went through Lucene. I knew that separating out full-text index searches and property searches into Lucene queries and DB queries, respectively, was on the roadmap, but I had it in my head that in 3.4, even a call to something like ScriptNode.getChildren() was ultimately a Lucene index hit. If everything is a Lucene hit, I figured, there had to be a different reason for the folder list control to perform so much better than the document library list.

So, instead of starting with what, in hindsight, would have yielded the most bang for the buck, I started tuning what turned out to be little things. For example, our app didn’t use favorites, so I removed any references to the preferences service. Our app didn’t allow users to check out documents so out went any logic that dealt with that. Goodbye, Google Docs code blocks. Adios, type and aspect checking for types and aspects our app doesn’t support. Farewell, filters. I hardcoded permissions to avoid the lookup. I set created by and modified by values to empty strings to avoid the lookup to the person object. I jettisoned anything that wasn’t crucial to simply producing a list of the folder contents. All of this did speed up the repository-tier web script, but only a little bit. I needed an order of magnitude improvement.

The Lightbulb

Next, I did what I should have done initially: Add some simple log statements to see which part of the code was taking the longest to execute. Of course, it was the query. As it turns out, it is much, much faster to ask a node for its children (which is what the treenode web script does) than it is to do a Lucene search with a PARENT clause that yields the same result set (which is what the doclist web script does). On a dev machine with a small dataset, you don’t notice the difference. But on our integration and prod servers the difference is huge.

The Fix

The Share document library page uses a YUI data table to produce the list of documents for the currently selected folder. The data table is bound to a web script that lives on the repository tier that is responsible for returning the requested data as JSON. Out-of-the-box, the repository tier web script that returns the document list calls a function called getFilterParams which is responsible for setting up a bunch of query predicates based on the document library filter the user has selected in the Share UI. The script then asks the filterParams object for the Lucene query it needs to run to return the document list. It then uses the search service to invoke the query and return the results.

My optimization was to bypass building and executing the query completely because, in our case, we don’t care about filters. All we want is the list of children in the current folder, and ScriptNode already has a function to do that called getChildren. So instead of performing a Lucene search, we ask the current “root node” for its children. We then iterate over the results and filter out a couple of content types that otherwise would have been excluded had we used the Lucene query instead of getting all children.

Oh man, that did it. The document library went from rendering in 6+ seconds to rendering in less than 1 second.

I gave the data lists web script the same treatment. In that case, our customized Share app still makes use of filters, so the “getChildren bypass” is only used when the “All” filter is selected. When any other filter is selected the original out-of-the-box Lucene query is used.

Now, again, I completely acknowledge that we may have succeeded in speeding up performance for those two cases, but failed to resolve the underlying issue, and addressing that may result in a system-wide performance boost, but it was good to get the quick fix in place and it should be easy enough to revert if and when we resolve the underlying index issue, if one exists.

Here’s a code snippet from the custom doclist.get.js controller if you are curious:


if (parsedArgs.path == "")
{
    parentNode = parsedArgs.rootNode;
}
else
{
    parentNode = parsedArgs.rootNode.childByNamePath(parsedArgs.path);
}      
// We are iterating over the parent node's children instead of iterating
// over search results... 
for each (node in parentNode.getChildren())
{
   try
   {
      // ...so we need to filter out some system types that would have otherwise been
      // filtered out by the lucene query
      if (node.typeShort == "cm:systemfolder" || node.typeShort == "cm:thumbnail")
      {
         // do nothing. we don't want these.
      }
      else if (node.isContainer || node.typeShort == "app:folderlink")
      {
         folderNodes.push(node);
      }
      else
      {
         documentNodes.push(node);
      }
   }
   catch (e)
   {
      // Possibly an old indexed node - ignore it
   }
}
This entry was posted in Alfresco and tagged , , , . Bookmark the permalink.