A common way to get files and metadata into and out of the Alfresco repository is to use ACP (Alfresco Content Package) files. People who work with ACP files quickly find out that the out-of-the-box ACP import action will only import a given object once: it won’t update an object that is already in the repository. By default, the import action tries to create a new object on every import, so if a like-named object already exists, the import fails.
There’s a simple fix for this. The underlying API actually supports updating objects by matching on UUID. All you need to do is create your own version of the import action that passes an ImporterBinding whose UUID binding is UPDATE_EXISTING, rather than passing null and getting the default behavior.
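To make the difference between the two behaviors concrete, here is a toy in-memory model. It is only a sketch of the semantics described above, not Alfresco code: the class `UuidBindingDemo`, its `Node` type, and `importNode` are hypothetical names, and creating a node under the incoming UUID when no match exists is an assumption about how UPDATE_EXISTING behaves.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Toy model of import behavior. NOT the Alfresco API; all names are hypothetical. */
class UuidBindingDemo {

    /** Two modes named after the Alfresco UUID bindings discussed in the post. */
    enum Binding { CREATE_NEW, UPDATE_EXISTING }

    /** Minimal stand-in for a repository node. */
    static final class Node {
        final String uuid;
        final String name;
        final Map<String, String> props = new HashMap<>();

        Node(String uuid, String name) {
            this.uuid = uuid;
            this.name = name;
        }
    }

    private final Map<String, Node> byUuid = new HashMap<>();
    private final Map<String, Node> byName = new HashMap<>();

    Node importNode(String incomingUuid, String name,
                    Map<String, String> props, Binding binding) {
        if (binding == Binding.UPDATE_EXISTING) {
            Node existing = byUuid.get(incomingUuid);
            if (existing != null) {
                // Match on UUID: update the existing node in place.
                existing.props.putAll(props);
                return existing;
            }
        }
        // Create path: a like-named node makes the import fail.
        if (byName.containsKey(name)) {
            throw new IllegalStateException("Duplicate name: " + name);
        }
        // Assumption: UPDATE_EXISTING keeps the incoming UUID for new nodes,
        // while CREATE_NEW always assigns a fresh one.
        String uuid = (binding == Binding.UPDATE_EXISTING && incomingUuid != null)
                ? incomingUuid
                : UUID.randomUUID().toString();
        Node created = new Node(uuid, name);
        created.props.putAll(props);
        byUuid.put(uuid, created);
        byName.put(name, created);
        return created;
    }
}
```

Importing the same UUID twice with `UPDATE_EXISTING` returns the same node with merged properties, whereas a second `CREATE_NEW` import of a like-named object throws, which mirrors the failure described above.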
Here are the steps:
- Find the org/alfresco/repo/action/executer/ImporterActionExecuter.java class in the source.
- Make a copy of the class. I called mine com.optaros.action.executer.UpdatingImportExecuter.java.
- Find the line of code that reads: this.importerService.importView(importHandler, new Location(importDest), null, null);
- Change it to: this.importerService.importView(importHandler, new Location(importDest), REPLACE_BINDING, null);
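- Now add the definition for REPLACE_BINDING, which is a private static field initialized with an anonymous inner class:
    // Requires imports for org.alfresco.service.cmr.view.ImporterBinding
    // and org.alfresco.service.namespace.QName.
    private static final ImporterBinding REPLACE_BINDING = new ImporterBinding() {
        // Update nodes whose UUID matches instead of creating new ones.
        public UUID_BINDING getUUIDBinding() {
            return UUID_BINDING.UPDATE_EXISTING;
        }
        // No binding values are substituted during the import.
        public String getValue(String key) {
            return null;
        }
        public boolean allowReferenceWithinTransaction() {
            return false;
        }
        // No classes are excluded from the import.
        public QName[] getExcludedClasses() {
            return null;
        }
    };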
As with any action, the last step is to add the Spring configuration:
<bean id="updating-import" class="com.optaros.action.executer.UpdatingImportExecuter" parent="action-executer">
    <property name="importerService">
        <ref bean="ImporterService"/>
    </property>
    <property name="nodeService">
        <ref bean="NodeService"/>
    </property>
    <property name="contentService">
        <ref bean="ContentService"/>
    </property>
    <property name="mimetypeService">
        <ref bean="mimetypeService"/>
    </property>
    <property name="fileFolderService">
        <ref bean="FileFolderService"/>
    </property>
</bean>
After deploying your changes and restarting the application server, you can test the new action. In my case, I wrote a workflow that used JavaScript to execute the export action to export the documents in the workflow. I then simulated an external system operating on the ACP file by writing a quick Perl script to unzip the ACP, inject some metadata into the ACP’s XML manifest, and then zip it back up. At that point the ACP file contained the exported objects (with their associated UUIDs) and some new metadata. I then triggered the next step in the workflow which, again, used JavaScript to execute the newly written Updating Import Action. Unlike the out-of-the-box import, when this one runs, Alfresco finds the existing objects based on the incoming UUIDs and, instead of trying to create new objects, updates the old objects with the new metadata in the ACP file. Problem solved.
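The Perl script itself isn’t shown here, but the unzip, edit, and re-zip round trip can be sketched in plain Java as well. This is a stand-in, not the original script: `AcpManifestRewriter`, `rewrite`, and `isManifest` are hypothetical names, and the sketch assumes the manifest is a UTF-8 `.xml` entry at the root of the archive, with everything else copied through unchanged.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: copies an ACP (zip) archive, editing its XML manifest. */
class AcpManifestRewriter {

    /** Streams every entry from acpIn to acpOut, applying editManifest to the manifest. */
    static void rewrite(InputStream acpIn, OutputStream acpOut,
                        UnaryOperator<String> editManifest) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(acpIn);
             ZipOutputStream zout = new ZipOutputStream(acpOut)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                zout.putNextEntry(new ZipEntry(entry.getName()));
                if (!entry.isDirectory()) {
                    byte[] data = readAll(zin);
                    if (isManifest(entry.getName())) {
                        // Assumption: the manifest is UTF-8 encoded XML.
                        String xml = new String(data, StandardCharsets.UTF_8);
                        data = editManifest.apply(xml).getBytes(StandardCharsets.UTF_8);
                    }
                    zout.write(data);
                }
                zout.closeEntry();
            }
        }
    }

    /** Assumption: the manifest is the .xml entry at the root of the archive. */
    static boolean isManifest(String entryName) {
        return !entryName.contains("/") && entryName.toLowerCase().endsWith(".xml");
    }

    /** Reads the remaining bytes of the current entry (or stream) into memory. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }
}
```

A caller would pass file streams for the exported and updated ACP files plus a function that performs the actual metadata injection on the manifest text. A real script would more likely parse the XML than use string replacement, and the UUIDs in the manifest must be left intact for the updating import to find the existing objects.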
You can learn more about writing custom Actions on the Alfresco Wiki and in Chapter 4 of the Alfresco Developer Guide available at Packt Publishing or your favorite online book seller.
You can also use the command line importer tool (http://wiki.alfresco.com/wiki/Export_and_Import#Import_Tool). It lets you select the UUID binding behaviour you want with the -uuidBinding option.
Jeff, Thanks – a very useful tip! It also seems like something that should be available OOTB in the future. If you decide to open a JIRA request, please post it here, as I suspect it would get plenty of votes.
Jonas,
Doesn’t the command line import tool run the repository in its own process? In other words, you have to shut down your repo, then run the import, then restart. Let me know if that is not accurate.
Jeff
Jeff,
That’s correct. You need to shut down Alfresco to use the command line tools.
Hello,
Is there a way to automatically import a PDF together with, for example, a self-made XML file containing its metadata? I am thinking of putting these files in a folder for Alfresco to pick up and import.
Thanks!
Johan
Johan,
The easiest way to do this would be to hit the Alfresco repository via CIFS. That way, you could just mount the repo as a file share and copy your file as if you were copying it to any other file share. Then, it’s in. No import necessary. If CIFS isn’t your thing you could use FTP, WebDAV, SMTP, or IMAP (which is a recent addition).
If a “push” doesn’t meet your needs and you want Alfresco to “pull” from somewhere else, you could implement that with a scheduled action or a workflow.
Hope that helps,
Jeff