Category: General

General thoughts that defy categorization.

Have you tried the serverless framework?

Last year I was working on a POC. The target stack for the POC was to be native AWS as much as possible. That’s when I came across Serverless. Back then it was still in beta, but I was really happy with it. After the POC was over I moved on to other things. A couple of days ago I was reminded how useful the framework is, so I thought I’d share some of those thoughts here.

Before I continue, a few words about the term, “serverless”. In short, it gets some folks riled up. I don’t want to debate whether or not it’s a useful term. What I like about the concept is that, as a developer, I can focus on my implementation details without worrying as much about the infrastructure the code is running on. In a “serverless” setup, my implementation is broken down into discrete functions that get instantiated and executed when invoked. Of course, there are servers somewhere, but I don’t have to give them a moment’s thought (nor do I have to pay to keep them running, at least not directly).

If your infrastructure provider of choice is AWS, functions run as part of a service offering called Lambda. If you want to expose those functions as RESTful endpoints, you can use Amazon API Gateway. Of course, your Lambda functions can make calls to other AWS services such as DynamoDB, S3, Simple Queue Service, and so on. For my POC, I leveraged all of those. And that’s where the serverless framework really comes in handy.

Anyone who has done anything with AWS knows it can often take a lot of clicks to get everything set up right. The serverless framework makes that easier by allowing me to declare my service, the functions that make up that service, and the resources those functions leverage, all in an easy-to-edit YAML file. Once you get that configuration done, you just tell serverless to deploy it, and it takes care of the rest.

Let’s say you want to create a simple service that returns some JSON. Serverless supports multiple languages including JavaScript, Python, and Java, but for now I’ll do a JavaScript example.

First, I’ll bootstrap the project:

serverless create --template aws-nodejs --path echo-service

The serverless framework creates a serverless.yml file and a sample function in handler.js that echoes back a lot of information about the request. It’s ready to deploy as-is. So, to test it out, I’ll deploy it with:

serverless deploy -v

Behind the scenes, the framework creates a CloudFormation template and makes the AWS calls necessary to set everything up on the AWS side. This requires your AWS credentials to be configured, but that’s a one-time thing.

When the serverless framework is done deploying the service and its functions, I can invoke the sample function with:

serverless invoke -f hello -l

Which returns:

{
    "statusCode": 200,
    "body": "{\"message\":\"Go Serverless v1.0! Your function executed successfully!\",\"input\":{}}"
}
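The generated handler.js in this walkthrough is JavaScript, but since the framework also supports Python, here is a rough Python equivalent of a handler that would produce a response of the same shape. This is only a sketch of the idea, not the file the template generates; the function name `hello` simply mirrors the one used above.

```python
import json


def hello(event, context):
    """Lambda-style handler that echoes the incoming event back as JSON."""
    body = {
        "message": "Go Serverless v1.0! Your function executed successfully!",
        "input": event,
    }
    # The body is serialized to a string inside the response, which is why
    # the output above shows escaped quotes in the "body" field.
    return {"statusCode": 200, "body": json.dumps(body)}


# Local smoke test: the handler is a plain function, so it can be called
# directly without deploying anything.
print(hello({}, None))
```

Because the handler is just a function that takes an event and returns a dictionary, it is easy to exercise locally before deploying.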

To invoke that function via a RESTful endpoint, I’ll edit the serverless.yml file and add an HTTP event handler, like this:

functions:
  hello:
    handler: handler.hello
    events:
      - http:
          path: hello
          method: get

And then re-deploy:

serverless deploy -v

Now the function can be hit via curl:

curl https://someid999.execute-api.us-east-1.amazonaws.com/dev/hello

In this case, I showed an HTTP event triggering the function, but you can use other events to trigger functions, like when someone uploads something to S3, posts something to an SNS topic, or on a schedule. See the docs for a complete list.
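To give a feel for a non-HTTP trigger, here is a minimal Python sketch of a handler that reacts to an S3 upload event. The handler name `on_upload`, the bucket name, and the object key are made up for illustration; the nesting of the event (`Records`, then `s3`, then `bucket` and `object`) follows the S3 event notification format.

```python
from urllib.parse import unquote_plus


def on_upload(event, context):
    """Return (bucket, key) pairs for each object named in an S3 event."""
    uploads = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces become "+"), so decode them.
        key = unquote_plus(record["s3"]["object"]["key"])
        uploads.append((bucket, key))
    return uploads


# Hypothetical event, trimmed down to the fields the handler reads.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-bucket"},
                "object": {"key": "reports/q1+summary.pdf"}}}
    ]
}
print(on_upload(sample_event, None))
```

A real handler would go on to do something with each object, but the shape is the same: the trigger decides what the event looks like, and the function just reads it.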

To add additional functions, just edit handler.js and add a new function, then edit serverless.yml to update the list of functions.

Lambda functions cost nothing unless they are executed. AWS offers a generous free tier. Beyond the first million requests in a month it costs $0.20 per million requests (pricing).
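As a quick back-of-the-envelope check, the request pricing quoted above can be worked out in a few lines of Python. This sketch only covers the per-request charge mentioned here and assumes the free tier and rate stay as quoted; any charges based on how long functions run are deliberately left out.

```python
def monthly_request_cost(requests, free_requests=1_000_000, price_per_million=0.20):
    """Estimate the per-request charge for one month, in dollars."""
    # Only requests beyond the free tier are billable.
    billable = max(0, requests - free_requests)
    return billable / 1_000_000 * price_per_million


for count in (500_000, 1_000_000, 3_000_000):
    print(f"{count:>9,} requests: ${monthly_request_cost(count):.2f}")
```

So three million requests in a month would come to about forty cents in request charges under these assumptions.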

I should also mention that if AWS is not your preferred provider, serverless also works with Azure, IBM, and Google.

Regardless of where you want to run it, if you’ve got 15 minutes you should definitely take a look at Serverless.

Tutorials updated for Alfresco SDK 3.0.0

A week or so ago Alfresco released version 3.0.0 of their Maven-based SDK. The release was pretty significant. There are a variety of places you can learn about the new SDK release:

I recently revised all of the Alfresco Developer Series tutorials to be up-to-date with SDK 3.

Upgrading each project was mostly smooth. The biggest change with SDK 3.0.0 is that the folder structure has changed slightly. At a high level, stuff that used to go in src/main/amp now goes in src/main/resources. These changes are good–they make the project structure look like any other Maven-based project which helps other developers (and tools) understand what’s going on.

I did have some problems getting the repo to build with an out-of-the-box project which I fixed by using the SNAPSHOT rather than the released build.

Support for integration tests seems to have changed as well. I actually still need to work that one out.

Other than that, it was mostly re-organizing the existing source and updating the pom.xml.

A small complaint is that the number of sample files provided with the SDK has grown significantly, which means the stuff you have to delete every time you create a new project has increased.

I understand that those are there for people just getting started, but no one, not even beginners, needs those files more than once, so I’d rather not see them in the main archetype. Maybe Alfresco should have a separate archetype called “sample project” that would include those, and configure the normal archetype as a completely empty project. Just a thought.

The new SDK is supposed to work with any release back to 4.2 so you should be able to upgrade to the new version for all of your projects.

This is How I Work

On any given day I might be writing code, designing the architecture for a content-centric solution, meeting with clients, collaborating with teammates, installing software on servers, writing blog posts, answering questions in forums, writing and signing contracts, or doing bookkeeping. Some days I do all of that. My day is probably similar to that of anyone else doing professional services work as a small business. Here are some of the tools I use to keep all of that going smoothly.

Hardware & Operating System

In 2006 I left Windows behind and never looked back. I ran Ubuntu as my primary desktop for three years, then switched to Mac OS X. I still love Linux, and all of my customers run Linux servers, but for my primary machine, I am most productive with my MacBook Pro and OS X.

I’ll admit that the latest MacBooks shook my faith by incorporating a shiny feature I’ll never use and not upgrading the CPU or RAM. I briefly considered moving back to Linux on something like a System76 or a Lenovo, but I am so deep into the Apple ecosystem in both home and office it may not be practical to switch. I’m hopeful the new MBPs will get beefed up in terms of RAM and CPU later this year.

Collaborating with teammates and clients

I’ve been using Trello for project and task tracking for a long time and it really agrees with me. I set up Trello teams with some of my clients and it helps keep us all on track.

For real-time collaboration I tend to use Slack. It doesn’t always make sense to do so, but for selected projects, I invite my clients to my corporate Slack team and we use private channels to work on projects. We use Slack’s integrations to get notified when changes happen in Trello or in our codebase, which resides in either GitHub or Bitbucket.

For real-time collaboration via IRC, Jabber, and GChat I use Adium.

Creating content & code

For plain text editing, I use either Aquamacs or Atom depending on what I’m doing. If it’s just a free-form text file, like maybe a blog post or just some rough meeting notes or something like that I’ll use Aquamacs, which is a Mac-specific distribution of Emacs. If I am doing non-compiled coding in something like JSON, XML, Python, Groovy, or JavaScript I’ll typically use Atom.

For more intense JavaScript projects I will often switch to WebStorm and sometimes for Python I’ll use PyCharm instead of Atom.

For Java projects I’ve recently moved from Eclipse to IntelliJ IDEA. I’ve used Eclipse for many, many years, but IntelliJ feels more reliable and polished. I’ve been pretty happy with it so far.

IntelliJ, WebStorm, and PyCharm are all available from JetBrains and the company offers an all-in-one subscription that is worth considering.

I do a lot with markdown. For example, if I’m taking notes on a customer’s installation and those notes will be shared with the customer, rather than just doing those notes in plain text with no structure, I’ll use markdown. Then, to preview the document and render it in PDF I use Marked. This is also handy for previewing GitHub readme files, but Atom can also be used for that.

Another time saver is using markdown to produce presentations. I wouldn’t necessarily use it for marketing-ready pitches, but for pulling together a quick deck to review thoughts with a customer or presenting a topic at a meetup, markdown is incredibly fast. To actually render and display the presentation from markdown I use Deckset. It produces beautiful presentations with very little effort while maintaining the editing speed that plain text markdown provides.

Sometimes I’ll create a video to illustrate a concept, either to help the community understand a feature or extension technique or to demo some new functionality to a customer when schedules won’t align. Telestream is a wonderful tool for creating such screencasts.

My open source projects live at GitHub while my closed source projects live at Bitbucket. The primary draw to Bitbucket is the free private repositories, but I really like what Atlassian offers. When git on the command-line just won’t do, I switch to Sourcetree, Atlassian’s visual git client.

Automation, Social & News Feeds

I have a client where I have to manage over 60 servers across several clusters. To automate the provisioning of new nodes, upgrades, and configuration management, I use Ansible. It’s much easier to learn than Chef and it requires no agents to be installed on the servers being managed. Plus, it’s Python-based.

Lots of servers, lots of customers, and lots of business and personal accounts means password management can become an issue. I use KeePassX on my desktop and iKeePass on my iOS devices to keep all of my credentials organized.

I have a ton of different news sources I try to keep up with. Feedly helps a lot. I use it on my mobile devices and on the web.

I have a personal Twitter account and a corporate Twitter account, plus I help out with other accounts from time to time. Luckily, Hootsuite makes that easy, and it handles more than just Twitter.

Back-office

When you run your own business there are two things that can be a time suck without the right tools: contracts/forms and bookkeeping. PDF Expert lets me fill in PDF forms and sign contracts and other documents right on my mobile device. It has direct integrations with Google Drive, Dropbox, and other file share services so it is easy to store signed documents wherever I need to.

I used QuickBooks Desktop Professional Services Edition to handle my bookkeeping for several years. But that edition was only for Windows, so I ran it in a Windows VM on my Mac with VMware, which was kind of a drag. I finally got tired of that and migrated to QuickBooks Online. The migration was completely painless. I just called them up, and within about an hour they had taken my money and moved all of my data without anything getting screwed up. It was a pretty awesome experience.

Physical Office

I’ve been working from home for over ten years, so I appreciate the importance physical space plays in a productive working environment. My desk is a custom-built UPLIFT standing desk with a solid cherry top. I love it and haven’t had a problem with it. I do need to make myself stand up more often, though.

I found that even with the desk raised completely, my cinema display wasn’t quite high enough. So I snagged a Humanscale M8 articulating mount with an Apple VESA adapter. Now I just grab my display and put it exactly where it needs to be.

My Mac hooks up to my display via Thunderbolt, which makes connecting and disconnecting a breeze. But I didn’t like how much real estate the laptop took up on my desk. There are a variety of solutions for this. I went with a Twelve South BookArc vertical desktop stand and that’s worked really well. I have a minor concern about whether or not using a MacBook in a vertical position is bad with regard to heat dissipation but I’ve decided to roll the dice on that.

I love my home office setup, but I do get tired of the same four walls sometimes. To combat that and to just change things up a bit, every week I try to spend some time in a co-working space. Here in my hometown there’s one right on the square in the historic part of downtown that has got a good vibe and is close to good food. You might check Sharedesk to see if there’s something similar near you.

So there you have it. Those are some of the tools I use every day. Got any favorites you’d like to share?

The plain truth about Alfresco’s open source ethos

There was a small flare-up on the Order of the Bee list this week. It started when someone suggested that the Community Edition (CE) versus Enterprise Edition comparison page on alfresco.com put CE in a negative light. In full disclosure, I collaborated with Marketing on that page when I worked for Alfresco. My goal at the time was to make sure that the comparison was fair and that it didn’t disparage Community Edition. I think it still passes that test and is similar to the comparison pages of other commercial open source companies.

My response to the original post to the list was that people shouldn’t bother trying to get the page changed. Why? Because how Alfresco Software, Inc. chooses to market their software is out-of-scope for the community. As long as the commercial company behind Alfresco doesn’t say anything untrue about Community Edition, the community shouldn’t care.

The fact that there is a commercial company behind Alfresco, that they are in the business of selling Enterprise support subscriptions, and at the same time have a vested interest in promoting the use of Community Edition to certain market segments is something you have to get your head around.

Actually there are a handful of things that you really need to understand and accept so you can be a happy member of the community. Here they are:

1. CE is distributed under LGPLv3 so it is open source.

If you need to put a label on it and you are a binary type of person, this is at the top of the list. Alfresco is “open source” because it is distributed under an OSI-approved license. A more fine-grained description is that it is “open core” because the same software is distributed under two different licenses, with the enterprise version being based on the free version and including features not available in the free version.

2. Committers will only ever be employees.

There have been various efforts over the years to get the community more involved in making direct code contributions. The most recent is that Aikau is on GitHub and accepting pull requests. Maybe some day the core repository will be donated to Apache or some other foundation. Until then, if you want to commit directly to core, send a resume to Alfresco Software, Inc. I know they are hiring talented engineers.

3. Alfresco Software, Inc. is a commercial, for-profit business.

Already mentioned, but worth repeating: The company behind the software earns revenue from support subscriptions, and, increasingly, value-added features not available in the open source distribution. The company is going to do everything it can to maximize revenue. The community needs this to be the case because a portion of those resources support the community product. The company needs the community, so it won’t do anything to aggressively undermine adoption of the free product. You have to believe this to be true. A certain amount of trust is required for a symbiotic relationship to work.

4. “Open source” is not a guiding principle for the company.

Individuals within the company are ardent open source advocates and passionate and valued community members, but the organization as a whole does not use “open source” as a fundamental guiding principle. This should not be surprising when you consider that:

  1. “Drive Open Innovation” not “Open Source” is a core value to the company as publicly expressed on the Our Values page.
  2. The leadership team has no open source experience (except John Newton and PHH whose open source experience is Alfresco and Activiti).
  3. The community team doesn’t exist any more–the company has shifted to a “developer engagement” strategy rather than having a dedicated community leadership or advocacy team.

Accept the fact that this is a software company like any other, that distributes some of its software under an open source license and employs many talented people who spend a lot of their time (on- and off-hours) to further the efforts of the community. It is not an “everything-we-do-we-do-because-open-source” kind of company. It just isn’t.

5. Alfresco originally released under an open source license primarily as a go-to-market strategy

In the early days, open source was attractive to the company not because it wanted help building the software, but because the license undermined the position of proprietary vendors and because they hoped to gain market share quickly by leveraging the viral nature of freely-distributable software. Being open was an attractive (and highly marketable) contrast to the extremely closed and proprietary nature of legacy ECM vendors such as EMC and Microsoft.

I think John and Paul also hoped that the open and transparent nature of open source would lend itself to developer adoption, third-party integrations and add-ons, and a partner ecosystem, which it did.

I think it is this last one–the mismatch between the original motivations to release as open source and what we as a community expect from an open source project–that causes angst. The “open source” moniker attracts people who wish the project was more like an organic open source project than it can or ever will be.

For me, personally, I accepted these as givens a long time ago–none of them bother me any more. I am taking this gift that we’ve been given–a highly-functional, freely-distributable ECM platform–and I’m using it to help people. I’m no longer interested in holding the company to a dogmatic standard they never intended to be held to.

So be cool and do your thing

The “commercial” part of “commercial open source” creates a tension that is felt both internally and externally. Internal tension happens when decisions have to be made for the benefit of one side at the expense of the other. External tension happens when the community feels like the company isn’t always acting in their best interest and lacks the context or visibility needed to believe otherwise.

This tension is a natural by-product of the commercial open source model. It will always be there. Let’s acknowledge it, but I see no reason to antagonize it.

If you want to help the community around Alfresco, participate. Build something. Install the software and help others get it up and running. Join the Order of the Bee. If you want to help Alfresco with its marketing, send them your resume.

Quick Hack: Creating default folder structures in Alfresco Share sites

Someone recently asked how to create Alfresco Share sites with a default folder structure. Currently, out-of-the-box, when you create an Alfresco Share site, the document library is empty. This person instead wanted to define a set of folders that would be created in the document library when a new Alfresco Share site is created.

Although this exact functionality is not available out-of-the-box, there is something similar. It’s called a “space template”. Space templates have been in the product since the early days but haven’t yet been exposed to Alfresco Share. In the old Alfresco Explorer client you could specify a space template when you created a new folder, and the resulting folder would have the same set of folders and documents that were present in the template.

With a little bit of work using the out-of-the-box extension points we can get Alfresco to use space templates when creating new sites in Share. I’ve created this as an add-on and the code lives on GitHub. I thought it might be instructive to review how it works here.

Approach

The first thing to realize is that a space template isn’t special. It’s just a folder that happens to live in Data Dictionary/Space Templates. In fact, there aren’t any API calls specific to space templates. If you go looking for a createFolderFromTemplate() method on a Folder object you’ll be disappointed. When the Alfresco Explorer client creates a folder from a template, it simply finds the template folder and copies its contents into the newly-created folder.

On the Alfresco Share side, a site isn’t that special either. It’s just a special type of folder. The document library that sits within a Share site is also just a folder, albeit a specially-named folder with an aspect. Normally, the document library folder does not get created until the first user actually opens the site and navigates to the document library.

So all we really need to do is write some code that gets called when a site is created, looks up the space template, and then creates the document library folder with the contents of the space template.

What’s the best way for the code to know which space template to use? One way would be to use a specially-named space template for all Share sites. But using a single space template for all Share sites seems limiting. Alfresco Share already has a mechanism for selecting the “type” of site to create–it’s called a preset. So a better approach is to use the preset’s ID to determine which space template to use.
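Before getting into the real Java code, the approach can be sketched as a toy in-memory model in Python. Nothing here touches Alfresco: the dictionaries stand in for folders, `SPACE_TEMPLATES` stands in for Data Dictionary/Space Templates, and `on_site_created` is a hypothetical name for the hook that fires when a site is created. The folder and file names are made up; “site-dashboard” is the preset ID of the default collaboration site.

```python
import copy

# Hypothetical stand-in for Data Dictionary/Space Templates:
# template name -> nested folders (dicts) and documents (strings).
SPACE_TEMPLATES = {
    "site-dashboard": {"Contracts": {}, "Meeting Notes": {"agenda.txt": "TBD"}},
}


def on_site_created(site, templates=SPACE_TEMPLATES):
    """Seed a new site's document library from the template named after its preset."""
    template = templates.get(site["preset"])
    if template is None:
        # No matching template: leave the site alone so the document
        # library is created empty later, as it normally would be.
        return site
    # Deep copy so later edits in the site never change the template itself.
    site["documentLibrary"] = copy.deepcopy(template)
    return site


print(on_site_created({"name": "project-x", "preset": "site-dashboard"}))
print(on_site_created({"name": "project-y", "preset": "some-other-preset"}))
```

The point of keying on the preset ID is visible in the two calls: one site gets a pre-populated document library, and the other is left untouched because no template shares its preset’s name.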

Code

We need to run some code when a site is created. One way to do this is with a behavior. The behavior is Java code that will be bound to the onCreateNode policy for nodes that are instances of st:site. The init() method does that:

// Create behaviors
this.onCreateNode = new JavaBehaviour(this, "onCreateNode", NotificationFrequency.TRANSACTION_COMMIT);

// Bind behaviors to node policies
this.policyComponent.bindClassBehaviour(QName.createQName(NamespaceService.ALFRESCO_URI, "onCreateNode"), TYPE_SITE, this.onCreateNode);

So any time a node is created that is an instance of st:site, the onCreateNode() method in this class will get called.

The first thing the onCreateNode() method needs to do is find the space template. To keep things simple, I’m going to assume that the space template is named the same thing as the site preset ID, so all I need to do is grab that preset ID and do a Lucene search to find the template:

NodeRef siteFolder = childAssocRef.getChildRef();

if (!nodeService.exists(siteFolder)) {
    logger.debug("Site folder doesn't exist yet");
    return;
}

//grab the site preset value
String sitePreset = (String) nodeService.getProperty(siteFolder, PROP_SITE_PRESET);

//see if there is a folder in the Space Templates folder of the same name
String query = "+PATH:\"/app:company_home/app:dictionary/app:space_templates/*\" +@cm\\:name:\"" + sitePreset + "\"";
ResultSet rs = searchService.query(StoreRef.STORE_REF_WORKSPACE_SPACESSTORE, SearchService.LANGUAGE_LUCENE, query);

If the search doesn’t find a matching space template, the method can return. Otherwise, the space template should be copied into the site folder as the new document library folder:

documentLibrary = fileFolderService.copy(spaceTemplate, siteFolder, "documentLibrary").getNodeRef();

//add the site container aspect, set the descriptions, set the component ID
Map<QName, Serializable> props = new HashMap<QName, Serializable>();
props.put(ContentModel.PROP_DESCRIPTION, "Document Library");
props.put(PROP_SITE_COMPONENT_ID, "documentLibrary");
nodeService.addAspect(documentLibrary, ASPECT_SITE_CONTAINER, props);

I used the fileFolderService to perform the copy, then I set the properties and aspect on the new document library folder with the values that Alfresco Share expects a document library to have.

I’ve left out some insignificant code bits here and there–you can look at the source if you need to. The project was bootstrapped using the Alfresco Maven SDK and includes a unit test that makes sure the behavior works as expected.

Result

The project creates an AMP file. If you’ve checked out the source you can build the AMP using mvn install. Then you can install the AMP into the Alfresco WAR by placing the AMP in the amps directory and running apply_amps.sh. Alternatively, you can use mvn alfresco:install. After installing the AMP into the Alfresco WAR you can test it out.

Out-of-the-box Alfresco Share has a single site type called “Collaboration Site”. The preset ID for that type of site is “site-dashboard”. You might have additional site types configured in your installation. To set up the template for the default collaboration site, navigate to Data Dictionary/Space Templates and create a folder called “site-dashboard”. Then add whatever folders and documents you want to that folder. Now, anytime a “Collaboration Site” gets created in Alfresco Share, the document library will automatically be created with the same folders and documents you’ve set up in your space template.

This was not a mind-blowing, deeply-technical extension to Alfresco. Hopefully, it shows you that, with even a few lines of code, you can easily add useful functionality to Alfresco.

Alfresco 4.2.d: The Public API & some CMIS changes you should be aware of

Well, Alfresco Community Edition 4.2.d has been out for a couple of weeks now and everyone I’ve talked to seems pretty excited about the new release. The new ribbon header in the Share user interface, the filmstrip view, and the table view for the document library seem to have gotten all of the attention (and they do look sharp) but my favorite new feature is actually behind-the-scenes: The Alfresco Public API has finally made its way from Alfresco in the cloud to on-premise.

What is the Alfresco Public API?

We’ve talked about it a lot over the last year, but it’s always been cloud-only, so you’re forgiven if you need a refresher. Basically, it means that we now have a documented, versioned, and backward compatible API that you can use to do things like:

  • Get a list of sites a user can see
  • Get some information about a site, including its members
  • Add, update, or remove someone from a site
  • Get information about a person, including their favorite sites and preferences
  • Get the tags for a node, including updating a tag on a node
  • Create, update, and delete comments on a node
  • Like/unlike a node
  • Do everything that CMIS can do

The Public API is comprised of CMIS for doing things like performing CRUD functions on documents and folders, querying for content, modifying ACLs, and creating relationships plus non-CMIS calls for things that CMIS doesn’t cover. When you run against Alfresco in the cloud you use OAuth2 for authentication. When you run against Alfresco on-premise, you use basic authentication.
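To make the on-premise side a little more concrete, here is a small Python sketch that builds (but does not send) a basic-auth request for the list of sites. Treat the details as assumptions: the `/public/alfresco/versions/1/sites` path and the `maxItems` parameter are what I would expect for the non-CMIS part of the API, `build_sites_request` is a name I made up, and the host and admin/admin credentials are placeholders for a local test install.

```python
import base64
import urllib.request


def build_sites_request(base_url, username, password, max_items=10):
    """Build a basic-auth GET request for the sites visible to a user."""
    # Assumed endpoint layout; check it against your own installation.
    url = f"{base_url}/api/-default-/public/alfresco/versions/1/sites?maxItems={max_items}"
    # Basic authentication is just "user:password", base64-encoded.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


req = build_sites_request("http://localhost:8080/alfresco", "admin", "admin")
print(req.full_url)
# To send it against a running server:
#     with urllib.request.urlopen(req) as resp:
#         print(resp.read().decode("utf-8"))
```

Against the cloud you would swap the basic-auth header for an OAuth2 bearer token, which is the main difference between the two environments from the client’s point of view.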

Some Examples

I created some Java API examples for cloud a while ago. This week I spent some time on those examples to get them cleaned up a bit here and there and to have a really clear separation between what’s specific to running against Alfresco on-premise versus Alfresco in the cloud and what’s common to both. The result is a single set of code that runs against either one.

Here’s how the project is set up:

The remaining classes in com.alfresco.api.example are the actual examples. These would probably be better structured as unit tests but for now, they are just runnable Java classes. Currently there are only four:

  • GetSitesExample.java simply fetches the short names of up to 10 sites the current user has access to.
  • CmisRepositoryInfoExample.java is the simplest CMIS example you can write. It retrieves some basic information about the repository.
  • CmisCreateDocumentExample.java creates a folder, creates a document, likes the folder, and comments on the document. This is a mix of CMIS and non-CMIS calls.
  • CmisAspectExample.java imports some images, uses Apache Tika to extract some metadata from the binary image file, then uses the OpenCMIS Extension to store those metadata values in properties defined in aspects. The extension has to be used because CMIS 1.0 doesn’t support aspects.

Take a look at the readme.md file to see what changes you need to make to run these examples in your own environment.

New CMIS URLs in Alfresco Community Edition 4.2.d

One thing to notice is that starting with this release there are two new CMIS URLs. Which one you use depends on which version of CMIS you would like to leverage. If you want to use CMIS 1.0, the URL is:
http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.0/atom

If you would rather use CMIS 1.1, the URL is:
http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/atom

New with 4.2.d is the CMIS 1.1 browser binding. Currently Alfresco in the cloud only supports Atom Pub, but if you are only running against on-premise, you might decide that the browser binding, which uses JSON instead of XML, is more performant. If you want to use the browser binding, the service URL is:

http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/browser
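The three service URLs above follow one pattern, so if you switch between them in client code, a tiny helper can keep that in one place. This is just a sketch; `cmis_service_url` is a hypothetical name, and the only combinations it allows are the ones listed in this post.

```python
def cmis_service_url(base_url, version="1.1", binding="atom"):
    """Build a CMIS service URL for the given version and binding."""
    if version not in ("1.0", "1.1"):
        raise ValueError(f"unsupported CMIS version: {version}")
    if binding not in ("atom", "browser"):
        raise ValueError(f"unsupported binding: {binding}")
    if binding == "browser" and version != "1.1":
        # The browser binding was introduced with CMIS 1.1.
        raise ValueError("the browser binding requires CMIS 1.1")
    return f"{base_url}/api/-default-/public/cmis/versions/{version}/{binding}"


base = "http://localhost:8080/alfresco"
print(cmis_service_url(base, "1.0", "atom"))
print(cmis_service_url(base, "1.1", "atom"))
print(cmis_service_url(base, "1.1", "browser"))
```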

CMIS Repository ID and Object ID Changes

When using the new URLs, you’ll notice a couple of things right away. First, the repositoryId is coming back as “-default-” instead of the 32-byte string it used to return.

Second, the format of the object IDs has changed. The CMIS specification dictates that object IDs are supposed to be opaque. You shouldn’t build any logic on the content of a CMIS object ID. However, if you have built a client application that stores object IDs, you may want to pay particular attention to this when you test against the new version. The object IDs for the same object will be different than they were when using the old CMIS URLs.

You can see this in 4.2.d. If I use the old “/alfresco/cmisatom” URL, and I ask for the ID of the root folder, I’ll get:
workspace://SpacesStore/ae045ec3-5578-4c7b-b5a6-f893810e63dc

But if I use either the new CMIS 1.0 URL or the new CMIS 1.1 URL and ask for the ID of the same root folder, I’ll get:
ae045ec3-5578-4c7b-b5a6-f893810e63dc

In order to minimize the impact of this change, you can issue a getObject call using either the old ID or the new ID and you’ll get the same object back. And if you ask for the object’s ID you’ll get it back in the same format that was used to retrieve it, as shown in this Python example:

>>> repo.getObject('ae045ec3-5578-4c7b-b5a6-f893810e63dc').id
'ae045ec3-5578-4c7b-b5a6-f893810e63dc'
>>> repo.getObject('ae045ec3-5578-4c7b-b5a6-f893810e63dc').name
u'Company Home'
>>> repo.getObject('workspace://SpacesStore/ae045ec3-5578-4c7b-b5a6-f893810e63dc').id
'workspace://SpacesStore/ae045ec3-5578-4c7b-b5a6-f893810e63dc'
>>> repo.getObject('workspace://SpacesStore/ae045ec3-5578-4c7b-b5a6-f893810e63dc').name
u'Company Home'
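If your client application has stored object IDs in the old format, you may want a way to tell whether an old ID and a new ID point at the same object. Here is a minimal Python sketch for that. It deliberately peeks inside IDs that the specification says are opaque, so I would only use something like it for a one-off migration or comparison, and it only handles the two formats shown in this post. The names `bare_id` and `same_object` are made up.

```python
STORE_PREFIX = "workspace://SpacesStore/"


def bare_id(object_id):
    """Strip the store prefix from an old-style ID; leave new-style IDs alone."""
    if object_id.startswith(STORE_PREFIX):
        return object_id[len(STORE_PREFIX):]
    return object_id


def same_object(id_a, id_b):
    """True if two IDs match once the old-style store prefix is ignored."""
    return bare_id(id_a) == bare_id(id_b)


old_id = "workspace://SpacesStore/ae045ec3-5578-4c7b-b5a6-f893810e63dc"
new_id = "ae045ec3-5578-4c7b-b5a6-f893810e63dc"
print(same_object(old_id, new_id))
```

Since getObject accepts either format, the safer long-term option is to keep whatever ID the server hands back and not transform it at all.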

Still Missing Some CMIS 1.1 Goodness

The browser binding is great but 4.2.d is still missing some of the CMIS 1.1 goodness. For example, CMIS 1.1 defines a new concept called “secondary types”. In Alfresco land we call those aspects. Unfortunately, 4.2.d does not yet appear to have full support for aspects. You can see what aspects a node has applied by inspecting the cmis:secondaryObjectTypeIds property, which is useful. But in my test, changing the value of that property did not affect the aspects on the object as I would expect if secondary type support was fully functional.

Summary

The Public API is new. There are still a lot of areas not covered by either CMIS or the rest of the Alfresco API. Hopefully those gaps will close over time. If you have an opinion on what should be the next area to be added to the Alfresco API, please comment here or mention it to your account rep.

Don’t get tripped up by MySQL/MariaDB skip-networking default in MacPorts

I recently decided to switch one of my MacBooks from MySQL installed with the binary package to MariaDB installed through MacPorts. I’ve used MacPorts for years and it has worked great for me, although I realize that isn’t the case for everyone.

After installing MariaDB, when I started Alfresco, it couldn’t talk to the database. I thought it might be a MariaDB problem, so I poked around a bit and then installed MySQL from MacPorts. Same problem.

I noticed that the problem was only with JDBC connections. Python didn’t have any trouble connecting nor did any of my non-JDBC database tools. A clue.

Then I noticed that “netstat -an|grep 3306” came back with nothing, indicating that the database wasn’t listening on the port at all. I thought maybe it was a networking problem. Maybe permissions. Maybe a my.cnf issue. I tried tweaking all of that but wasn’t making any progress.
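The same check can be done from Python if netstat isn’t handy. This is only a minimal sketch: the function name is_listening is made up for illustration, and it assumes the database would be on localhost at the default port 3306.

```python
import socket


def is_listening(host="127.0.0.1", port=3306, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError if the connection is refused,
        # times out, or the host is unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling is_listening() with no arguments tests 127.0.0.1:3306. A False result matches the empty netstat output described above.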

I decided there must be a .cnf file somewhere that was hosing me. It turns out that MacPorts installs a default .cnf file for both MySQL and MariaDB that has “skip-networking” turned on. That disables all TCP/IP connections to the server, and that includes JDBC. Local socket connections still work, which is presumably why Python and my other non-JDBC tools could connect.

I have no idea why that is turned on by default but if you don’t know to look for it, you may chase your tail for a while. The fix is simply to edit the default .cnf file and comment out skip-networking. For MySQL, the file lives here:

/opt/local/etc/mysql55/macports-default.cnf

And for MariaDB the file lives here:

/opt/local/etc/mariadb/macports-default.cnf
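If you want to confirm that the option is active before editing anything, a few lines of Python can scan the file for an uncommented skip-networking line. This is a rough sketch, not a full my.cnf parser. The function name is my own, and it assumes the usual option-file conventions: “#” or “;” starts a comment, and dashes and underscores are interchangeable in option names.

```python
def skip_networking_enabled(cnf_text):
    """Return True if cnf_text turns on skip-networking (last setting wins)."""
    enabled = False
    for raw in cnf_text.splitlines():
        # Drop trailing "#" comments, then skip blank and ";" comment lines.
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith(";"):
            continue
        name, _, value = line.partition("=")
        if name.strip().replace("_", "-") == "skip-networking":
            # A bare option means "on"; an explicit 0/off/false means "off".
            enabled = value.strip().lower() not in ("0", "off", "false")
    return enabled


# Example, using the MariaDB path from this post:
# with open("/opt/local/etc/mariadb/macports-default.cnf") as f:
#     print(skip_networking_enabled(f.read()))
```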

Alfresco Roma Meetup Agenda

UPDATE: I’ve updated this post with links to the presentations from the meetup. We had a great turnout. Thanks for coming, everyone!

We did a half-day meetup in Rome on Friday, March 22. The agenda with links to what was presented is as follows:

15:00 to 15:15 Welcome (Jeff)

15:15 to 15:45 Introducing the Alfresco API (Jeff)

15:45 to 16:15 Customer Case Study (Fastweb)

16:15 to 16:45 Alfresco WebScript Connector for Apache ManifoldCF (Piergiorgio Lucidi, Sourcesense)

16:45 to 17:00 BREAK

17:00 to 17:30 Part 1: Alfresco & Semantics, Part 2: RedLink (Ainga Pillai, Zaizi)

17:30 to 18:00 The New Maven SDK (Gab Columbro, Alfresco)

18:00 to 18:15 Invitation to Join the Community (Jeff)

18:15 to 19:00 Pizza, Beer, & Networking

Parenting Hack: Even and Odd Days

I have a sister three years younger than I am. When we were kids we fought over everything. Who gets the front seat. Whose turn is it to take out the garbage. Or to feed the dog. Or to do the dishes. And on and on. As a parent of two kids I now know how crazy that must have made my parents.

Their solution to this was usually along the lines of “If you can’t work it out I’ll work it out for you,” followed by a forced compromise neither of us liked, delivered in exasperation, leaving each of us seething.

Luckily, with my two kids we’ve been using a solution that has worked great for many years, and one that came to us from my sister, ironically enough. We call it “even and odd days”. The way it works is that each child gets a day. When it is that child’s day, they have to do all of the chores, but they also get to reap whatever perks might occur that day. To keep track of whose day it is, one child takes even days and the other takes odd days. In our case, my son’s birthday falls on an odd day and my daughter’s on an even, so making the assignment (and remembering who has even and who has odd) was easy.

For example, the 24th is an even day so it is my daughter’s day. She’s got to feed the dog (both times), empty the dishwasher if it needs it, set the table, and rinse and load the dinner dishes into the dishwasher. But, if there’s a choice about what to have for dinner, she gets to pick. If we’re playing a game, she decides who goes first. So when it is your day you work hard, but you also reap the benefits.

The even and odd days approach takes the emotion and guesswork out of it. Who was the last person to clean the dog poop out of the back yard? Doesn’t matter–today’s an odd day, which means it is your turn, pal!

We started doing this when the kids were probably 5 and 8, and after sticking to it for six years I can say we’ve never had the kind of sibling strife over chores and perks that my sister and I experienced. Every now and then, when one kid’s schedule prevents them from doing a chore on their day and we ask the other one to do it, we’ll get a “Sorry, not my day” response, particularly if that child feels the system hasn’t treated them fairly of late, but that can be dealt with. And when we first started out my son felt a little aggrieved because there are times when odd days occur twice in a row (the 31st followed by the 1st), but other than those blips, we’re happy with it.

The system even works for chores that are not daily tasks. For example, in our neighborhood, the trash and recycle bins have to be put out on the curb Sunday night. If a Sunday night falls on an even day, it’s my daughter’s task and when it falls on an odd it’s my son’s. Works like a charm.

Tips on Working with Google Fusion Tables

We needed to see Alfresco forum users by geography. Google Fusion Tables can plot on a map any geographic location stored in one or more columns. We had successfully used this before for smaller batches of mostly static data, so I decided to see if it would work well for our forum data. This blog post is about what I did, including some useful tips for working with the Google Fusion Tables API.

Determining the Location

First, I needed a city and country for each forum user. In our forums, users can declare their location, but not everyone does. So I wrote a little Python script that uses the MaxMind GeoLite database to determine a location for each user based on IP address. The script then compares the IP-determined location with the user’s declared location, and if they are different, it asks the person running the script to choose which one is likely to be more accurate. For example, the IP address based lookup might come back with “Suriname” but the user’s declared location is “Paramaribo, Suriname”, so you’d choose the latter. The script saves each decision so that it doesn’t have to ask again for the same comparison on this run or subsequent runs.
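The “remember each decision” part can be sketched roughly as follows. This is not the actual script. The function names, the JSON file format, and the prompt wording are all assumptions made for illustration, and the MaxMind lookup itself is left out.

```python
import json
import os


def load_decisions(path):
    """Load earlier choices from a JSON file, or start with none."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}


def save_decisions(path, decisions):
    """Persist choices so later runs don't have to ask again."""
    with open(path, "w") as f:
        json.dump(decisions, f, indent=2)


def choose_location(ip_loc, declared_loc, decisions, ask=input):
    """Pick a location, asking the operator only for unseen conflicts."""
    if not declared_loc or ip_loc == declared_loc:
        return ip_loc
    if not ip_loc:
        return declared_loc
    key = ip_loc + "|" + declared_loc
    if key not in decisions:
        answer = ask("1) %s  2) %s  Which is more accurate? [1/2] "
                     % (ip_loc, declared_loc))
        decisions[key] = declared_loc if answer.strip() == "2" else ip_loc
    return decisions[key]
```

Keying the saved choice on the pair of strings means the same comparison is answered once, whether it comes up again in this run or in a later one.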

Loading the Data into Google Fusion Tables with Python

Once I had a city and country for each forum user I had to get those loaded into a Google Fusion Table. I found this Python-based Fusion Tables client and it worked quite nicely.

Here are a few tips that might save you some time when you are working with Google Fusion Tables, regardless of the client-side language…

Don’t Update–Drop, then Add

I started by trying to be smart about updating existing records rather than inserting new ones. But this meant that for each row, I had to do a query to test for the existence of a match and then do an update. This was incredibly slow, especially because you can’t do bulk updates (see next point).

So every time I run an update, the script first clears out the table. That means I load the entire dataset every time there is an update, but that is much faster than the update-if-present-otherwise-insert approach.

Batch Your Queries

The Google Fusion Tables API supports bulk operations. You can execute up to 500 at a time, if I recall correctly. This is a huge time-saver. My script just adds the insert statements to a list, and when it gets 500 (or runs out of inserts) it joins the list on “;” and then executes the batch with a single call to the Fusion Tables API.

The one drawback, as mentioned in the previous point, is that it does not support bulk updates, only inserts. But with the performance gain of bulk operations, I don’t mind clearing out the table and re-inserting.
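The batching logic described above can be sketched like this. The function name is made up, the 500 limit is the figure recalled above rather than a verified one, and the actual call to the Fusion Tables client is deliberately left out.

```python
def batch_statements(statements, batch_size=500):
    """Yield ';'-joined strings holding at most batch_size statements each."""
    batch = []
    for stmt in statements:
        batch.append(stmt)
        if len(batch) == batch_size:
            yield ";".join(batch)
            batch = []
    # Flush whatever is left once the inserts run out.
    if batch:
        yield ";".join(batch)
```

Each yielded string is what would be sent in a single API call, so 1,200 inserts become three calls instead of 1,200.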

Throttle Your Requests

If the script exceeds 30 requests per minute, it is highly likely to get rate-limited, so it is important to throttle your requests. I found that a 2.5-second wait between queries (about 24 requests per minute) was fine, and because the queries are batched 500 at a time, it really isn’t a big deal to wait.
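Here is one way the throttling could look. Again, this is a sketch under assumptions: run_throttled is an invented name, and execute is a stand-in for whatever function your client library uses to send a query. The sleep and clock parameters exist only so the timing can be faked in a test.

```python
import time


def run_throttled(batches, execute, min_interval=2.5,
                  sleep=time.sleep, clock=time.monotonic):
    """Call execute(batch) for each batch, at least min_interval seconds apart."""
    results = []
    last_start = None
    for batch in batches:
        if last_start is not None:
            # Wait out whatever remains of the interval since the last call began.
            remaining = min_interval - (clock() - last_start)
            if remaining > 0:
                sleep(remaining)
        last_start = clock()
        results.append(execute(batch))
    return results
```

Measuring from the start of the previous call, rather than always sleeping the full 2.5 seconds, avoids waiting longer than necessary when a request is itself slow.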

Geocoding Takes Time

So the whole thing is pretty slick but there is a small pain. Because all rows get dropped every time I load the table, every row has to be geocoded and that takes time. I believe there is an API call to ask the table to be geocoded but I haven’t found that to work reliably. Instead, I have to go to the table in my browser and tell Fusion Tables to geocode the table. This takes a LONG time. For a table of about 10,000 rows it could easily take 45 minutes or more. At least it is something I can kick off and let run. I only update the table once a month. If it were more often, it would be an issue.

Voila!

That’s it! Thanks to Python and Google Fusion Tables, I now have an interactive map of forum users. Not only is the map useful to browse, it also lets me run geographic queries against it from Python, such as, “find me the 20 forum users with more than X posts who work within a 20-mile radius of this spot”, which can be handy for doing local community outreach.