Tag: Forums

Tips on Working with Google Fusion Tables

We had a need to see Alfresco forum users by geography. Google Fusion Tables provides the capability to see any geographic location stored in one or more columns on a map. We had successfully used this before for smaller batches of mostly static data, so I decided to see if it would work well for our forum data. This blog post is about what I did, including some useful tips for working with the Google Fusion Table API.

Determining the Location

First, I needed a city and country for each forum user. In our forums, users can declare their location, but not everyone does. So I wrote a little Python script that uses the MaxMind GeoLite database to determine a location for each user based on IP address. The script then compares the IP-determined location with the user’s declared location, and if they are different, it asks the person running the script to choose which one is likely to be more accurate. For example, the IP address based lookup might come back with “Suriname” but the user’s declared location is “Paramaribo, Suriname”, so you’d choose the latter. The script saves each decision so that it doesn’t have to ask again for the same comparison on this run or subsequent runs.

Loading the Data into Google Fusion Tables with Python

Once I had a city and country for each forum user I had to get those loaded into a Google Fusion Table. I found this Python-based Fusion Tables client and it worked quite nicely.

Here are a few tips that might save you some time when you are working with Google Fusion Tables, regardless of the client-side language…

Don’t Update–Drop, then Add

I started by trying to be smart about updating existing records rather than inserting new ones. But this meant that for each row, I had to do a query to test for the existence of a match and then do an update. This was incredibly slow, especially because you can’t do bulk updates (see next point).

So every time I run an update, the script first clears out the table. That means I load the entire dataset every time there is an update, but that is much faster than the update-if-present-otherwise-insert approach.

Batch Your Queries

The Google Fusion Tables API supports bulk operations. You can execute up to 500 at-a-time, if I recall correctly. This is a huge time-saver. My script just adds the insert statements to a list, and when it gets 500 (or runs out of inserts) it joins the list on “;” and then executes the batch with a single call to the Fusion Tables API.

The one drawback, as mentioned in the previous point is that it does not support bulk updates–only inserts are supported. But with the performance gain of bulk operations, I don’t mind clearing out the table and re-inserting.

Throttle Your Requests

If the script exceeds 30 requests per minute it is highly likely you will get rate-limited. So it is important to throttle your requests. I found that a 2.5 second wait between queries was fine and because the queries are batched 500 at-a-time, it really isn’t a big deal to wait.

Geocoding Takes Time

So the whole thing is pretty slick but there is a small pain. Because all rows get dropped every time I load the table, every row has to be geocoded and that takes time. I believe there is an API call to ask the table to be geocoded but I haven’t found that to work reliably. Instead, I have to go to the table in my browser and tell Fusion Tables to geocode the table. This takes a LONG time. For a table of about 10,000 rows it could easily take 45 minutes or more. At least it is something I can kick off and let run. I only update the table once a month. If it were more often, it would be an issue.

Voila!

That’s it! Thanks to Python and Google Fusion Tables, I now have an interactive map of forum users. Not only is it useful to use interactively, it also lets me run geographic queries against it from Python, such as, “find me the 20 forum users with more than X posts who work within a 20 mile radius of this spot” which can be handy for doing local community outreach.

Thoughts on the Alfresco forums

Back in 2009 I wrote a post called, “The Alfresco forums need your help.” It was about how I happened to come across the “unanswered posts” page in the Alfresco forums and noticed, to my horror, that it was 40 pages long. I later realized that the site is configured to show no more than 40 pages so it was likely longer.

Now that I’m on the inside I’ve got access to the data. As it turns out, as of earlier this month, in the English forums we had a little over 1100 topics created over the past year that never got a reply. That represents about 27% of all topics created for that period.

Last year I ran a Community Survey that reported that 55% of people have received responses that were somewhat helpful, exactly what they were looking for, or exceeding expectations. A little over 10% received a response that wasn’t helpful. About 34% said they never saw a response. If you look at the actual numbers for the year leading up to the survey, there were about 1500 topics created that never got a reply, which is again about 28% of all topics created for the same period.

That day in 2009 I suggested we start doing “Forum Fridays” to encourage everyone to spend a little time, once a week, helping out in the forums. I kept it up for a while. The important thing for me was that even if I didn’t check in every Friday, I did form a more regular forum habit. It felt good to see my “points” start to climb (you can see everyone’s points on the member list) and I started to feel guilty when I went too long without checking in.

Since joining Alfresco I’ve been in the forums more regularly. In fact, this month, I decided to make February a month for focusing on forums. I spent a significant amount of time in the forums each day with a goal of making a dent in unanswered posts. I also wanted to see if I could understand why posts go unanswered.

Some topics I came across were unanswered because they were poorly-worded, vague, or otherwise indecipherable. I’d say 5% fit this category. More often were the questions that were either going to require significant time reproducing and debugging or were in highly-specialized or niche areas of the platform that just don’t see a lot of use. I’d say 20% fit this category. These are questions that maybe only a handful of people know the answer to. But at least 50% or maybe more were questions a person with even a year or two of experience could answer in 15 minutes or less.

Alfresco is lucky. Our Engineering team spends significant time in the forums. The top posters of all time–Mike Hatfield, Mark Rogers, Kevin Roast, Gavin Cornwell, Andy Hind, David Caruana, Derek Hulley–are the guys that built the platform. Somehow they manage to do that and consistently put up impressive forum numbers. We also have non-Alfrescans that spend a lot of time in the forums racking up significant points. Users such as zaizi, Loftux, OpenPj, savic.prvoslav, and jpfi, just to name a few, are totally crushing it. It isn’t fair or reasonable for me to ask either of these groups to simply spend more time in the forums. And, while I have sincerely enjoyed Focus on Forums February, I’m not a scalable solution. Instead, I’d like to mobilize the rest of you to help.

I think if we put our minds to it, we should be able to address every unanswered post:

  • Questions that are essentially “bad questions” need a reply with friendly suggestions on how to ask a better question.
  • Time-consuming questions need at least an initial reply that suggests where on docs.alfresco.com, the wiki, other forum posts, or blogs the person might look to learn more, or even a reply that just says, “What you’re asking can’t easily be answered in a reasonable amount of time because…”. People new to our platform don’t know what is a big deal and what isn’t, so let’s explain it.
  • Highly-specialized or niche questions should be assigned to someone for follow-up. If you read a question and your first thought is, “Great question, I have absolutely no idea,” your next thought should be, “Who do I know that would?”. Rather than answering the question your job becomes finding the person that does know the answer. Shoot them a link to the thread via email or twitter or IRC. Some commercial open source companies I’ve spoken to about this topic actually assign unanswered posts to Jira tickets. That’s food for thought.
  • Relatively easy questions have to be answered. Our volume is manageable. We tend to get about 400 new topics each month with 100 remaining unanswered, on average. With a company of our size, with a partner network as big as we have, with as many community members as there are in this world, I see no good reason for questions of easy to medium difficulty to go without a reply.

So here are some ideas I’ve had to improve the unanswered posts problem:

  • Push to get additional Alfrescans involved in the forums, including departments other than Engineering.
  • Continue to encourage our top posters and points-earners to keep doing what they are doing.
  • Identify community members to become moderators. Task moderators with ownership of the unanswered post problem for the forums they moderate. This doesn’t mean they have to answer every question–but it does mean if they see a post that is going unanswered they should own finding someone who can.
  • Continue to refine and enhance forums reporting. I can post whatever forums metrics and measures would help you, the community, identify areas that need the most help or that motivate you to level up your forums involvement. Just let me know what those are.

What are your thoughts on these ideas? What am I missing? Please give me your ideas in the comments.

By the way, while I’m on the subject, I want to congratulate and thank the Top 10 Forum Users by Number of Posts for January of this year:

1. mrogers* 74
2. amandaluniz_z 36
3. MikeH* 30
4. jpotts* 25
5. fuad_gafarov 22
6. zomurn 21
7. Andy* 19
8. RodrigoA 16
9. ddraper* 16
10. mitpatoliya 16

As noted by the asterisk (*) half of January’s Top 10 are Alfresco employees.

And if you are looking for specific forums that need the most help in terms of unanswered posts, here are the Top 10 Forums by Current Unanswered Post Count (as of 2/16):

1. Alfresco Share 210
2. Configuration 169
3. Alfresco Discussion 89
4. Alfresco Share Development 85
5. Installation 82
6. Repository Services 54
7. Workflow 47
8. Development Environment 44
9. Web Scripts 44
10. Alfresco Explorer 40

I’ll post the February numbers next week, and will continue to do so each month if you find them helpful or inspiring.

The Alfresco forums need your help

I was looking at the “unanswered posts” view in the Alfresco Forums today and was surprised to see it was 40 pages long. I know the growing list of unanswered posts has been a problem for quite a while because Nancy Garrity has mentioned it multiple times and I don’t know what the high water mark is for unanswered posts but 40 pages seems bad.

I admit that I haven’t been answering questions in the forums as often as I’d like and that’s bad too. So I took some time today to answer a few. You should do the same. Why should Russ Danner (503 posts) have all the fun?

Maybe instead of “follow fridays” on Twitter we should encourage “forum fridays” amongst the Alfresco community.