Open-source Something Often

If you write code for a living, when was the last time you released something as open source? If you can’t remember, I’d hold that against you in an interview. Why? Assuming you take pride in your work (if you don’t… well), open-source code is an incentive to:

  • Make sure the code is not horrendously embarrassing.
  • Verify that a random person can check it out and make it work.
  • Explain to others why (and maybe how) your code works.

You may be a star coder at some top-notch Silicon Valley darling, and you may work on code that has been working flawlessly for years. Whether you like it or not, the more obscure and undocumented your code, the more the company depends on you. Your peers may quickly review your code every so often (ship it! ship it good!), but that doesn’t mean anybody else knows what’s going on. Some companies, like LinkedIn, understand this, and make it known that your code could (and probably will) be open-sourced one of these days.

I’m writing this post because I was reminded of those three bullet points this weekend, when I open-sourced my code to generate word clouds from Twitter searches.

I wrote this stuff as I was learning Clojure last year. I wanted to create word clouds such as the ones on Wordle. I discovered a great Processing-based library called WordCram, and I wrote some code around it to fetch and render text from a variety of sources: Twitter timelines, searches, the app.net global timeline, RSS feeds, etc. Here’s an example of a word cloud for snowden early this morning:

[Image: word cloud for “snowden”]

I thought it would be really easy to release this as open source, but it turned out to be a fair amount of work for what’s barely over 100 lines of Clojure. Here are some of the issues:

1) Once I decided to release the code, I tested it against Twitter and it didn’t work. Why? Because Twitter had deprecated the pagination-based search API that clojure-twitter supported. I had to replace it with twitter-api, and adapt my code to work with timelines.

2) I realized that my config.clj file had my private Twitter app credentials. I had to add it to .gitignore so I wouldn’t commit it by mistake, and I had to do it before adding the repo to GitHub because otherwise it would be part of the history. I had to create a dummy config.sample file, and explain in my README.md how to obtain/insert your credentials.

3) I still have to document my code and make it more idiomatic, but I don’t care that much about it because I’m not looking for a job as a coder. I’ll do it at some point, or maybe someone will tell me how much I suck in a pull request 🙂

4) I thought I was done, so I created the GitHub repo. I did a fresh checkout, and it worked. I tweeted about it, and I asked a friend to see if it worked for him. Of course it did not work. Why? This is where the fun begins.

I had created the project with Leiningen 1.7, and I’d had to create a local Maven repository because WordCram and other needed libraries are not in any public ones. I’d followed the instructions in this post from late 2011. I had since upgraded to Leiningen 2.1.3, for which this recipe doesn’t work anymore. The problem was that the .m2 directory under my home already had all the dependencies it needed, so a fresh build on my machine did not need to fetch anything from the Maven repo. Sure enough, I moved .m2 and then everything broke.

A search on Google took me here, which was the start of the solution. That almost worked, except for the fact that it created a slightly different repository structure than the one Leiningen expected for my four artifacts. I had to recreate the Maven repo with a script that gave a different groupId to each artifact, and then my friend sgrove was able to make the code compile and run.
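For readers who have not set up a project-local repository before, here is a rough sketch of what the Leiningen 2.x side of this can look like. It is not the project.clj from the repo: the project name, the `repo` directory, the group/artifact names and the version numbers are all placeholders picked for illustration, and the relative `file:` URL is an assumption about the layout.

```clojure
;; project.clj fragment (Leiningen 2.x), illustrative only.
;; Every coordinate below is a placeholder, not the real one.
(defproject wordcloud-sketch "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.5.1"]
                 ;; A jar that exists only in the project-local repository.
                 ;; Each locally installed jar gets its own groupId.
                 [local-wordcram/wordcram "0.0.0"]]
  ;; Point Leiningen at a Maven-layout directory kept inside the project.
  ;; For the dependency above, Maven's standard layout would be:
  ;;   repo/local-wordcram/wordcram/0.0.0/wordcram-0.0.0.jar
  ;;   repo/local-wordcram/wordcram/0.0.0/wordcram-0.0.0.pom
  :repositories {"project-local" "file:repo"})
```

The useful check is the one described above: move `~/.m2` out of the way (or build on a machine that has never seen the project) so that every dependency really has to come from the declared repositories.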

I still wasn’t done. The first query he tried was somewhat obscure, and a call with 0 results broke my program (not enough arguments to the reduce function).
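This kind of failure is easy to reproduce. The sketch below is not the code from the repo; `merge-counts-fragile` and `merge-counts` are made-up names, and merging per-tweet word counts is just an assumed stand-in for whatever the real reduce was doing. The point is that `reduce` without an initial value calls the function with no arguments when the collection is empty.

```clojure
;; Hypothetical example: merge per-tweet word-count maps into one map.

;; Fragile: with no initial value, (reduce f []) calls (f) with zero
;; arguments. This function only accepts two, so an empty result set
;; throws clojure.lang.ArityException.
(defn merge-counts-fragile [count-maps]
  (reduce (fn [acc m] (merge-with + acc m)) count-maps))

;; Safer: pass an explicit initial value, so an empty collection
;; simply returns that value.
(defn merge-counts [count-maps]
  (reduce (fn [acc m] (merge-with + acc m)) {} count-maps))

(merge-counts [])                        ;=> {}
(merge-counts [{"a" 1} {"a" 2 "b" 1}])   ;=> {"a" 3, "b" 1}
```

Supplying the initial value is usually the smallest fix; the alternative is to check for an empty result before reducing and skip rendering altogether.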

Moral of the story: I’ve been coding for 30 years, and look at all the stuff I had to learn in order to open-source what’s essentially a puny script. Humbling and inspiring at the same time.

What have you open-sourced for me lately?