Open Source and Big Data: Two Things I Love About LinkedIn.

Today is InDay at LinkedIn. We are encouraged to spend one day each month learning, inspiring others, and sharing knowledge with the community. I’ve been at LinkedIn for four months now, and this day gives me the perfect excuse to write a post about two things I love about this place: open source code and big data.

Open source: I think LinkedIn has contributed some great open-source products to the world. My favorites so far are:

  • Apache Kafka: a distributed publish-subscribe messaging system. The design document is a great read; I believe anyone working with distributed systems should check it out. [BTW, congratulations Jay on the new baby!]
  • Krati: a simple persistent data store with very low latency and high throughput. It’s similar to Berkeley DB (BDB) but written natively in Java. It’s hash-based, which gives it better random-access performance than BDB’s B-tree structure. Jingwei Wu is one of the best systems programmers I’ve met, and I’m so happy that he’s on our team.
  • Project Voldemort: a Java implementation of the Amazon Dynamo paper. It’s some of the cleanest code I’ve seen, and a great place to start if you want to learn about distributed key-value stores. Also, you should follow Alex Feinberg on Twitter and Quora.
  • IndexTank, of course :)

There are too many to list at this point (SenseiDB, Zoie, Bobo, etc.). Take a look at our group’s page.

Big Data: we have a treasure chest of member profiles and activity, and a beefy Hadoop cluster to mine everything we can out of it. More importantly, we have stellar data scientists like Daniel Tunkelang and his team. They perform daily acts of wizardry to improve our products, as well as uncover unexpected insights about the professional world. When I got to play with the Hadoop cluster a couple of months ago to try out a little idea, it was nerd Nirvana for me. I have my hands full with search, but one of my goals for the next few months is to write a few more Pig scripts.

Of course LinkedIn is hiring like there’s no tomorrow; we need to understand that data even better, and we have a laundry list of apps we want to build on top of it. Check out some jobs.

BTW, this is what we had for lunch yesterday :)
[Photo: sushi]
