A Relevant Tale: How Google Killed Inktomi

By dbasch / May 3, 2012

On March 20th, 2000, Inktomi had a market capitalization of 25 billion dollars. As a relatively early employee, I was a multimillionaire on paper. Life was good. In the next year and a half the stock went down by 99.9%. In the end, Inktomi was acquired by Yahoo for $250M. What happened? Among other things, Google. Grab some popcorn and enjoy this story.

Inktomi was the #1 search engine in the world for a while. When I joined we had just won the Yahoo contract, and were serving search results for HotBot (there is still a search page there!). At first I worked on developing crawling and indexing tools written in C++. Our main goal at the time was to grow our index size, and at the same time to improve relevance. It became clear that as our document base grew, relevance would play a more important role. For ten million documents you may be able to filter out all but a handful of documents with a few well-chosen keywords. In that case any relevance algorithm would do; your desired result would be present in the one and only result page. You wouldn’t miss it. For a billion documents, however, the handful would become hundreds or thousands. Without a good relevance algorithm, your desired result might be on page 17. You’d give up before getting to it.

At first we were using a classic tf-idf based model, enhanced by emphasizing certain features of pages or urls that correlated with “goodness.” For example, yahoo.com is probably more relevant to the query yahoo than yahoo.com/some/deep/page.html. We thought shorter urls were better. Of course this query was very popular, so spammers started creating pages stuffed with the word Yahoo. This was the beginning of an arms race that continues today. Back then we were the main target because we processed more searches than anyone else.
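To make the idea concrete, here is a minimal sketch of tf-idf scoring with a "shorter URLs are better" adjustment. It is not Inktomi's actual code: the function names, the smoothed idf variant, and the `depth_penalty` parameter are all assumptions made for illustration.

```python
import math
from collections import Counter
from urllib.parse import urlparse


def tf_idf_scores(query, docs):
    """Score each document for a query with plain tf-idf.

    docs maps url -> page text. The smoothed idf used here is one common
    variant, chosen for illustration only.
    """
    tokenized = {url: text.lower().split() for url, text in docs.items()}
    n_docs = len(tokenized)
    scores = {}
    for url, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Number of documents that contain the term at least once.
            df = sum(1 for t in tokenized.values() if term in t)
            if df == 0 or not tokens:
                continue
            tf = counts[term] / len(tokens)
            idf = math.log((1 + n_docs) / (1 + df)) + 1.0
            score += tf * idf
        scores[url] = score
    return scores


def url_depth(url):
    """Count the non-empty path segments of a URL (0 for a bare host)."""
    path = urlparse(url if "://" in url else "http://" + url).path
    return len([segment for segment in path.split("/") if segment])


def boosted_scores(query, docs, depth_penalty=0.5):
    """Divide the tf-idf score by a factor that grows with URL depth.

    depth_penalty is a hypothetical tuning knob, not a value from the post.
    """
    base = tf_idf_scores(query, docs)
    return {
        url: score / (1.0 + depth_penalty * url_depth(url))
        for url, score in base.items()
    }


docs = {
    "yahoo.com": "yahoo directory search mail",
    "yahoo.com/some/deep/page.html": "yahoo yahoo directory page",
}
plain = tf_idf_scores("yahoo", docs)
boosted = boosted_scores("yahoo", docs)
```

With these two toy pages, plain tf-idf prefers the deep page because it repeats the word more often, while the depth adjustment puts yahoo.com back on top. A page stuffed with nothing but the query word would still score very high, which is the weakness spammers exploited.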

Enter The Google

Yahoo had been complaining to us about not being result #1 for yahoo for a while. We fixed that special case, but we couldn’t do the same for many other sites or pages. In 1999 Google was gaining popularity because they were solving exactly this problem. We didn’t perceive them as a threat yet, but we did realize that we had to do our own version of PageRank. I was assigned to that task.

My small contribution to improving our relevance was coming up with a simple formula to take into account the occurrences of words in links pointing to pages. The insight was realizing that this followed a power law: at the time Yahoo.com had about 1M instances of the word yahoo in links pointing to it. Nobody else came close. Other Yahoo properties had an order of magnitude less, and then came a long tail of other sites. I decided to use the logarithm of the count as a boost for the word in the document. This wasn’t as sophisticated as PageRank (we’d get to that later), but it was a huge improvement. Our relevance got much better over time as other people spent countless hours implementing our own link analysis algorithms. We had a clear mandate from the execs; our priorities at search were:

1) relevance

2) relevance

3) relevance
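The anchor-text boost described before the priority list can be sketched in a few lines. The post only says that the logarithm of the link-word count was used as a boost; the additive combination, the `weight` parameter, and the use of `log1p` (so that a count of zero gives no boost) are assumptions for this sketch.

```python
import math


def anchor_boost(anchor_count):
    """Log-damped boost for a word, given how many inbound links use it.

    log1p(x) = log(1 + x), so a page with no matching anchors gets 0.
    """
    return math.log1p(anchor_count)


def score_with_anchor_text(base_score, anchor_count, weight=1.0):
    """Add the anchor boost to a base relevance score (hypothetical combination)."""
    return base_score + weight * anchor_boost(anchor_count)


# Counts in the spirit of the post: about 1M for yahoo.com, an order of
# magnitude less for other Yahoo properties, then a long tail.
for count in (1_000_000, 100_000, 10):
    print(count, round(anchor_boost(count), 2))
```

The point of the logarithm is that a power-law distribution of counts becomes a gentle, roughly linear scale: a tenfold difference in links adds a constant amount to the score instead of multiplying it by ten.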

Doug Cook built a tool to quickly measure the relevance effects of algorithmic changes based on precomputed human judgments. For example: it was clear that Yahoo.com was the definitive result for the query “yahoo” so it would score a 10. Other Yahoo pages would be ok (perhaps a 5 or 6). Irrelevant pages stuffed with Yahoo-related keywords would be spam, and humans would give them a negative score if they showed up for that query. Given ten results and a query, we could instantly evaluate the goodness of the results based on the human rankings.
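A toy version of this kind of evaluation might look like the following. The data layout and the simple sum over the top ten results are assumptions; the post does not say how Doug's tool aggregated the human scores, and the example judgments are made up.

```python
def score_results(query, results, judgments, default=0):
    """Sum the precomputed human scores for the top ten results of a query.

    judgments maps query -> {url: score}. Unjudged URLs get `default`.
    Summing without position weighting is a simplification for illustration.
    """
    per_query = judgments.get(query, {})
    return sum(per_query.get(url, default) for url in results[:10])


# Hypothetical judgments: definitive result, acceptable result, spam.
judgments = {
    "yahoo": {
        "yahoo.com": 10,
        "mail.yahoo.com": 5,
        "spam.example/yahoo": -5,
    }
}

engine_a = ["yahoo.com", "mail.yahoo.com"]
engine_b = ["spam.example/yahoo", "yahoo.com"]

print(score_results("yahoo", engine_a, judgments))  # 15
print(score_results("yahoo", engine_b, judgments))  # 5
```

Because the judgments are computed once and stored, any algorithm change can be rescored immediately by rerunning the queries and looking up the results.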

We had a sample corpus of links and queries for which we could run this test as often as we wanted, and compare ourselves against Google. We did this for months until it became clear that we were “as good as Google.” Our executives were happy.

Relevance Is Only So Relevant

Despite our relevance being so great, there was one huge red flag: engineers at Inktomi were starting to use Google as our search engine. Our executives tried to stop us from doing it, just like Bill Gates reportedly banned his kids from using Apple products.

I thought about why I was using Google myself, and I’m sure it’s obvious to everyone now: the experience was superior.

Inktomi didn’t control the front-end. We provided results via our API to our customers. This caused latency. In contrast, Google controlled the rendering speed of their results.

Inktomi didn’t have snippets or caching. Our execs claimed that we didn’t need caching because our crawling cycle was much shorter than Google’s. Instead of snippets, we had algorithmically-generated abstracts. Those abstracts were useless when you were looking for something like new ipad screen resolution. An abstract wouldn’t let you see that it’s 2048×1536; you’d have to click a result.

In short, Google had realized that a search engine wasn’t about finding ten links for you to click on. It was about satisfying a need for information. For us engineers who spent our day thinking about search, this was obvious. Unfortunately, we were unable to sell this to our executives. Doug built a clutter-free UI for internal use, but our execs didn’t want to build a destination search engine to compete with our customers. I still have an email in which I outlined a proposal to build a snippets and caching cluster, which was nixed because of costs.

Are there any lessons to be learned from this? For one, if you work at a company where everyone wants to use a competitor’s product instead of its own, be very worried. If I were an executive at such a company I would follow Yoda’s advice: “Do or do not. There is no try.” If you’re not willing to put in the effort to compete, you might as well cut your losses (like Google did with Buzz, for example).

Of course, this is not the whole story of how Inktomi failed. There was a complicated web of causation that involved timing, bubbles, lack of focus, departures of key executives, etc. That would be a book that might sell three or four copies at best 🙂

If you liked this post, I’m happy. Follow me somewhere, or don’t 🙂

Discuss on Hacker News if you please.

34 thoughts on “A Relevant Tale: How Google Killed Inktomi”

ccurrivan
May 3, 2012 at 7:50 pm

IIRC, we had a network flow algorithm that was similar to PageRank, but it wasn’t the most useful feature. I remember a point where we discovered that the anchor text feature implementation had a bug, and correcting it led to a huge improvement in relevance, replicating Google’s results in some cases. Since then I’ve believed that PageRank was a bit of a misdirect, and anchor text was Google’s real secret sauce in the beginning.

Another big advantage Google had through their front end was that they kept it ad-free for a long time, while other sites were under pressure to monetize. IWon was the worst, blinkiest offender, but every search engine was cluttering their front page with flashing banner ads. When Google finally did introduce ads, they were small bits of text that didn’t try to distract you from your search.

elladstep
May 3, 2012 at 8:03 pm

Interesting writeup – particularly the bit about management trying to ban you from using Google’s products. That’s when you know the ship is sinking. Google seems invincible right now, but I gotta say that they’re on the edge of not mattering as much either – they have a weakness in the most important thing of the decade, social media. I think that the idea of getting this structured data is what is so appealing to Google and why they’re putting so many resources behind Google+. Without access to structured social data, Google can’t continue to refine their search experience further. With access to this social data, Facebook has a legitimate shot at competing in search by using this social data to fuel their search experience. Facebook has captured the attention of the popular culture in addition to businesses: look at how many active users Facebook has, look at how many big brands are promoting their Facebook pages in their TV commercials, look at how many companies there are at http://www.buyfacebookfansreviews.com that do nothing other than promote Facebook pages… Facebook is really on the right track here even if they deserve a bit of criticism over some of their features. I think that the reason that Facebook is worth well in excess of their $100 billion IPO is that they have the potential to dominate search, ecommerce, and other major fields. But this market is so big and valuable that I think there’s room for other players to get involved here and take some risks and chances that big companies like Facebook and Google won’t take.

Diego Basch
May 3, 2012 at 8:36 pm

@ccurrivan Hi Chris! I agree with you, anchor text was the breakthrough improvement. PageRank made it harder to game, and then the arms race continued.

iWon… I remember that :)

@elladstep: I wonder what’s the internal policy at Google about Facebook usage. I’m sure FB doesn’t worry much about their employees using G+.

soamwork
May 3, 2012 at 8:39 pm

Oh lord. This post brings back some very painful memories! In addition to your great summary, I would note that around that time, the bulk of Inktomi engineering was working on the Traffic Server, some on enterprise search and the rest on web search. I believe G’s engineering workforce very quickly outnumbered those in Inktomi exclusively devoted to web search. One can make the argument that perhaps shifting personnel and resources to web search early on might have made a big difference to the outcome, especially as the Traffic Server earnings collapsed eventually, but hindsight is always 20/20, isn’t it? 🙂

wunderwood
May 4, 2012 at 12:01 am

I disagree about not using the competitor’s stuff. I already know how my stuff works. The only way I can find out about them is by using theirs. When I see everyone using only their own products, I think “this place is like Microsoft”.

Most of the big search engines discovered anchor text around the same time. I remember seeing Infoseek ESP demo’ed the first time. I think snippets are really about explaining the engine’s decisions to the user and building trust. If I see my words, but it is clear that the page is really about something else, I might forgive the silly computer. But if I see some wonderful abstract that says nothing about my search, the engine just looks wrong.

I came in with the Ultraseek acquisition and rode the stock from $120 down to $0.24. We did sell a chunk at $60, but the options were under water after that, especially when we got into “shares per latte” territory.

Diego Basch
May 4, 2012 at 12:22 am

Hi Walter! You make a good point. For us on web search, we mostly gave up on using the portals we powered (Yahoo, Hotbot, MSN, AOL, etc.) because of the annoying clutter. We used Google not so much to see how it worked, but because it was clearly more useful.

FWIW, I often tell people that I believe Ultraseek was Inktomi’s best acquisition. It was the first time I saw production-quality software written in Python. You guys were pioneers! 🙂

vofitserov
May 4, 2012 at 2:46 am

Great start for the book, Diego! Inktomi missed production deployment of the release that used frequency in anchors by 1 month. In July 2000 Yahoo switched to Google as a search backend. We had a prototype of an anchor-text-only search engine running in November 1999 with an index of 25M URLs. It took a good 9 months to push this improvement into production, and it was too late. Google won the biggest search account in history.

Orangwutang
May 4, 2012 at 4:22 am

I think you underestimate how interested people might be in a book about your experiences. Sounds very interesting to me. Make it a Kickstarter project and gauge interest and if it’s there, do it! Nice article.

Lana
May 4, 2012 at 11:54 am

Very interesting read. It seems like the Inktomi managers did let it fall once their engineers started using Google search. That says a lot. F.e., while I’m still comfortable with Google services, nowadays I more often use the new DuckDuckGo just because it makes it a lot easier to get very quick super relevant results.

mikedowling
May 4, 2012 at 12:10 pm

Just curious, the link for ‘new ipad screen resolution’ in this post is: https://www.google.com/search?ix=aca&sourceid=chrome&ie=UTF-8&q=domino+pizza+…&gs_nf=1&tok=DBTJEp2_3oW1F2ietDMecQ&pq=ipad%20screen%20resolution&cp=1&gs_id=5w&xhr=t&q=new+ipad+screen+resolution&pf=p&safe=off&sclient=psy-ab&oq=nipad+screen+resolution&aq=0&aqi=g2g-b2&aql=&gs_l=&pbx=1&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.,cf.osb&fp=8316e992ae23057e&ix=aca&biw=1363&bih=647.

Why does it include domino+pizza+phone+number? Thx.

Diego Basch
May 4, 2012 at 12:41 pm

Nice catch! Because it's the first example I tried when writing the post, but the results I got didn't illustrate the point so well. I guess that because I disabled history with Google (or for some other reason), they are sending my previous query as a parameter in the GET request.

storagechat
May 4, 2012 at 1:12 pm

nemekn
May 4, 2012 at 1:32 pm

Make that five copies – I’d buy one!

kosei
May 4, 2012 at 2:34 pm

Really interesting. My friends in Japan and I would buy it if that became a book! I evaluated web search engines as a product manager on the FE side. I remember asking Doug about some issues, including anchor text and snippets. I didn’t know this background.

Troy Toman
May 4, 2012 at 2:34 pm

Diego,

Thanks for capturing the train of thoughts that seem to run through my head almost daily. Lots of lessons to learn from that experience. Although it is easier to connect the dots in the rearview mirror than it was looking forward at the time, there were some clear lessons about not forgetting the actual end user (which is not always your customer), using a single metric as a proxy for user experience, obsessing about a competitor and trying to get big instead of great.

Diego Basch
May 4, 2012 at 3:41 pm

Thanks Troy. Your comment sums it up really well.

netik
May 4, 2012 at 4:28 pm

I also worked at Inktomi, and I think I remember you.

You left out the part where Inktomi execs made the net eng team block access to GOOG for awhile at the office, and how Yahoo skillfully destroyed Inktomi by switching off Inktomi to Google, and then strung Inktomi along for months saying they’d return to Inktomi if only Inktomi would add certain features.

Inktomi wasted enormous amounts of engineering effort while other clients like MSIE switched to different search engines. Yahoo continued to string Inktomi along until the stock price fell far enough for a takeover, which is exactly what they ended up doing.

bruinnitsud
May 4, 2012 at 9:01 pm

Portaquest FTW

evmkv
May 4, 2012 at 9:29 pm

I was looking at all this from a distance, being in the Traffic Server part of Inktomi then, so my memory may not be 100% accurate, but I still remember the “they cannot scale on Linux” and “they compete with their own customers” all-hands slogans. Now it all looks like The Comedy of Errors, or perhaps a tragedy, to me. I think, with its policy of not competing with its customers and therefore banning its own front end, Inktomi management deprived its search of one of the most valuable assets for search quality improvement: user feedback. As a result, Inktomi had to rely more heavily on a group of editors to provide this feedback, which was obviously of much higher quality, but expensive and not easily scalable.

PageRank & anchor text, IMHO, are parts of this user feedback, the parts which are less noisy and provided by a much smaller group of much more sophisticated users, but on a much smaller scale.

So, while Inktomi management was sticking to its guns of not competing with its clients, the most important clients, Yahoo and AOL, were apparently not much worried about it and switched to Google, pretty much putting Google on the map for the average user and allegedly committing an act of slow suicide. Ironically, MSN did not abandon Inktomi until the very end; I suspect they did worry about competition from Google even then.

Of course, it’s much easier to judge history ten+ years later. Sponsored search was then in its infancy; paid inclusion seemed to be the only obvious alternative to providing search back-end service to large portals. But the moral of the story probably is: do everything to provide your end users with the best service/product and a monetization opportunity will come sooner or later. Oh, and don’t invest too much into real estate and avoid buying 1.2 billion start-ups at the peak of a bubble 🙂

nesmel
May 5, 2012 at 12:33 am

andysalo
May 5, 2012 at 4:01 am

I agree with your post. Very enlightening to see what search engineering thought of the situation at the time. I wrote a blog post about it myself a while back: http://www.andysalo.com/2010/05/18/value-prop-vs-executive-team-a-common-vc-m…

I was on the Traffic Server side of the house, first in sales then PM. I remember the last year or so at Inktomi as things were falling off a cliff. Painful.

At the end of the day Google had built a better mouse trap. As you say, the user experience was better. Completely agree that the eye was taken off the ball. Lessons learned. Makes you wonder “if only” we had a user facing, clutter free website, and some of the enhancements you highlighted, what would have happened.

Samuel Lavoie
May 5, 2012 at 4:12 am

I did buy a copy for sure. Sums up well how we need to follow our guts and how much user experience is important!

Thank you for sharing this 🙂

Samuel Lavoie
May 5, 2012 at 4:15 am

MarionSmithIII
May 5, 2012 at 1:09 pm

Great history lesson Diego. Like Troy, I think about this stuff daily. There are far too many lessons to be learned from the events following March 20, 2000, many of which were born long before that. As a business guy, I am amused by reading your summary of the technical issues, and interaction of those with problem solving on the business side. Those issues have been less in the forefront of my recollections (for no reason other than my own focus) than two key business decisions that seemed, in hindsight, to be horribly wrong. First, was the decision not to permit paid-inclusion results because of a belief (which I think was genuine at the time) that doing so would turn off users (not our customers, but the ones searching). And second, not to buy Google early, when they demonstrated traction and INKT was strong currency. Definitely, there was a very complicated web of factors, which so many have pointed out above. Thanks for sharing the story. It is a great read, if only for us alumni.

PS: I thought about writing that book a number of years back but thought you were right about how many copies would sell.

Abraham Williams
May 6, 2012 at 1:57 pm

supremelydisappointing
May 6, 2012 at 3:20 pm

Seyi Taylor
May 7, 2012 at 10:10 pm

manningj
May 11, 2012 at 4:17 pm

not really relevant, but going down by 99.9% would be going down by a factor of 1000, right? Seems like 25B to 250M is a factor of 100, so 99.0%?

wunderwood
May 11, 2012 at 4:21 pm

The stock peaked at $240 and bottomed out at $0.24. It recovered to about $1.60 at the end.

Diego Basch
May 11, 2012 at 4:23 pm

The bottom was 24c/share down from $241.5 at the top, 99.9% indeed. It recovered a bit:

"In 2002, Inktomi board brought in turnaround expert and long term media investor Keyur Patel and restructured the organization to focus back on search and divest from non core assets. This move turned out to be brilliant and led ultimately to be acquired by Yahoo! in 2002 for $1.63 a share (or $235 million)." http://en.wikipedia.org/wiki/Inktomi

Chill
March 27, 2013 at 12:09 pm

I worked at Inktomi after it purchased Fastforward Networks, which was a Live Media startup that was making money, but after about 2 years of being at Inktomi and seeing the writing on the wall I left and could not believe people stayed. I have never worked for a company that required me to go to a full-day meeting on who and what the company is and still come out wondering what the company does other than just buy companies and have no idea what to do with them.

badshot
August 19, 2013 at 9:59 am

Great summary of the problems in search. I was in the East coast office at the time and knew we were in trouble when I heard Peterschmidt say “…And I’m tired of hearing about how great Google is…”, but then couldn’t articulate either how we were actually better or how we were going to beat them.

Ron Verheijen
August 21, 2015 at 7:58 am

Ow, saw this only now – good times indeed back then. From my perspective, the company was betting on building virtual network infrastructures and content delivery platforms on top of physical network layers – caching was going to be the cash. Sales folks got incented strongly to sell those products, we in search suffered the “we-have-all-the-customers-we-can-get-why-should-we-invest” attitude. The Internet bust around 2000/2001 was enough so that physical networks could handle all the traffic so that Inktomi network products were no longer required.
And by that time Inktomi had passed up the opportunity to acquire Google (yep!) and GoTo (yep again).
Not to say I did not enjoy booting Google out of a couple of customers (big UK media, web.de,…..).

Xinbenlv
March 26, 2019 at 1:30 am

Hello from 7 years later folks, great writing, thank you for sharing!
