This weekend I fetched 1M random user profiles from Twitter, just because. I figured it would be enough to answer some interesting questions about Twitter. Here’s what I did, along with some conclusions.
As you may know, each Twitter user is assigned a numeric id. These ids started at a very low number and are always increasing. The highest Twitter user id when I started the experiment was around 637M (found by trial and error). I figured there would be gaps in user ids mostly because of massive deletions of spammer accounts, and a quick sample estimated the gaps to be on the order of 20%. So I generated 1.25M unique user ids in the range 0-637M, and tried to fetch the profile details for them.
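The id sampling step can be sketched in a few lines of Python. This is only an illustration of the idea, not the script I ran: the function name, the seed and the constants are made up for the example, with the numbers taken from the paragraph above.

```python
import random

MAX_USER_ID = 637_000_000   # highest user id observed (from the post)
SAMPLE_SIZE = 1_250_000     # oversampled to absorb the ~20% of ids with no account

def sample_user_ids(max_id, k, seed=None):
    """Draw k distinct ids uniformly from 0..max_id (hypothetical helper)."""
    rng = random.Random(seed)
    # Sampling from a range object draws without replacement and
    # never materializes the full 637M-element sequence in memory.
    return rng.sample(range(0, max_id + 1), k)

user_ids = sample_user_ids(MAX_USER_ID, SAMPLE_SIZE, seed=2012)
print(len(user_ids), "ids,", len(set(user_ids)), "unique")
```

Sampling without replacement matters here: duplicates would waste API calls and slightly bias the counts.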
The Twitter API allows requesting 100 user profiles with one call, so that means I had to issue 12,500 calls. Twitter limits API requests from a given IP address to 150 per hour (in practice sometimes less). I had to use a few addresses to fetch all the data over the weekend (tip that came in handy: some mobile carriers refresh your IP every time you go in and out of airplane mode).
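To see why one address was not enough, here is a rough back-of-the-envelope sketch. It only does the batching and the arithmetic; it makes no API calls, and the helper names are hypothetical. It assumes the full 150 calls per hour are always available, which the paragraph above notes is optimistic.

```python
from math import ceil

BATCH_SIZE = 100             # profiles per API call (from the post)
CALLS_PER_HOUR_PER_IP = 150  # rate limit per address (from the post)

def batches(ids, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def hours_needed(n_ids, n_addresses,
                 batch_size=BATCH_SIZE, per_hour=CALLS_PER_HOUR_PER_IP):
    """Lower bound on wall-clock hours, assuming the limit is never reduced."""
    calls = ceil(n_ids / batch_size)
    return calls / (per_hour * n_addresses)

n_calls = sum(1 for _ in batches(list(range(1_250_000))))
print(n_calls, "calls")
for addresses in (1, 2, 4):
    print(addresses, "address(es):", round(hours_needed(1_250_000, addresses), 1), "hours")
```

With a single address the job takes more than three days, so finishing over a weekend needs at least two.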
After fetching the 12,500 batches I was left with 1,039,556 Twitter profiles. This means that there must exist approximately 530 million Twitter accounts (*): 83% of 637M. Of course, this number doesn’t say a lot. Let’s look into these accounts in more detail.
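The extrapolation is just a proportion scaled up to the id space. The sketch below reproduces it and adds a binomial standard error, which is my addition for illustration. It assumes that every id that returned no profile really has no account behind it and that ids are spread evenly over the range. The footnote at the end of the post explains why the second assumption is only approximately true, so the real uncertainty is larger than the sampling error shown here.

```python
from math import sqrt

def estimate_population(found, requested, id_space):
    """Scale the hit rate of a uniform id sample up to the whole id space."""
    p = found / requested                # fraction of sampled ids with an account
    se = sqrt(p * (1 - p) / requested)   # binomial standard error of that fraction
    return p * id_space, se * id_space

estimate, std_err = estimate_population(
    found=1_039_556, requested=1_250_000, id_space=637_000_000
)
print(f"hit rate: {1_039_556 / 1_250_000:.1%}")
print(f"estimated accounts: {estimate / 1e6:.1f}M "
      f"(sampling error alone: +/- {1.96 * std_err / 1e6:.1f}M at 95%)")
```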
Obligatory chart: signups over time (July of 2012 incomplete, could only fetch accounts created before July 18th for some reason).
I left out the Paleozoic era of Twitter (2006 and 2007) because it was visually insignificant compared to the Great Expansion of 2009.
The Tweets and the Tweet-nots
Approximately half of the accounts have tweeted at least once. The other half may be lurkers, or an example of what would happen if domain names were free: lots of parked ones. Still, the number of accounts that have never tweeted seems surprisingly high. Furthermore, 16% of all accounts (over 80 million) have no followers, no friends and no tweets. Hey Twitter, how about releasing some of those to the wild?
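For anyone repeating the experiment, the two fractions above could be computed along these lines. The field names `statuses_count`, `followers_count` and `friends_count` are the ones I understand the Twitter user object to carry ("friends" being the accounts a user follows). The four sample profiles are invented for the example and are not from my dataset.

```python
def has_tweeted(profile):
    return profile["statuses_count"] > 0

def is_empty(profile):
    """No tweets, no followers and no friends (accounts followed)."""
    return (profile["statuses_count"] == 0
            and profile["followers_count"] == 0
            and profile["friends_count"] == 0)

def summarize(profiles):
    n = len(profiles)
    return {
        "tweeted": sum(has_tweeted(p) for p in profiles) / n,
        "empty": sum(is_empty(p) for p in profiles) / n,
    }

# Made-up profiles, for illustration only.
sample_profiles = [
    {"statuses_count": 120, "followers_count": 40, "friends_count": 75},
    {"statuses_count": 0,   "followers_count": 0,  "friends_count": 0},
    {"statuses_count": 0,   "followers_count": 2,  "friends_count": 30},
    {"statuses_count": 3,   "followers_count": 0,  "friends_count": 5},
]
print(summarize(sample_profiles))
```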
The average Twitter user has tweeted 307 times. That’s a total of 163 billion tweets since the dawn of Twitter [this space reserved for a snarky comment about that amount of collective wisdom].
It may be more meaningful to count only users who tweeted at least once. For those, the average is 520 tweets.
Graph characteristics
The distribution of followers per account is a power law (duh). The most followed account has tens of millions of followers; the median account has 1. The average user follows (or is followed by) 51 people. Of course this average is pretty meaningless, but it means that the Twitter “follow” graph has about 27 billion edges (51 × 530M accounts).
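If it is not obvious why the average says so little for a power law, a toy simulation helps. The numbers below are synthetic, drawn from a Pareto distribution with a shape parameter I picked arbitrarily. They are not fitted to Twitter data and will not reproduce the figures in this post.

```python
import random
import statistics

def synthetic_follower_counts(n, alpha=1.1, seed=0):
    """Heavy-tailed fake follower counts (illustrative only)."""
    rng = random.Random(seed)
    # paretovariate returns floats >= 1; shift down so 0 followers is possible.
    return [int(rng.paretovariate(alpha)) - 1 for _ in range(n)]

counts = synthetic_follower_counts(100_000)
print("median:", statistics.median(counts))
print("mean:  ", round(statistics.mean(counts), 1))
print("max:   ", max(counts))
```

A handful of enormous accounts drag the mean far above what a typical account looks like, which is why the median is the more honest summary.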
Followers and friends
For all accounts: median followers = 1, median friends = 5 (average is 51 for both).
For the 272M accounts that tweeted at least once: median followers = 4 and median friends = 15 (average is 85 for both).
For the 80M accounts that tweeted in the past month (these are what I’d call active users by the way): median followers = 31 and median friends = 72 (averages, 235 and 188).
Usernames
In early Twitter times (i.e. 2007), the average username was 8 characters long. It increased to 9 in mid-2008, and to 10 in 2010. That’s the current average, even though for most months of 2012 new accounts had an average of 11.
Ok, enough numbers. What does this all mean?
To me, the most telling number is the count of people who actually tweet at least once a month. 80M is a respectable number, but it’s still a tiny fraction of the internet. Of course, the elites of the world are overrepresented on Twitter; it’s a free megaphone for them. They are also the prime audience for many advertisers, but obviously not for all.
Now, here’s an interesting off-the-cuff hypothesis: what if the ratio between existing and active accounts on Twitter were not very different from the one on Facebook? Facebook’s definition of an active user is quite generous: anyone who interacts with the site in any conceivable way. That would mean that even though there are close to a billion “active users” on Facebook, perhaps between 100M and 200M are people who actually spend time posting and consuming content on Facebook.com.
Ok, enough speculation. Let me know if there are any other numbers you’d like me to extract from the data, or if you see anything wrong with my methodology. Here’s my dataset if you’d like to run some experiments of your own [update 7/31/2012 12 pm: dataset removed per Twitter’s request]
On a final note: I apologize, but at this point I have to remind you to follow me on Twitter 🙂 (well, technically Posterous is Twitter too).
Discussion of this post on Hacker News
(*) Edit: simonw pointed out that because of the Snowflake update, the 530M estimate could be off by a few percentage points. I believe it’s close to noise, but take the figure with a grain of salt (e.g. 500M to 550M). A more accurate experiment would require generating ids differently for the period after October of 2011 because there are much larger gaps in id numbers since then.
Hi Diego, the link to your dataset is down. Would it be possible for you to help me with the following data:
- total number of users based in Italy
- number of users based in Italy who have tweeted at least once
- number of users based in Italy who have neither tweeted nor follow anybody nor have followers
- number of users who follow comedian turned political figure Beppe Grillo (see below) who have tweeted at least once
- number of users who follow Beppe Grillo who have neither tweeted nor follow anybody nor have followers.
Many thanks!
Massimo
Beppe Grillo: https://twitter.com/beppe_grillo
I just ran across this doing some research for an upcoming book. Great job on methodology and analysis. I would really like to see if the numbers have changed much in a year. Also, I think there is something to be said for the idea that the numbers reported by Twitter and other media outlets are highly inflated when you look at the actual numbers of fake accounts, abandoned accounts, inactive accounts, and multiple accounts. I believe your data begins to support the additional notion that it is active but perhaps by a smaller percentage of people, or that it is active in smaller Twitter circles.
Hello, thanks for the very interesting work. Bit confused by what you mean by “followers” and “friends”? How are you defining friends? Are they the people the user follows?
also this: “For the 80M accounts that tweeted in the past month (these are what I’d call active users by the way): median followers = 31 and median friends = 72 (averages, 235 and 188).”
Should those averages be the other way around?