The Twitter Firehose In Action – Gauging the Automated Response

December 5, 2010

In case you thought no one read your tweets…

After posting a tweet with a link earlier today, I checked my server logs to see what kind of response it drew. I don’t have many followers, the custom link-shortening site gets little to no traffic, and on a sleepy Sunday I don’t expect many people to be trolling for tech tweets. So what did I find?

2 HEAD requests: one from Twitterbot, from an address inside Twitter’s data center, and another from Kosmix “Voyager”
1 pair of requests from Google: one for robots.txt to check the site’s indexing rules, and another for the content
2 pairs of requests from Yahoo: same pattern, robots.txt and then the actual content
2 single requests from addresses owned by Microsoft.
1 request from Topsy.com
1 request from Tweetmemebot
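For anyone who wants to reproduce this kind of tally, here is a minimal Python sketch that counts hits by method, path, and User-Agent. It assumes the server writes Apache/Nginx “combined” format logs, and the sample lines (addresses, paths, and agent names such as ExampleBot) are hypothetical, not copied from my actual logs.

```python
import re
from collections import Counter

# Assumed Apache/Nginx "combined" format:
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
COMBINED_LOG = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def tally_requests(lines):
    """Count hits per (method, path, user agent) in combined-format log lines."""
    counts = Counter()
    for line in lines:
        match = COMBINED_LOG.match(line)
        if match is None:
            continue  # skip lines that don't fit the assumed format
        counts[(match["method"], match["path"], match["agent"])] += 1
    return counts


# Hypothetical log lines: documentation-range addresses and made-up agents.
sample = [
    '192.0.2.10 - - [05/Dec/2010:14:02:11 -0500] "HEAD /abc HTTP/1.1" 200 - "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [05/Dec/2010:14:02:12 -0500] "GET /robots.txt HTTP/1.1" 404 209 "-" "ExampleCrawler/2.1"',
    '198.51.100.7 - - [05/Dec/2010:14:02:13 -0500] "GET /abc HTTP/1.1" 301 0 "-" "ExampleCrawler/2.1"',
]

for (method, path, agent), n in tally_requests(sample).most_common():
    print(f"{n} x {method} {path} ({agent})")
```

Grouping by agent makes a robots.txt request followed by a content request from the same crawler easy to spot.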

All of these hits arrived within a few seconds of my post. In general, the requests made sense: a HEAD request confirms that the link resolves without downloading the page, fetching robots.txt checks the site’s policies on crawling and indexing, and each service did a reasonable job of identifying itself. However, the traffic from Microsoft was suspiciously generic, reporting a User-Agent string of “Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0).”
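To make the crawler side concrete, here is a hedged Python sketch of the two polite steps described above, using the standard library’s urllib.robotparser and urllib.request. The robots.txt contents, the ExampleBot agent name, and the plan_link_check helper are all hypothetical. Also note that the services in my logs did one step or the other, while this sketch combines them: it consults robots.txt first and only then builds a HEAD request.

```python
from urllib.request import Request
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a small link-shortening site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def plan_link_check(robots_txt, user_agent, url):
    """Return a HEAD request for url if robots.txt allows user_agent, else None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network fetch
    if not parser.can_fetch(user_agent, url):
        return None
    # HEAD asks only for the status line and headers, enough to confirm
    # that the link resolves without transferring the page body.
    return Request(url, method="HEAD", headers={"User-Agent": user_agent})


req = plan_link_check(ROBOTS_TXT, "ExampleBot/1.0", "http://example.com/abc")
blocked = plan_link_check(ROBOTS_TXT, "ExampleBot/1.0", "http://example.com/admin/stats")
```

Passing the returned Request to urllib.request.urlopen would actually send it; that step is left out so the sketch runs without network access.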

As a fairly insignificant blip on the twitterscape, I find this traffic telling. The big boys (Google, Yahoo, Microsoft) are sensible customers for the Twitter Firehose and understandably want to capture as much of that traffic as possible. I know very little about the others. I’m also curious about how the volume of the automated response changes with one’s reputation on Twitter. Specifically, what happens when a poster or post becomes prominent enough to pass the filters that govern the sampling stream API method available to the average developer?

This is definitely a topic to revisit and explore in greater depth.

Tagged with: google, twitter

Categorised as: Internet