The Twitter Firehose In Action – Gauging the Automated Response
In case you thought no one read your tweets…
After tweeting a link earlier today, I checked my server logs to see what kind of response it drew. I don’t have many followers, the custom link-shortening site gets little to no traffic, and it’s a sleepy Sunday when I wouldn’t expect many people to be trolling for tech tweets. So what did I find?
- 2 HEAD requests: one from Twitterbot, coming from an address inside Twitter’s data center, and another from Kosmix “Voyager”
- 1 pair of requests from Google: a fetch of robots.txt to check the site’s indexing rules, followed by a request for the content itself
- 2 pairs of requests from Yahoo: the same pattern, robots.txt first and then the actual content
- 2 standalone requests from addresses owned by Microsoft
- 1 request from Topsy.com
- 1 request from Tweetmemebot
All of these hits arrived within a few seconds of my post. In general, the requests made sense: the HEAD requests verify that the link exists, the robots.txt fetches check the site’s policies on crawling and indexing, and each service did a reasonable job of identifying itself. The traffic from Microsoft, however, was suspiciously generic, reporting a User-Agent string of “Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)” – a plain desktop browser signature with no hint of a crawler behind it.
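If you want to repeat the experiment, a script along the lines of the one below will pull that burst of traffic out of a log. This is a minimal sketch, assuming the common Apache/nginx “combined” log format; the log path and the tweet timestamp are placeholders to swap for your own values.

```python
#!/usr/bin/env python3
"""Rough sketch: extract the bot stampede that follows a tweet from an
access log. Assumes the Apache/nginx "combined" log format; the log
path and the tweet's timestamp below are placeholders."""
import re
from datetime import datetime, timedelta

LOG_PATH = "/var/log/apache2/access.log"      # placeholder path
POST_TIME = datetime(2010, 3, 14, 15, 9, 26)  # placeholder: when the tweet went out
WINDOW = timedelta(seconds=30)                # how long after the tweet to look

# host ident user [time] "method path proto" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d+ \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

hits = {}
with open(LOG_PATH) as log:
    for line in log:
        m = LINE_RE.match(line)
        if not m:
            continue
        # e.g. "14/Mar/2010:15:09:28 -0500" -- ignore the timezone offset
        when = datetime.strptime(m["time"].split()[0], "%d/%b/%Y:%H:%M:%S")
        if not (POST_TIME <= when <= POST_TIME + WINDOW):
            continue
        # Distinguish robots.txt probes from fetches of the content itself
        kind = "robots.txt" if m["path"].endswith("robots.txt") else m["method"]
        hits.setdefault(m["agent"], []).append((m["host"], kind))

# One line per User-Agent, busiest first, with each request underneath
for agent, reqs in sorted(hits.items(), key=lambda kv: -len(kv[1])):
    print(f"{len(reqs):2d}  {agent}")
    for host, kind in reqs:
        print(f"      {kind:10s} from {host}")
```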
As a fairly insignificant blip on the twitterscape, I find this traffic telling. The big boys – Google, Yahoo, Microsoft – are sensible customers for the Twitter Firehose and understandably want to grab every link as soon as it appears. I know very little about the others. I’m also curious how the volume of this automated response scales with one’s reputation on Twitter. Specifically, what happens when a poster or a post grows prominent enough to pass the filters governing the sampled streaming API method, the one available to the average developer?
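For reference, consuming that sampled stream is simple. Below is a minimal sketch, assuming the v1 statuses/sample method over basic auth as the API worked at the time; the endpoint has since been retired and the credentials are placeholders, so treat this as illustrative only.

```python
#!/usr/bin/env python3
"""Sketch: read Twitter's sampled public stream. Assumes the v1
statuses/sample method over basic auth, as the API worked at the
time; the endpoint is long retired and the credentials are fake."""
import json
import requests  # third-party: pip install requests

STREAM_URL = "https://stream.twitter.com/1/statuses/sample.json"
AUTH = ("username", "password")  # placeholder credentials

with requests.get(STREAM_URL, auth=AUTH, stream=True, timeout=90) as resp:
    resp.raise_for_status()
    for raw in resp.iter_lines():
        if not raw:  # skip the keep-alive newlines
            continue
        status = json.loads(raw)
        # Only statuses that pass Twitter's sampling filters arrive here;
        # a low-profile poster may never show up at all.
        if "text" in status:
            print(f'@{status["user"]["screen_name"]}: {status["text"]}')
```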
This is definitely a topic to revisit and explore in greater depth.