The Twitter Firehose In Action – Gauging the Automated Response

In case you thought no one read your tweets…

Following a tweet with a link that I posted earlier today, I checked my server logs to see what kind of response I got. I don’t have many followers, the custom link-shortening site gets little to no traffic, and it’s a sleepy Sunday when I don’t expect many people to be trolling for tech tweets. So what did I find?

  • 2 HEAD requests: one from Twitterbot, coming from an address inside Twitter’s data center, and another from Kosmix’s “Voyager”
  • 1 pair of requests from Google: a fetch of robots.txt to check the site’s indexing rules, then a request for the content itself
  • 2 pairs of requests from Yahoo: the same pattern, robots.txt and then the actual content
  • 2 single requests from addresses owned by Microsoft
  • 1 request from Topsy.com
  • 1 request from Tweetmemebot

All of these hits arrived within a few seconds of my post. In general, the requests made sense: the HEAD requests confirm that the link exists, the robots.txt fetches check the site’s policies on crawling and indexing, and each service did a reasonable job of identifying itself. However, the traffic from Microsoft was suspiciously generic, reporting a User-Agent string of “Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)” – a stock Internet Explorer identifier rather than a named crawler.
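
For anyone curious enough to repeat the experiment, here is a rough sketch of the tally involved. It assumes a standard Apache/nginx “combined” access-log format and a file named access.log, and the User-Agent substrings are simply the crawlers I listed above – all of these are assumptions you would adjust for your own server.

    import re
    from collections import Counter

    # Assumes the Apache/nginx "combined" log format; adjust the
    # pattern if your server logs differently.
    LOG_LINE = re.compile(
        r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
        r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
        r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
    )

    # Substrings identifying the crawlers seen above; anything
    # else is lumped under "other".
    KNOWN_BOTS = ["Twitterbot", "Voyager", "Googlebot", "Slurp",
                  "Topsy", "Tweetmeme", "MSIE 7.0"]

    def classify(agent):
        for bot in KNOWN_BOTS:
            if bot in agent:
                return bot
        return "other"

    hits = Counter()
    with open("access.log") as log:  # log path is an assumption
        for line in log:
            m = LOG_LINE.match(line)
            if not m:
                continue
            # Separate robots.txt fetches from ordinary GET/HEAD hits
            what = "robots.txt" if m.group("path") == "/robots.txt" \
                   else m.group("method")
            hits[(classify(m.group("agent")), what)] += 1

    for (bot, what), n in sorted(hits.items()):
        print("%3d  %-15s %s" % (n, bot, what))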

As a fairly insignificant blip on the twitterscape, I find this traffic telling. The big boys – Google, Yahoo, Microsoft – are sensible customers for the Twitter Firehose and understandably want to grab new links as soon as they appear. I know very little about the others. I’m also curious about how the volume of the automated response changes with one’s reputation on Twitter. Specifically, what happens when a poster or post grows to a level that passes the filters governing the sampled streaming API method available to the average developer?
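
If I do revisit this, the obvious starting point is that sample stream itself. The sketch below shows roughly how a developer would consume it; the endpoint URL and the basic-auth scheme reflect my understanding of the v1 streaming API, and both are assumptions that may well change.

    import json
    import requests  # third-party library: pip install requests

    # The public sample stream as exposed by the v1 streaming API.
    # Both the URL and the basic-auth scheme are assumptions and
    # subject to change on Twitter's side.
    STREAM_URL = "https://stream.twitter.com/1/statuses/sample.json"

    def watch_sample(user, password):
        """Print each status from the random sample stream."""
        resp = requests.get(STREAM_URL, auth=(user, password), stream=True)
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:          # skip keep-alive newlines
                continue
            status = json.loads(line)
            if "text" in status:  # ignore deletes and other events
                print(status["user"]["screen_name"], ":", status["text"])

    # watch_sample("username", "password")  # hypothetical credentials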

This is definitely a topic to revisit and explore in greater depth.


Categorised as: Internet

