For experimenting, I wanted a source of data. Lots of data. For my application it would be trivial to generate such data, but I wanted lots of randomness in it, so I tapped into a bit of the Twitter firehose. The firehose is the full feed of tweets (around 50 million (!) tweets per day), to which only a limited number of clients have access. As part of the firehose, there is the Streaming API, which allows anybody access to a fraction of the stream, so I tapped into that. (I jokingly said yesterday that I may be responsible for some of the day's fail whales.)

The stream we mere mortals have access to provides me with about 50,000 statuses (tweets) per hour, which adds up after a day or two.

[Screenshot: the statuses streaming past]

The individual tweets are provided in JSON format, which suits me well, and I'll turn the tap off in an hour, or a day or two.
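Because the statuses arrive as JSON, picking them apart takes very little code. The following is only a minimal sketch, not the setup used for this post: it assumes the captured stream holds one JSON object per line with blank lines in between, and the function name `iter_statuses` and the sample data are made up for illustration. It also works out what the figures above add up to per day.

```python
import io
import json


def iter_statuses(lines):
    """Yield one dict per non-blank line of newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            # Assumption: blank lines carry no status, so skip them.
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # Skip a truncated or malformed line instead of aborting the run.
            continue


# Made-up sample standing in for a captured piece of the stream.
sample = io.StringIO(
    '{"id": 1, "text": "hello"}\n'
    '\n'
    '{"id": 2, "text": "world"}\n'
    '{"id": 3, "text": "trunc\n'
)

statuses = list(iter_statuses(sample))
texts = [s["text"] for s in statuses]

# Back-of-the-envelope volume, using the figures quoted in the post.
per_hour = 50_000
per_day = per_hour * 24                    # statuses collected per day
firehose_fraction = per_day / 50_000_000   # share of the full firehose
```

At roughly 50,000 statuses per hour, that comes to about 1.2 million per day, or a bit over 2% of the full firehose.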

Database, CLI, twitter, and data :: 27 Apr 2010

