Twitter is fun, and you can play with it. In spite of the multitude of Twitter clients, some of which can keep some sort of archive of what I’ve tweeted, I want to collect my tweets and archive them independently of the client I’m using.
I’ve knocked up a small Perl program which uses the Twitter API to grab my so-called time line in JSON format. It then stores the individual tweets in a small SQLite database – nice and lightweight.
Start off by creating the database, which I’ve called tweets.db
:
$ sqlite3 tweets.db < tweets.sql
The database table is trivial; its definition is in tweets.sql
:
CREATE TABLE tweets (id INTEGER,
tstamp DATE,
text VARCHAR(140),
PRIMARY KEY(id));
Now comes the small program, which retrieves my time line from Twitter, parses through the JSON format and inserts my latest tweets into the database.
#!/usr/bin/perl
use strict;
use JSON;
use DBI;
use LWP::UserAgent;
use Date::Parse;
my $username = 'jpmens';
my $dbh = DBI->connect("dbi:SQLite:dbname=tweets.db",
"", "", {AutoCommit => 0,
PrintError => 1} )
or die $DBI::errstr;
my $sth = $dbh->prepare("INSERT INTO tweets (id, tstamp, text) VALUES (?, ?, ?)");
# find last used twitter ID in table
my @row = $dbh->selectrow_array("SELECT MAX(id) FROM tweets");
my $lastid = (defined(@row[0])) ? @row[0] : 1;
my $url = "https://twitter.com/statuses/user_timeline/${username}.json?since_id=${lastid}";
my $ua = LWP::UserAgent->new;
$ua->agent("TweeterArchiver/0.1 " . $ua->agent);
$ua->env_proxy;
my $res = $ua->get($url);
if (! $res->is_success) {
die "Cannot retrieve URL ($url): " . $res->status_line . "\n";
}
my $timeline = from_json($res->content); # JSON string
foreach my $tweet (@$timeline) {
print $tweet->{id} . " " . $tweet->{text} . "\n";
$sth->execute($tweet->{id},
str2time($tweet->{created_at}),
$tweet->{text}) or die $DBI::errstr;
}
$dbh->commit();
$sth->finish();
undef $sth;
$dbh->disconnect();
undef $dbh;
Each time the program runs, it determines the last tweet identifier I received, and asks Twitter to return tweets that were submitted since then. (I’m being nice to Twitter in doing so.)
Tweets are stored in the database table, and I can do something with those sometime later, for example:
SELECT time(tstamp, 'unixepoch'), text
FROM tweets
WHERE date(tstamp, 'unixepoch') = date('now');
08:50:37|@nmeth Ooohh, lucky you! Our postman hasn't rung twice yet...
09:07:48|une pensée "philosophique"... http://post.ly/3IS6
13:07:09|The Missus doesn't want an iPhone. Grounds for a divorce?
13:45:50|@vowe Cold drink nowhere to be seen. On your tummy behind the c't ? ; -)
18:16:57|On my blog: New look, old feel http://bit.ly/PDTgs
18:53:54|Couscous. Good.
Remember, that you’re not allowed to invoke this program terribly often: Twitter rate limits access to its servers. I run it once an hour from cron.