Twitter is fun, and you can play with it. In spite of the multitude of Twitter clients, some of which can keep some sort of archive of what I've tweeted, I want to collect my tweets and archive them independently of the client I'm using.

I've knocked up a small Perl program which uses the Twitter API to grab my so-called time line in JSON format. It then stores the individual tweets in a small SQLite database -- nice and lightweight.

Start off by creating the database, which I've called tweets.db:

$ sqlite3 tweets.db < tweets.sql

The database table is trivial; its definition is in tweets.sql:

CREATE TABLE tweets (id INTEGER,
         tstamp DATE,
         text VARCHAR(140),
         PRIMARY KEY(id));

Now comes the small program, which retrieves my time line from Twitter, parses through the JSON format and inserts my latest tweets into the database.

#!/usr/bin/perl
    
    use strict;
    use JSON;
    use DBI;
    use LWP::UserAgent;
    use Date::Parse;
    
    my $username = 'jpmens';
    
    my $dbh = DBI->connect("dbi:SQLite:dbname=tweets.db",
          "", "", {AutoCommit => 0,
               PrintError => 1} )
        or die $DBI::errstr;
    
    my $sth = $dbh->prepare("INSERT INTO tweets (id, tstamp, text) VALUES (?, ?, ?)");
    
    # find last used twitter ID in table
    my @row = $dbh->selectrow_array("SELECT MAX(id) FROM tweets");
    my $lastid = (defined(@row[0])) ? @row[0] : 1;
    
    my $url = "https://twitter.com/statuses/user_timeline/${username}.json?since_id=${lastid}";
    
    my $ua = LWP::UserAgent->new;
    $ua->agent("TweeterArchiver/0.1 " . $ua->agent);
    $ua->env_proxy;
    
    my $res = $ua->get($url);
    
    if (! $res->is_success) {
        die "Cannot retrieve URL ($url): " . $res->status_line . "\n";
    }
    
    my $timeline = from_json($res->content);    # JSON string
    
    foreach my $tweet (@$timeline) {
        print $tweet->{id} . " " . $tweet->{text} . "\n";
    
        $sth->execute($tweet->{id},
            str2time($tweet->{created_at}),
            $tweet->{text}) or die $DBI::errstr;
    }
    
    $dbh->commit();
    $sth->finish();
    undef $sth;
    $dbh->disconnect();
    undef $dbh;

Each time the program runs, it determines the last tweet identifier I received, and asks Twitter to return tweets that were submitted since then. (I'm being nice to Twitter in doing so.)

Tweets are stored in the database table, and I can do something with those sometime later, for example:

SELECT  time(tstamp, 'unixepoch'), text 
        FROM tweets
        WHERE date(tstamp, 'unixepoch') = date('now');
08:50:37|@nmeth Ooohh, lucky you! Our postman hasn't rung twice yet...
    09:07:48|une pensée "philosophique"... http://post.ly/3IS6
    13:07:09|The Missus doesn't want an iPhone. Grounds for a divorce?
    13:45:50|@vowe Cold drink nowhere to be seen. On your tummy behind the c't ? ; -)
    18:16:57|On my blog: New look, old feel http://bit.ly/PDTgs
    18:53:54|Couscous. Good.

Remember, that you're not allowed to invoke this program terribly often: Twitter rate limits access to its servers. I run it once an hour from cron.

Flattr this
Database, CLI, twitter, sqlite, and Perl :: 29 Aug 2009 :: e-mail

Comments

blog comments powered by Disqus