I mentioned a bee, and I still hear it buzzing, though it might just be the tinnitus I’ve had for a dozen years… Quite annoying.

When I find a repository on GitHub containing something I find interesting or think I’ll use, I will typically star it. Originally I starred to show the owner that I appreciate her or his work, and I still do it for that reason; I think of it as a hat tip. Then it became a method of “bookmarking” a GitHub repository. The only question I never tried to answer was “how do I find those bookmarks?”

The answer is actually quite simple once I dig into the GitHub API, and I’m fortunate to be able to use PyGithub, which simplifies things greatly for me.

The following program produces a JSON file with all repositories I’ve starred and details of them:

#!/usr/bin/env python -B

# https://pygithub.readthedocs.io/en/latest/introduction.html
from github import Github # https://github.com/PyGithub/PyGithub
import json
import os

# ~/.gist holds the API token; strip the trailing newline before using it
with open(os.path.expanduser("~/.gist")) as f:
    g = Github(f.read().strip())

starred = []
for repo in g.get_user().get_starred():

    # I find stars on some of my own repositories, and I don't think
    # I actually did that; artefact of prior GitHub practices?
    if repo.owner.login != "jpmens":
        data = {
            "name"          : repo.name,
            "owner"         : repo.owner.login,
            "full_name"     : repo.full_name,
            "clone_url"     : repo.clone_url,
        }
        if repo.owner.name:
            data["owner_name"] = repo.owner.name
        if repo.owner.email:
            data["owner_email"] = repo.owner.email
        if repo.owner.avatar_url:
            data["owner_avatar_url"] = repo.owner.avatar_url

        if repo.description:
            data["description"] = repo.description
        if repo.homepage:
            data["homepage"] = repo.homepage

        topics = repo.get_topics() # removing this speeds up the program
        if len(topics) != 0:
            data["topics"] = topics

        starred.append(data)

with open("starred.json", "w") as f:
    f.write(json.dumps(starred, indent=4))

The program runs for a couple of minutes because I use an additional API call to obtain more information for my archive: the topics of the particular repository. The result is (I hope) something I can use long-term to find what I’m looking for. Thinking aloud: I have the name, its owner, the owner’s full name (sometimes easier to remember than a cryptic username), the topics, and a description, hoping the repository owner’s taken the trouble to set those.

[
    {
        "name": "haricot",
        "owner": "catwell",
        "full_name": "catwell/haricot",
        "clone_url": "https://github.com/catwell/haricot.git",
        "owner_name": "Pierre Chapuis",
        "owner_avatar_url": "https://avatars1.githubusercontent.com/u/221332?v=4",
        "description": "Beanstalk client for Lua",
        "topics": [
            "lua",
            "beanstalkd"
        ]
    },
    ...
]
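
With the file in that shape, finding a “bookmark” could be a matter of a few lines. The following is only a minimal sketch and not part of the program above: the function name `search_starred` and the choice of fields to search are assumptions, and it does nothing cleverer than a case-insensitive substring match. The sample entry is the one shown above.

```python
# Fields searched for the keyword; all appear in the JSON shown above.
SEARCH_KEYS = ("name", "owner", "owner_name", "description")

def search_starred(entries, keyword):
    """Return entries whose text fields or topics contain keyword (case-insensitive)."""
    needle = keyword.lower()
    hits = []
    for entry in entries:
        # owner_name, description and topics are optional keys, so use .get()
        haystack = [entry.get(key, "") for key in SEARCH_KEYS]
        haystack.extend(entry.get("topics", []))
        if any(needle in text.lower() for text in haystack):
            hits.append(entry)
    return hits

# Sample data copied from the output above; normally this would be
# the list loaded from starred.json with json.load().
sample = [
    {
        "name": "haricot",
        "owner": "catwell",
        "full_name": "catwell/haricot",
        "clone_url": "https://github.com/catwell/haricot.git",
        "owner_name": "Pierre Chapuis",
        "description": "Beanstalk client for Lua",
        "topics": ["lua", "beanstalkd"],
    },
]

for hit in search_starred(sample, "beanstalk"):
    print(hit["full_name"], "-", hit.get("description", ""))
```

Searching the owner’s full name as well as the login is deliberate: it is often the name I would remember.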

I can then have a program run through that list and clone each of the clone_urls:

#!/bin/sh

dir="/path/to/rep/ositories"

jq -r '.[]| "\(.owner)-\(.name) \(.clone_url)"' < starred.json |
  awk '{ gsub(/[ \/]/, "-", $1); $1 = tolower($1);  print; }' | while read d u
do
	target="$dir/$d"
	test -d "$target" || (
		mkdir -p "$target"
		git clone "$u" "$target"
	)
done

The jq invocation produces lines of output containing the owner’s login and the repository name joined by a dash (which awk then lowercases), followed by the clone URL. I have verified that neither owner nor repo name can contain a slash:

catwell-haricot https://github.com/catwell/haricot.git
dw-py-lmdb https://github.com/dw/py-lmdb.git
...
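
For anyone without jq at hand, the same naming rule can be sketched in Python. This is an illustration under assumptions, not a replacement for the script above: `target_dirname` and `clone_plan` are hypothetical names, and the entries are assumed to have the `owner`, `name` and `clone_url` keys from starred.json.

```python
import re

def target_dirname(owner, name):
    """Join owner and repository name with a dash, replace spaces and
    slashes with dashes, and lowercase the result."""
    return re.sub(r"[ /]", "-", f"{owner}-{name}").lower()

def clone_plan(entries):
    """Yield (directory name, clone URL) pairs, one per starred entry."""
    for entry in entries:
        yield target_dirname(entry["owner"], entry["name"]), entry["clone_url"]

# Entries in the shape of starred.json, reduced to the keys needed here.
entries = [
    {"owner": "catwell", "name": "haricot",
     "clone_url": "https://github.com/catwell/haricot.git"},
    {"owner": "dw", "name": "py-lmdb",
     "clone_url": "https://github.com/dw/py-lmdb.git"},
]

for dirname, url in clone_plan(entries):
    print(dirname, url)
```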

Some repositories are quite large, so cloning all of this takes time and costs space, but it’s worth it, to me. I will periodically have a program visit each directory and pull changes.
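
That periodic visit might look something like the sketch below. It is an assumption about how such a program could be written, not the one I will actually use: the names `git_work_trees` and `pull_all` are made up for illustration, and it expects every clone to sit directly below one base directory, as the shell script above arranges.

```python
import subprocess
from pathlib import Path

def git_work_trees(base):
    """Return the sorted subdirectories of base that contain a .git entry."""
    return sorted(
        p for p in Path(base).iterdir()
        if p.is_dir() and (p / ".git").exists()
    )

def pull_all(base):
    """Run a fast-forward-only pull in each clone below base.

    Returns the list of directories in which git reported a failure.
    """
    failed = []
    for repo in git_work_trees(base):
        # git -C <dir> runs the command as if started in that directory
        result = subprocess.run(
            ["git", "-C", str(repo), "pull", "--ff-only", "--quiet"]
        )
        if result.returncode != 0:
            failed.append(repo)
    return failed
```

Calling `pull_all()` with the base directory used for cloning would then update everything in one go; `--ff-only` is chosen so that a clone with diverging history is reported rather than merged silently.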

So far GitHub contains most of what I’ve been interested in, but the odd GitLab or Gitea etc. repository I just clone manually.

There’s something missing in order to silence the buzzing bee; I’ll be back with a third installment of my “GitHub trilogy”.

git, GitHub, and repository :: 04 Apr 2019