I mentioned a bee, and I still hear it buzzing, though it might just be the tinnitus I’ve had for a dozen years… Quite annoying.
When I find a repository on GitHub containing something I find interesting or think I’ll use, I will typically star it. Originally I starred to show the owner that I appreciate her or his work, and I still do it for that reason; I think of it as a hat tip. Then it became a method of “bookmarking” a GitHub repository. The only question I never tried to answer was “how do I find those bookmarks?”
The answer turned out to be quite simple once I dug into the GitHub API, and I’m fortunate in being able to use PyGithub, which simplifies things greatly for me.
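For comparison, here is a minimal sketch of roughly what PyGithub saves me from. Without it, the same list comes from the REST endpoint `GET /user/starred`, which returns at most 100 repositories per page and announces the following page in a `Link` header with `rel="next"`. The helper names `fetch_starred` and `next_page_url` are hypothetical, and the sketch assumes a personal access token is passed in as `token`.

```python
import json
import re
import urllib.request

# Starting URL: the authenticated user's stars, 100 per page (the API maximum)
API = "https://api.github.com/user/starred?per_page=100"


def next_page_url(link_header):
    """Return the URL tagged rel="next" in a GitHub Link header, or None."""
    if not link_header:
        return None
    for part in link_header.split(","):
        m = re.match(r'\s*<([^>]+)>;\s*rel="next"', part)
        if m:
            return m.group(1)
    return None


def fetch_starred(token):
    """Yield every starred repository as a dict, following pagination."""
    url = API
    while url:
        req = urllib.request.Request(url, headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        })
        with urllib.request.urlopen(req) as resp:
            yield from json.load(resp)
            url = next_page_url(resp.headers.get("Link"))
```

Each yielded dict carries the same fields PyGithub wraps, such as `full_name`, `clone_url`, and a nested `owner`.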
The following program produces a JSON file with all repositories I’ve starred and details of them:
#!/usr/bin/env python -B
# https://pygithub.readthedocs.io/en/latest/introduction.html
from github import Github # https://github.com/PyGithub/PyGithub
import json
import os

# The GitHub token is read from ~/.gist; strip the trailing newline
with open(os.path.expanduser("~/.gist")) as f:
    g = Github(f.read().strip())

starred = []
for repo in g.get_user().get_starred():
    # I find stars on some of my own repositories, and I don't think
    # I actually did that; artefact of prior GitHub practices?
    if repo.owner.login != "jpmens":
        data = {
            "name" : repo.name,
            "owner" : repo.owner.login,
            "full_name" : repo.full_name,
            "clone_url" : repo.clone_url,
        }
        # Optional fields are recorded only when they are set
        if repo.owner.name:
            data["owner_name"] = repo.owner.name
        if repo.owner.email:
            data["owner_email"] = repo.owner.email
        if repo.owner.avatar_url:
            data["owner_avatar_url"] = repo.owner.avatar_url
        if repo.description:
            data["description"] = repo.description
        if repo.homepage:
            data["homepage"] = repo.homepage
        topics = repo.get_topics() # removing this speeds up the program
        if topics:
            data["topics"] = topics
        starred.append(data)

with open("starred.json", "w") as f:
    json.dump(starred, f, indent=4)
The program runs for a couple of minutes because it makes an additional API call per repository to obtain more information for my archive: the repository’s topics. The result is (I hope) something I can use long-term to find what I’m looking for. Thinking aloud: I have the repository name, its owner, the owner’s full name (sometimes easier to remember than a cryptic username), the topics, and a description, provided the repository owner has taken the trouble to set those. The resulting file looks like this:
[
{
"name": "haricot",
"owner": "catwell",
"full_name": "catwell/haricot",
"clone_url": "https://github.com/catwell/haricot.git",
"owner_name": "Pierre Chapuis",
"owner_avatar_url": "https://avatars1.githubusercontent.com/u/221332?v=4",
"description": "Beanstalk client for Lua",
"topics": [
"lua",
"beanstalkd"
]
},
...
]
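Finding a bookmark again then comes down to searching that file. A minimal sketch, assuming the `starred.json` layout above; `find_starred` and the choice of fields it searches are hypothetical, and the match is a plain case-insensitive substring test rather than anything fuzzier.

```python
import json

# Text fields from starred.json worth searching; absent keys count as ""
FIELDS = ("name", "owner", "owner_name", "description", "homepage")


def find_starred(entries, term):
    """Return entries whose text fields or topics contain term, ignoring case."""
    term = term.lower()
    hits = []
    for entry in entries:
        haystack = [entry.get(k, "") for k in FIELDS] + entry.get("topics", [])
        if any(term in value.lower() for value in haystack):
            hits.append(entry)
    return hits
```

Used as `find_starred(json.load(open("starred.json")), "lua")`, this would return the `catwell/haricot` entry shown above, because "lua" appears in both its description and its topics.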
I can then have a program run through that list and clone each clone_url:
#!/bin/sh

dir="/path/to/rep/ositories"

jq -r '.[]| "\(.owner)-\(.name) \(.clone_url)"' < starred.json |
    awk '{ gsub(/[ \/]/, "-", $1); $1 = tolower($1); print; }' |
    while read -r d u
    do
        target="$dir/$d"
        test -d "$target" || (
            mkdir -p "$target"
            git clone "$u" "$target"
        )
    done
The jq invocation produces lines containing the owner and repository name joined by a dash, followed by the clone URL; awk then lowercases that first field. I have verified that neither owner nor repository name can contain a slash. The result looks like this:
catwell-haricot https://github.com/catwell/haricot.git
dw-py-lmdb https://github.com/dw/py-lmdb.git
...
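The same naming rule can be useful elsewhere, for instance to check which directory a given repository will land in. Here it is as a Python sketch. `clone_dir_name` and `clone_plan` are hypothetical helpers that apply the awk substitutions (spaces and slashes to dashes, then lowercase) to the joined `owner-name` string.

```python
import os
import re


def clone_dir_name(owner, name):
    """Join owner and name with a dash, turn spaces and slashes into dashes,
    and lowercase the result, as the jq/awk pipeline does."""
    return re.sub(r"[ /]", "-", f"{owner}-{name}").lower()


def clone_plan(entries, base):
    """Return (target directory, clone URL) pairs for entries from starred.json."""
    return [
        (os.path.join(base, clone_dir_name(e["owner"], e["name"])), e["clone_url"])
        for e in entries
    ]
```

For the two sample lines above, `clone_dir_name("catwell", "haricot")` gives `catwell-haricot` and `clone_dir_name("dw", "py-lmdb")` gives `dw-py-lmdb`.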
Some repositories are quite large, so cloning all of them takes time and disk space, but to me it’s worth it. I will periodically have a program visit each directory and pull changes.
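Here is a minimal sketch of how such a periodic update could look. It assumes every clone sits directly under the base directory used by the shell script, and that fast-forward-only pulls are acceptable. `repo_dirs` and `pull_all` are hypothetical names.

```python
import os
import subprocess


def repo_dirs(base):
    """Return sorted paths of immediate subdirectories of base that are git clones."""
    return sorted(
        os.path.join(base, d)
        for d in os.listdir(base)
        if os.path.isdir(os.path.join(base, d, ".git"))
    )


def pull_all(base):
    """Run a fast-forward-only pull in every clone and print any failures."""
    for path in repo_dirs(base):
        result = subprocess.run(
            ["git", "-C", path, "pull", "--ff-only", "--quiet"],
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            print(f"{path}: {result.stderr.strip()}")
```

With `--ff-only`, a clone whose history has diverged from upstream is reported instead of being merged silently. Something like `pull_all("/path/to/rep/ositories")` could then be run from cron.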
So far GitHub hosts most of what I’ve been interested in; the odd GitLab, Gitea, or similar repository I just clone manually.
Something is still missing before I can silence the buzzing bee; I’ll be back with a third installment of my “GitHub trilogy”.