This program takes a GitHub organization name (or user) and indexes the repository names along with all the repository's respective tags.
- These components may run in different systems
- One or more database tables could handle a reasonable amount of checks performed over a longer period of time
The main program tagsdump.py has two execution modes:
- Initial and first execution. Using parameter
--initdefines index (e.g.: github-metadata) on Elasticsearch and mapping for the data, e.g.:
$ tagsdump.py --init github-metadata- Next executions. Using the organization name (e.g.: elminster-aom) as source and the index (e.g.: github-metadata) as target, tags information will be stored in Elastic, e.g.:
$ tagsdump.py elminster-aom github_tags- Clone or download a ZIP of this project, e.g.:
$ git clone git@github.com:elminster-aom/tagsdump.git- Ensure that you have the right version of Python (v3.9, see below)
- Create and activate Python Virtual Environment and install required packages, e.g.:
$ python3 -m venv tagsdump \
&& source tagsdump/bin/activate \
&& python3 -m pip install --requirement tagsdump/requirements.txt- Move into the new environment:
$ cd tagsdump- Create (if doesn't exist already) a GitHub and Elastic account
- All available settings are based on an environment variables file in the home of our application. For its creation you can use this template:
$ nano .env
# copy-paste this content:
ELASTIC_CLOUD_ID = my_elastic_cloud_id
ELASTIC_CLOUD_USER = "elastic"
ELASTIC_CLOUD_PSW = my_elastic_password
GITHUB_TOKEN = my_oath_token
LOG_LEVEL = WARNING
$ chmod 0600 .env- Only Unix-like systems are supported
- The code has been tested with Python 3.9.4
- For a detailed list of Python modules check out the [requirements.txt]
- Concepts like tunning or replication are out of the scope of this exercise
- Implement Elastic indexing using
elasticsearch.helpers.async_streaming_bulk()for a better performance - In addition of tags, include release metadata for adding
@timestampinformation to Elastic docs - Review the possibility of using Conditional requests for skipping tags already indexed
- Index is automatically created actually, maybe it would be worth to control this for taking profit of the index-mapping