Playground - transfer some GitHub metadata to Elastic cloud

Basic description

This program takes a GitHub organization (or user) name and indexes that account's repository names, together with each repository's tags, in Elasticsearch.
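
As a rough illustration of what "indexing repository names along with their tags" could look like, the sketch below turns a list of tag objects (shaped like the entries returned by GitHub's REST endpoint for repository tags) into flat documents. The function name `build_tag_docs` and the document field names are hypothetical and are not taken from this project's code.

```python
def build_tag_docs(owner, repo, tags):
    """Yield one flat document per tag of a repository.

    `tags` is assumed to be a list of dicts with a "name" key and an
    optional "commit" dict holding a "sha" key.
    The output field names are illustrative only.
    """
    for tag in tags:
        yield {
            "owner": owner,
            "repository": repo,
            "tag": tag["name"],
            "commit_sha": tag.get("commit", {}).get("sha"),
        }


sample_tags = [
    {"name": "v1.0", "commit": {"sha": "abc123"}},
    {"name": "v1.1"},
]
docs = list(build_tag_docs("elminster-aom", "tagsdump", sample_tags))
```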

Scenario

These components may run in different systems
One or more database tables could handle a reasonable amount of checks performed over a longer period of time

How it works

The main program tagsdump.py has two execution modes:

Initial execution. The --init parameter creates the index (e.g. github-metadata) in Elasticsearch and defines the mapping for its data, e.g.:

$ tagsdump.py --init github-metadata

Subsequent executions. Given the organization name (e.g. elminster-aom) as source and the index (e.g. github-metadata) as target, the tag information is stored in Elastic, e.g.:

$ tagsdump.py elminster-aom github-metadata
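
The two execution modes could be parsed along the following lines. This is only a sketch using the standard `argparse` module; the function name `parse_args` and the argument names `organization` and `index` are assumptions, and the real tagsdump.py may handle its command line differently.

```python
import argparse


def parse_args(argv=None):
    """Parse either `--init INDEX` or `ORGANIZATION INDEX` (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        prog="tagsdump.py",
        description="Index GitHub repository tags in Elasticsearch",
    )
    parser.add_argument("--init", metavar="INDEX",
                        help="create INDEX and its mapping, then exit")
    parser.add_argument("organization", nargs="?",
                        help="GitHub organization or user to read tags from")
    parser.add_argument("index", nargs="?",
                        help="Elasticsearch index that receives the tags")
    args = parser.parse_args(argv)

    # The two modes are mutually exclusive: either --init alone,
    # or both positional arguments.
    if args.init is not None:
        if args.organization is not None:
            parser.error("--init cannot be combined with an organization")
    elif args.organization is None or args.index is None:
        parser.error("expected ORGANIZATION and INDEX, or --init INDEX")
    return args


init_args = parse_args(["--init", "github-metadata"])
run_args = parse_args(["elminster-aom", "github-metadata"])
```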

How to install

Clone or download a ZIP of this project, e.g.:

$ git clone git@github.com:elminster-aom/tagsdump.git

Ensure that you have the right version of Python (v3.9, see below)
Create and activate a Python virtual environment and install the required packages, e.g.:

$ python3 -m venv tagsdump \
&& source tagsdump/bin/activate \
&& python3 -m pip install --requirement tagsdump/requirements.txt

Move into the new environment:

$ cd tagsdump

How to set up and run

Create a GitHub account and an Elastic account, if you do not already have them
All settings are read from an environment variables file (.env) in the application's home directory. You can create it from this template:

$ nano .env

# copy-paste this content:
ELASTIC_CLOUD_ID = my_elastic_cloud_id
ELASTIC_CLOUD_USER = "elastic"
ELASTIC_CLOUD_PSW = my_elastic_password
GITHUB_TOKEN = my_oauth_token
LOG_LEVEL = WARNING

$ chmod 0600 .env
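
For readers curious how a file in this format can be read, here is a minimal standard-library sketch. It is not the project's actual loader (the project may rely on a package listed in requirements.txt); the function name `parse_env` is hypothetical, and the parser only covers the simple `KEY = value` lines shown in the template.

```python
def parse_env(text):
    """Parse simple `KEY = value` lines, skipping blanks and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        settings[key.strip()] = value
    return settings


template = """
# copy-paste this content:
ELASTIC_CLOUD_USER = "elastic"
LOG_LEVEL = WARNING
"""
settings = parse_env(template)
```

In a real run the text would come from the .env file itself, for example via `pathlib.Path(".env").read_text()`.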

Additional considerations

Only Unix-like systems are supported
The code has been tested with Python 3.9.4
For a detailed list of the required Python modules, see requirements.txt
Concepts like tuning or replication are out of the scope of this exercise

Areas of improvement

Implement Elastic indexing using elasticsearch.helpers.async_streaming_bulk() for better performance
In addition to tags, include release metadata so that @timestamp information can be added to the Elastic docs
Review the possibility of using conditional requests to skip tags that are already indexed
The index is currently created automatically; it may be worth controlling its creation explicitly to take advantage of the index mapping
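
To make the conditional-requests idea more concrete: GitHub's REST API returns an ETag header with its responses, and a later request carrying that value in If-None-Match is answered with 304 Not Modified when nothing has changed. The sketch below only builds such a request with the standard library; the function name `build_tags_request` and the way the ETag would be stored between runs are assumptions, not part of the current code.

```python
import urllib.request


def build_tags_request(owner, repo, token, etag=None):
    """Build a GET request for a repository's tags.

    When an ETag from a previous response is supplied, the request
    becomes conditional through the If-None-Match header.
    """
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    }
    if etag:
        headers["If-None-Match"] = etag
    url = f"https://api.github.com/repos/{owner}/{repo}/tags"
    return urllib.request.Request(url, headers=headers)


# Placeholder token and ETag values, for illustration only.
request = build_tags_request(
    "elminster-aom", "tagsdump", "my_oauth_token", etag='"abc123"'
)
```

If such a request is sent with `urllib.request.urlopen`, a 304 response is raised as `urllib.error.HTTPError` with `code == 304`, which the caller could treat as "tags unchanged, skip indexing".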
