DrivenData

Data & AI solutions for problems that matter.

Welcome to DrivenData's GitHub page, a home for open source code in support of data science, machine learning, and AI for social good.

DrivenData runs data science competitions and works directly with mission-driven organizations to tackle real-world challenges in areas such as health, education, conservation, and disaster response. Our open source repositories contain tools we build and maintain, as well as competition-winning models and community-driven solutions that everyone can use, learn from, and contribute to.


Our mission

DrivenData helps mission-driven organizations harness data to work smarter and deliver greater social impact. We believe in:

  • Open collaboration through accessible machine learning, AI, and data science.
  • Sharing learning and tools from both our work and our competitions to benefit the global data community.
  • Supporting social good by enabling data scientists to solve problems that matter.

Learn more about our work on our website.


Explore our repositories

We host a variety of open source repositories, including tools for data science workflows, purpose-built packages in specific domains, and winning models and approaches from our competitions.

Developer tools

We open source practical tools we use in our own work to support reproducible, responsible, and maintainable software.

  • cookiecutter-data-science: A standardized yet flexible data science project template.
  • cloudpathlib: pathlib-style interfaces for cloud storage.
  • deon: A CLI tool for adding ethics checklists to data science workflows.
  • erdantic: Generate entity relationship diagrams from Python models.

Real-world applications

We collaborate with partner organizations to build and deliver open source applications that address domain-specific social impact challenges.

  • zamba: A deep learning framework for wildlife camera trap image classification.
  • cyfi: A package for detecting harmful algal blooms from satellite imagery.
  • scipeds: A "baked data" library for working with higher education data from IPEDS.

Benchmarked models from DrivenData competitions

We publish winning solutions from past data science competitions under permissive licenses to support learning and reuse. These repositories collect competition submissions on topics such as public health, energy forecasting, and natural language challenges.


Contributing

Check out the contribution guidelines in each repository for details on how to get involved!

Licensing

Projects in this organization are released under the licenses stated in their individual repositories. Please check each repository for details on licensing and terms of use.


Thank you for exploring DrivenData’s open source! We look forward to what we can build together.

Pinned

  1. cookiecutter-data-science Public

    A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

    Python · 9.8k stars · 2.6k forks

  2. competition-winners Public

    The code for the prize winners in DrivenData competitions.

    411 stars · 63 forks

  3. cloudpathlib Public

    Python pathlib-style classes for cloud storage services such as Amazon S3, Azure Blob Storage, and Google Cloud Storage.

    Python · 610 stars · 81 forks

  4. deon Public

    A command line tool to easily add an ethics checklist to your data science projects.

    Python · 306 stars · 56 forks

  5. erdantic Public

    Entity relationship diagrams for Python data model classes like Pydantic.

    Python · 405 stars · 28 forks

  6. zamba Public

    A Python package for identifying hundreds of kinds of animals, training custom models, and estimating distance from camera trap videos and images.

    Python · 152 stars · 37 forks

Repositories

Showing 10 of 142 repositories
  • cookiecutter-data-science Public

    A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

    Python · 9,778 stars · MIT · 2,626 forks · 21 issues · 11 pull requests · Updated Apr 9, 2026
  • deon Public

    A command line tool to easily add an ethics checklist to your data science projects.

    Python · 306 stars · MIT · 56 forks · 17 issues (4 need help) · 4 pull requests · Updated Mar 30, 2026
  • childrens-speech-recognition-runtime Public

    Python · 5 stars · 13 forks · 0 issues · 1 pull request · Updated Mar 27, 2026
  • snomed-ct-benchmark-runtime Public

    Docker runtime for the SNOMED CT Entity Linking Benchmark

    Python · 0 stars · MIT · 0 forks · 0 issues · 0 pull requests · Updated Mar 27, 2026
  • poverty-prediction-challenge Public

    Winning solutions to the Poverty Prediction Challenge

    Python · 0 stars · MIT · 1 fork · 0 issues · 0 pull requests · Updated Mar 18, 2026
  • cloudpathlib Public

    Python pathlib-style classes for cloud storage services such as Amazon S3, Azure Blob Storage, and Google Cloud Storage.

    Python · 610 stars · MIT · 81 forks · 90 issues (3 need help) · 14 pull requests · Updated Mar 17, 2026
  • competition-winners Public

    The code for the prize winners in DrivenData competitions.

    411 stars · MIT · 63 forks · 0 issues · 1 pull request · Updated Mar 13, 2026
  • frictionless-py Public Forked from frictionlessdata/frictionless-py

    Data management framework for Python that provides functionality to describe, extract, validate, and transform tabular data

    Python · 0 stars · MIT · 160 forks · 0 issues · 0 pull requests · Updated Mar 12, 2026
  • childrens-speech-recognition-benchmark-pub Public

    Tutorial code for the On Top of Pasketti: Children’s Speech Recognition Challenge

    Jupyter Notebook · 3 stars · MIT · 1 fork · 0 issues · 0 pull requests · Updated Mar 11, 2026
  • zamba Public

    A Python package for identifying hundreds of kinds of animals, training custom models, and estimating distance from camera trap videos and images.

    Python · 152 stars · MIT · 37 forks · 53 issues · 11 pull requests · Updated Feb 17, 2026
