45 changes: 45 additions & 0 deletions CONDA.md
@@ -0,0 +1,45 @@
# Conda Requirement for Exercises

## Why Conda is Required

The exercises in this repository use **MLflow** to orchestrate ML pipelines. Each pipeline step runs as an MLflow project, and MLflow uses **conda** to create an isolated, reproducible environment for that step.

Each exercise contains a `conda.yml` file that specifies the dependencies for that component. When you run:

```bash
mlflow run .
```

MLflow reads the `conda.yml` file and automatically creates a conda environment with the specified packages before executing the code.

## Installing Conda

If you don't have conda yet, install one of the following distributions:

- **Miniconda** (recommended, lightweight): https://docs.conda.io/en/latest/miniconda.html
- **Anaconda** (full distribution): https://www.anaconda.com/download
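
After installing either distribution, it can help to confirm that conda is actually reachable from your shell. The snippet below is a minimal sketch, not part of the repository: the helper names `find_conda` and `conda_version` are hypothetical, and it assumes the executable is called `conda` and is on your `PATH`.

```python
import shutil
import subprocess
from typing import Optional


def find_conda(executable: str = "conda") -> Optional[str]:
    """Return the full path of the conda executable, or None if it is not on PATH."""
    return shutil.which(executable)


def conda_version(conda_path: str) -> str:
    """Run `conda --version` and return whatever version string it prints."""
    result = subprocess.run(
        [conda_path, "--version"], capture_output=True, text=True, check=True
    )
    # Some older conda releases print the version on stderr instead of stdout.
    return (result.stdout or result.stderr).strip()


conda_path = find_conda()
if conda_path is None:
    print("conda was not found on PATH; install Miniconda or Anaconda first.")
else:
    try:
        print(f"Found {conda_version(conda_path)} at {conda_path}")
    except (subprocess.CalledProcessError, OSError) as exc:
        print(f"conda is on PATH at {conda_path} but could not be run: {exc}")
```

If conda was installed but is still reported as missing, opening a new terminal usually picks up the updated `PATH`.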

## Virtual Environment (requirements.txt)

The `requirements.txt` file in the root directory contains all dependencies with pinned versions for the entire repository. This is useful for:

- Exploring the codebase
- Running standalone Python scripts
- IDE/editor support and autocompletion

To use it:

```bash
python3.13 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

However, **to run the exercises themselves** you still need conda installed. The virtual environment does not replace it, because MLflow creates its own conda environment from each exercise's `conda.yml`.
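
Once the virtual environment is set up, you may want to check that the installed packages match the pins in `requirements.txt`. The following is a rough sketch under stated assumptions: `parse_pins` and `find_mismatches` are hypothetical helpers, and the parser only understands plain `name==version` lines (no extras, environment markers, or `-r` includes), which is how the file in this repository is written.

```python
from importlib import metadata
from pathlib import Path


def parse_pins(text: str) -> dict:
    """Collect `name==version` pins, skipping blank lines and `#` comments."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins: dict, installed_version=metadata.version) -> dict:
    """Return {name: (pinned, installed_or_None)} for every pin that is not met."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = installed_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Assumes the script is run from the repository root, next to requirements.txt.
requirements = Path("requirements.txt")
if requirements.exists():
    report = find_mismatches(parse_pins(requirements.read_text()))
    for name, (pinned, installed) in report.items():
        print(f"{name}: pinned {pinned}, installed {installed or 'missing'}")
```

Run it with the virtual environment activated; no output means every simple pin is satisfied.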

## Exercise Workflow

1. Install conda (Miniconda or Anaconda)
2. Navigate to an exercise directory (e.g., `lesson-1-machine-learning-pipelines/exercises/exercise_3/starter/`)
3. Run the pipeline: `mlflow run .`
4. MLflow will automatically create the conda environment and execute the code
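
The workflow above can also be wrapped in a small preflight script. This is only an illustrative sketch: `preflight_problems` and `run_exercise` are hypothetical names, the check covers just conda and the `conda.yml` file described earlier, and it assumes the `mlflow` command is available in the shell that launches the script.

```python
import shutil
import subprocess
from pathlib import Path


def preflight_problems(project_dir: str, required_files=("conda.yml",)) -> list:
    """List reasons why `mlflow run .` would likely fail in project_dir."""
    problems = []
    root = Path(project_dir)
    if shutil.which("conda") is None:
        problems.append("conda is not on PATH")
    for filename in required_files:
        if not (root / filename).is_file():
            problems.append(f"{filename} is missing from {root}")
    return problems


def run_exercise(project_dir: str) -> int:
    """Run `mlflow run .` inside project_dir if the preflight checks pass."""
    problems = preflight_problems(project_dir)
    if problems:
        for problem in problems:
            print(f"Cannot run exercise: {problem}")
        return 1
    # Equivalent to changing into the directory and typing `mlflow run .`
    return subprocess.run(["mlflow", "run", "."], cwd=project_dir).returncode
```

Calling `run_exercise("lesson-1-machine-learning-pipelines/exercises/exercise_3/starter/")` from the repository root would then either report what is missing or hand control to MLflow.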
45 changes: 44 additions & 1 deletion requirements.txt
@@ -1,3 +1,46 @@
# Core Data Science
pandas==2.3.2
numpy==2.2.6
scipy==1.16.1

# Machine Learning
scikit-learn==1.7.2
mlflow==3.2.0

# Deep Learning (PyTorch)
torch==2.8.0
torchvision==0.23.0

# Visualization
matplotlib==3.10.0
seaborn==0.13.2
plotly==6.3.0

# Workflow & Configuration Management
hydra-core==1.3.2
hydra-joblib-launcher==1.2.0
omegaconf==2.3.0

# Experiment Tracking
wandb==0.21.3

# Data Profiling
ydata-profiling==4.18.1

# Web & HTTP
requests==2.32.5

# Image Processing
pillow==11.3.0

# Testing
pytest==8.4.2

# Jupyter & Notebooks
jupyter==1.1.1
jupyter_core==5.8.1
ipywidgets==8.1.7

# Data Serialization
pyarrow==21.0.0
pyyaml==6.0.3