RAG API

Production-ready RAG (Retrieval-Augmented Generation) API built with FastAPI, ChromaDB, and Ollama. Supports querying documents with semantic search and optional LLM-powered answers.

Overview

Query: POST a question to /query and retrieve the most relevant document chunks via semantic search, optionally with an LLM-generated answer
Embed: Ingest documents at runtime via /embed or use the embed.py script for batch ingestion
Health: Monitor service status and collection count via /health
Mock mode: Run without Ollama using USE_MOCK_LLM=1 for CI and testing

Local Setup

Prerequisites

Python 3.11+
(Optional) Ollama with tinyllama for production LLM answers

Install

pip install -r requirements.txt

Ingest documents

Place .txt files in the docs/ directory and run:

python embed.py

Documents are split into 500-character chunks with a 50-character overlap between consecutive chunks. The script prints a summary of the stored chunks.
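The chunking described above can be sketched as a sliding character window. This is a minimal illustration, not the code in embed.py: the function name chunk_text and its parameters are hypothetical, and it assumes the split is purely by character count with no sentence or word awareness.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` characters.

    Hypothetical helper for illustration; embed.py may split differently.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # with the defaults, each window starts 450 chars later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With the defaults, the last 50 characters of each chunk are repeated as the first 50 characters of the next one, so a sentence that straddles a boundary is still fully contained in at least one chunk if it is short enough.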

Run the API

uvicorn app:app --host 0.0.0.0 --port 8000

To run without Ollama, enable mock mode, which returns the retrieved context as the answer:

USE_MOCK_LLM=1 uvicorn app:app --host 0.0.0.0 --port 8000
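One way such a switch could work is sketched below. This is an assumption about the general pattern, not the code in app.py: the names use_mock_llm and build_answer, the prompt wording, and the generate callback (standing in for a call to Ollama) are all hypothetical.

```python
import os
from typing import Callable


def use_mock_llm() -> bool:
    # Assumption: only the literal value "1" enables mock mode.
    return os.environ.get("USE_MOCK_LLM", "0") == "1"


def build_answer(query: str, chunks: list[str], generate: Callable[[str], str]) -> str:
    """Return the retrieved context in mock mode, otherwise ask the LLM.

    `generate` is a hypothetical stand-in for whatever function sends a
    prompt to Ollama and returns the generated text.
    """
    context = "\n\n".join(chunks)
    if use_mock_llm():
        return context  # mock mode: no LLM call, just echo the retrieved chunks

    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)
```

Keeping the LLM call behind a single function makes the mock path easy to exercise in CI, where no Ollama server is available.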

Example requests

Query

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is Kubernetes?"}'

Embed at runtime

curl -X POST http://localhost:8000/embed \
  -H "Content-Type: application/json" \
  -d '{"text": "Your document content here.", "doc_id": "my_doc"}'

Health check

curl http://localhost:8000/health

Docker

docker build -t rag-app .
docker run -p 8000:8000 rag-app

The image uses a non-root user and includes a HEALTHCHECK pointing to /health.

Kubernetes

# Build and load image (minikube)
eval $(minikube docker-env)
docker build -t rag-app .

# Deploy
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml

# Access (NodePort)
kubectl get svc rag-app-service

On a managed cloud cluster (GKE, EKS, AKS), change the Service in service.yaml to type: LoadBalancer to get an external IP.

Tests

pip install -r requirements-dev.txt
USE_MOCK_LLM=1 pytest -v

Tests use mock LLM mode and an isolated Chroma database.

Environment Variables

Variable	Default	Description
`MODEL_NAME`	tinyllama	Ollama model for query answers
`N_RESULTS`	1	Number of chunks to retrieve
`DB_PATH`	./db	ChromaDB persistence path
`DOCS_DIR`	./docs	Directory for embed.py input files
`LOG_LEVEL`	INFO	Logging level
`USE_MOCK_LLM`	0	Set to 1 to return the retrieved context only (no Ollama)
`MAX_QUERY_LENGTH`	2000	Max query length for validation
