Su760/nextwork-rag-api

RAG API

Production-ready RAG (Retrieval-Augmented Generation) API built with FastAPI, ChromaDB, and Ollama. Supports querying documents with semantic search and optional LLM-powered answers.

Overview

  • Query: POST a question to /query and retrieve the best-matching document chunks via semantic search
  • Embed: Ingest documents at runtime via /embed or use the embed.py script for batch ingestion
  • Health: Monitor service status and collection count via /health
  • Mock mode: Run without Ollama using USE_MOCK_LLM=1 for CI and testing

Local Setup

Prerequisites

  • Python 3.11+
  • (Optional) Ollama with tinyllama for production LLM answers

Install

pip install -r requirements.txt

Ingest documents

Place .txt files in the docs/ directory and run:

python embed.py

Documents are split into 500-char chunks with 50-char overlap. A summary of stored chunks is printed.
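
The chunking described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual code in embed.py: the function name chunk_text and its parameters are assumptions, and only the 500/50 sizes come from this README.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap.

    Hypothetical helper: embed.py may implement this differently.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # each new chunk starts this many chars later
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk has reached the end of the text,
        # so no trailing chunk made only of overlap is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With these defaults each chunk starts 450 characters after the previous one, so the last 50 characters of one chunk are repeated at the start of the next. The overlap reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.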

Run the API

uvicorn app:app --host 0.0.0.0 --port 8000

To run without Ollama, enable mock mode, which returns the retrieved context as the answer:

USE_MOCK_LLM=1 uvicorn app:app --host 0.0.0.0 --port 8000
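
The switch between mock mode and a real LLM call might look roughly like the sketch below. It is an assumption about how such a toggle could be written, not the code in app.py: answer_query, the generate callback (a stand-in for the Ollama call), and the prompt wording are all hypothetical.

```python
import os
from typing import Callable, Optional, Sequence


def answer_query(
    question: str,
    context_chunks: Sequence[str],
    generate: Callable[[str], str],
    use_mock: Optional[bool] = None,
) -> str:
    """Return an answer for a question given the retrieved chunks.

    In mock mode the retrieved context itself is returned; otherwise the
    context and question are passed to `generate`, a hypothetical stand-in
    for whatever function calls the Ollama model.
    """
    if use_mock is None:
        # USE_MOCK_LLM=1 enables mock mode; the documented default is 0.
        use_mock = os.environ.get("USE_MOCK_LLM", "0") == "1"

    context = "\n\n".join(context_chunks)
    if use_mock:
        return context

    # Assumed prompt layout; the real prompt is not documented here.
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)
```

Passing the LLM call in as a function keeps the retrieval path identical in both modes, which is what makes mock mode useful for CI: everything except the model call is still exercised.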

Example requests

Query

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is Kubernetes?"}'

Embed at runtime

curl -X POST http://localhost:8000/embed \
  -H "Content-Type: application/json" \
  -d '{"text": "Your document content here.", "doc_id": "my_doc"}'

Health check

curl http://localhost:8000/health
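
The same three requests can be issued from Python using only the standard library. This is a sketch under assumptions: the helper names are hypothetical, the base URL matches the uvicorn command above, and the shape of the JSON responses is not documented in this README, so the helpers simply return whatever JSON the server sends.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed: the API is running locally


def build_json_post(path: str, payload: dict) -> urllib.request.Request:
    """Build a POST request with a JSON body, mirroring the curl examples."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, timeout: float = 30.0):
    """Send a JSON POST and decode the JSON response."""
    with urllib.request.urlopen(build_json_post(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def get_json(path: str, timeout: float = 30.0):
    """Send a GET request and decode the JSON response."""
    with urllib.request.urlopen(BASE_URL + path, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, post_json("/embed", {"text": "Your document content here.", "doc_id": "my_doc"}) followed by post_json("/query", {"query": "What is Kubernetes?"}) reproduces the embed and query calls, and get_json("/health") reproduces the health check.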

Docker

docker build -t rag-app .
docker run -p 8000:8000 rag-app

The image uses a non-root user and includes a HEALTHCHECK pointing to /health.

Kubernetes

# Build and load image (minikube)
eval $(minikube docker-env)
docker build -t rag-app .

# Deploy
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml

# Access (NodePort)
kubectl get svc rag-app-service

For cloud clusters (GKE, EKS, AKS), change the Service type in service.yaml from NodePort to LoadBalancer to get an external IP.

Tests

pip install -r requirements-dev.txt
USE_MOCK_LLM=1 pytest -v

Tests use mock LLM mode and an isolated Chroma database.

Environment Variables

Variable          Default    Description
MODEL_NAME        tinyllama  Ollama model for query answers
N_RESULTS         1          Number of chunks to retrieve
DB_PATH           ./db       ChromaDB persistence path
DOCS_DIR          ./docs     Directory for embed.py input files
LOG_LEVEL         INFO       Logging level
USE_MOCK_LLM      0          1 = return context only (no Ollama)
MAX_QUERY_LENGTH  2000       Max query length for validation
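
One way to read these variables with their documented defaults is sketched below. The Settings class, load_settings, and validate_query are hypothetical names, and how the API actually rejects an over-long query is an assumption; only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    model_name: str
    n_results: int
    db_path: str
    docs_dir: str
    log_level: str
    use_mock_llm: bool
    max_query_length: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read configuration from the environment, falling back to the
    defaults listed in the table above."""
    env = os.environ if env is None else env
    return Settings(
        model_name=env.get("MODEL_NAME", "tinyllama"),
        n_results=int(env.get("N_RESULTS", "1")),
        db_path=env.get("DB_PATH", "./db"),
        docs_dir=env.get("DOCS_DIR", "./docs"),
        log_level=env.get("LOG_LEVEL", "INFO"),
        use_mock_llm=env.get("USE_MOCK_LLM", "0") == "1",
        max_query_length=int(env.get("MAX_QUERY_LENGTH", "2000")),
    )


def validate_query(query: str, settings: Settings) -> str:
    """Reject queries longer than MAX_QUERY_LENGTH (assumed behavior)."""
    if len(query) > settings.max_query_length:
        raise ValueError(
            f"query exceeds {settings.max_query_length} characters"
        )
    return query
```

Accepting an optional mapping instead of reading os.environ directly makes the loader easy to exercise in tests without modifying the process environment.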

About

RAG API with CI/CD pipeline
