Pawan Singh Kapkoti

Data & Analytics Engineer. I build robust data pipelines, read source code, and ship fixes upstream.

MSc Data Analytics. Building pipelines and dev tools on the side. I believe compliance shouldn't mean spreadsheets and AI shouldn't require the cloud. Yorkshire, UK.

Portfolio


Projects

OpsMind — On-prem AI query tool for manufacturing. docs

  • Ask production questions in plain English, get SQL results in 5 seconds
  • LangGraph multi-step agent (6-node state graph) with 5-stage SQL validation
  • MCP server architecture: database + doc search as decoupled tool servers
  • pgvector + ChromaDB retrieval, runtime-loaded domain docs
  • Gemma 3 12B via Ollama — no data leaves the factory
  • 7 business domains, formal agent specs, ty type checker in CI
  • Docker deployment with isolated Ollama container, structured JSONL audit logging
  • Golden-set eval harness (library + LLM paths) with failure-mode taxonomy
  • Governance, security policy, and code of conduct published — first-PR-wins assignment
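
As an illustration of the structured JSONL audit logging mentioned above, here is a minimal sketch. The field names (`ts`, `question`, `sql`, `row_count`), the `write_audit_event` helper, and the `batches` table are assumptions made for the example, not OpsMind's actual schema.

```python
import io
import json
from datetime import datetime, timezone


def write_audit_event(stream, question, sql, row_count):
    """Append one audit event to `stream` as a single JSON line."""
    # Hypothetical event shape; the real log may record different fields.
    event = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "sql": sql,
        "row_count": row_count,
    }
    # One JSON object per line keeps the log easy to grep and to load
    # line by line without parsing the whole file.
    stream.write(json.dumps(event, ensure_ascii=False) + "\n")
    return event


# Usage: an in-memory buffer stands in for the real log file.
log = io.StringIO()
write_audit_event(
    log,
    "How many batches ran today?",
    "SELECT COUNT(*) FROM batches",  # hypothetical table name
    1,
)
```

Each line can be read back on its own with `json.loads`, which is the main reason to prefer JSONL over a single JSON array for an append-only audit trail.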

Production Analytics Pipeline — Incremental ETL from fish production ERP

  • 15K+ rows daily from 4 ERP tables, validated with Pydantic
  • FastAPI REST API (11 endpoints) + Next.js dashboard + Power BI export
  • Prefect orchestration, Sentry monitoring, Docker + OpenTofu deployment
  • Batch tracking, yield analysis, shelf-life management, and traceability; 53 tests
  • Apache 2.0 licensed; governance, security, and code-of-conduct documents published
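
One common way to make ETL incremental is a high-watermark query: remember the latest timestamp already loaded and fetch only newer rows. The sketch below assumes that approach and uses SQLite as a stand-in for the ERP database. The `batches` table, the `updated_at` column, and `extract_increment` are hypothetical names, not the pipeline's real code.

```python
import sqlite3


def extract_increment(conn, table, watermark_column, last_seen):
    """Return rows newer than `last_seen`, plus the new watermark."""
    # Identifiers cannot be bound as parameters, so `table` and
    # `watermark_column` must come from trusted config, never user input.
    query = (
        f"SELECT * FROM {table} "
        f"WHERE {watermark_column} > ? ORDER BY {watermark_column}"
    )
    cursor = conn.execute(query, (last_seen,))
    columns = [d[0] for d in cursor.description]
    rows = [dict(zip(columns, r)) for r in cursor.fetchall()]
    # If nothing new arrived, keep the old watermark.
    new_watermark = rows[-1][watermark_column] if rows else last_seen
    return rows, new_watermark


# Usage with an in-memory database and a hypothetical `batches` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batches (id INTEGER, updated_at TEXT)")
conn.executemany(
    "INSERT INTO batches VALUES (?, ?)",
    [(1, "2024-01-01T06:00:00"), (2, "2024-01-02T06:00:00")],
)
rows, watermark = extract_increment(
    conn, "batches", "updated_at", "2024-01-01T12:00:00"
)
```

ISO 8601 timestamps stored as text compare correctly as strings, which is why the sketch can use a plain `>` on the watermark column.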

UK Crime Pipeline — Police UK API to PostgreSQL and BigQuery. streamlit / looker studio / hugging face

  • 99,675 records, 10 cities, 6 dbt marts (including outcome analysis and YoY trends), 65 tests
  • Declarative data validation + SLO monitoring (freshness, completeness, volume)
  • Polars-based alternative ingestion, pipeline maturity scorecard
  • 3 CI/CD workflows with ty type checker, diskcache + stamina for API resilience
  • Apache 2.0 licensed; NOTICE documents the OGL-v3.0 chain on derived datasets
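
The three SLO dimensions named above (freshness, completeness, volume) can be sketched as a single check over a batch of records. This is an illustrative sketch only: the `fetched_at` field, the thresholds, and `check_slos` are assumptions, not the pipeline's actual implementation.

```python
from datetime import datetime, timedelta, timezone


def check_slos(records, now, max_age, required_fields, min_rows):
    """Evaluate freshness, completeness, and volume for one batch."""
    # Freshness: the newest record must be no older than `max_age`.
    newest = max((r["fetched_at"] for r in records), default=None)
    freshness_ok = newest is not None and now - newest <= max_age

    # Completeness: share of records with every required field present.
    complete = sum(
        all(r.get(field) is not None for field in required_fields)
        for r in records
    )
    completeness = complete / len(records) if records else 0.0

    # Volume: the batch must contain at least `min_rows` records.
    return {
        "freshness_ok": freshness_ok,
        "completeness": completeness,
        "volume_ok": len(records) >= min_rows,
    }


# Usage with two hypothetical crime records, one missing its city.
now = datetime(2024, 1, 2, tzinfo=timezone.utc)
sample = [
    {"fetched_at": now - timedelta(hours=1), "city": "Leeds", "category": "burglary"},
    {"fetched_at": now - timedelta(hours=2), "city": None, "category": "theft"},
]
report = check_slos(sample, now, timedelta(hours=24), ["city", "category"], 2)
```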

sql-sop — SQL linter on PyPI. pip install sql-sop

  • 23 rules (10 errors, 13 warnings) covering DELETE/UPDATE-without-WHERE, implicit cross joins, nested subqueries, unused CTEs, SELECT *, and more
  • 78 tests, sqlparse AST parsing, fluent API (SqlGuard().enable(...).scan(...))
  • libCST-based Python scanner catches SQL injection in .execute() / .read_sql() calls (v0.4.0)
  • Pre-commit hook + GitHub Action for CI/CD integration, 195+ monthly downloads
  • MIT licensed (deliberately kept for PyPI downstream stability); full governance and security policy published
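
To show the idea behind the DELETE/UPDATE-without-WHERE rule, here is a deliberately simplified sketch. It uses regular expressions rather than the sqlparse-based parsing that sql-sop itself uses, so it is a stand-in for the technique, not the package's implementation. `find_unfiltered_writes` is a hypothetical helper, and the naive split on `;` will misjudge statements that contain semicolons or the word WHERE inside string literals or comments.

```python
import re

_RISKY = re.compile(r"^\s*(DELETE|UPDATE)\b", re.IGNORECASE)
_WHERE = re.compile(r"\bWHERE\b", re.IGNORECASE)


def find_unfiltered_writes(sql):
    """Return (keyword, statement) pairs for DELETE/UPDATE lacking WHERE."""
    findings = []
    # Naive statement split; a real linter would tokenize instead.
    for statement in sql.split(";"):
        match = _RISKY.match(statement)
        if match and not _WHERE.search(statement):
            findings.append((match.group(1).upper(), statement.strip()))
    return findings


# Usage: only the first statement is flagged.
script = "DELETE FROM orders; UPDATE orders SET paid = 1 WHERE id = 2"
print(find_unfiltered_writes(script))
# prints [('DELETE', 'DELETE FROM orders')]
```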

Open source

I learn tools by reading their source. I reverse-engineered the drt connector architecture, shipped 5 destination connectors, and wrote the official connector tutorial — all merged. Same approach everywhere: read the internals, find the gap, ship the fix.

drt · pandas · ChromaDB · pgcli · ollama · superset · plotly · fpdf2


Stack

Python, SQL, dbt, PostgreSQL, BigQuery, FastAPI, Streamlit, Prefect, LangGraph, Ollama, Docker, Polars, pandas, Pydantic, pytest, GitHub Actions

Pinned

  1. uk-crime-pipeline (Public)

    End-to-end pipeline: Police UK API to PostgreSQL + BigQuery. dbt staging/marts, 65 tests, 3 CI/CD workflows, Looker Studio + Streamlit dashboards.

    Python

  2. OpsMind (Public)

    On-prem AI query tool for manufacturing. NL-to-SQL in 5 seconds. LangGraph agent, pgvector + ChromaDB RAG, Gemma 3 12B via Ollama. 19 tables, read-only.

    Python

  3. uk-education-attainment (Public)

    ML analysis of UK A-Level attainment gaps by ethnicity, gender & deprivation using DfE data

    Jupyter Notebook

  4. drt (Public)

    Forked from drt-hub/drt

    Reverse ETL for the code-first data stack

    Python

  5. forthepeople-uk (Public)

    UK citizen transparency platform. Free council-level dashboards: weather, population, housing, crime, health, schools, elections, benefits.

    Python

  6. Hackathon-mediask (Public)

    MediAsk — health Q&A platform for factory workers. Flask, PostgreSQL, Gemini AI, Docker. Live on Render.

    Python