The Nano-Llama Engine (MicroLlama-Scratch)

A 110K-Parameter Autoregressive Character-Level Language Model built from scratch.

Welcome to the Nano-Llama Engine. This repository is not a wrapper around an existing model. It is a complete, deep-dive architectural recreation of a Generative Pre-trained Transformer (GPT), built manually from the ground up to understand the underlying calculus and matrix mathematics of Large Language Models.

The Architecture

This model implements, at a microscopic scale, core mechanisms used in modern Llama-style LLMs such as Llama 3:

Rotary Positional Embeddings (RoPE)
SwiGLU Activation Functions
Multi-Head Causal Self-Attention
RMSNorm (Root Mean Square Normalization)
KV-Caching & Autoregressive Inference
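As an illustration of the first item in the list above, here is a minimal NumPy sketch of rotary positional embeddings. It is not taken from the repository: the function name `rope`, the base of 10000, and the choice to rotate adjacent channel pairs (rather than split the head dimension in half) are all assumptions made for the example.

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotate channel pairs of x (seq_len, head_dim) by position-dependent angles.

    Assumption: adjacent channels (0,1), (2,3), ... form the rotated pairs.
    """
    x = np.asarray(x, dtype=np.float64)
    seq_len, head_dim = x.shape
    assert head_dim % 2 == 0, "head_dim must be even"
    # One rotation frequency per channel pair.
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)   # (head_dim/2,)
    angles = np.arange(seq_len)[:, None] * inv_freq[None, :]     # (seq_len, head_dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin   # 2-D rotation, first component
    out[:, 1::2] = x_even * sin + x_odd * cos   # 2-D rotation, second component
    return out

# Place the same query and key vector at every position, then compare scores.
rng = np.random.default_rng(0)
q = np.tile(rng.normal(size=(1, 8)), (6, 1))
k = np.tile(rng.normal(size=(1, 8)), (6, 1))
scores = rope(q) @ rope(k).T
print(scores[2, 0], scores[5, 3])  # both pairs are 2 positions apart
```

Because each pair is only rotated, vector norms are unchanged, and the query-key score depends on the relative offset between positions rather than on their absolute values.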

The 6 Volumes of Progression

This repository is structured educationally into 6 distinct volumes, showing the evolution from raw math to a productionized API.

Volume 1: NumPy Math

The fundamental linear algebra and multivariate calculus. We build the Transformer block (Self-Attention, SwiGLU, RMSNorm) using pure NumPy. The primary goal is to derive the backpropagation and gradient flow of each mechanism by hand, without relying on automatic differentiation.
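To give a flavor of what a hand-derived backward pass looks like, the sketch below implements RMSNorm forward and backward in NumPy and checks the analytic gradient against finite differences. The function names, the `eps` value, and the toy loss are assumptions for illustration, not the repository's code.

```python
import numpy as np

def rmsnorm_forward(x, gain, eps=1e-6):
    """y = gain * x / sqrt(mean(x^2) + eps), normalizing over the last axis."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    x_hat = x / rms
    return gain * x_hat, (x, x_hat, rms, gain)

def rmsnorm_backward(dy, cache):
    """Hand-derived gradients of the loss with respect to x and gain."""
    x, x_hat, rms, gain = cache
    d = x.shape[-1]
    dgain = np.sum(dy * x_hat, axis=0)           # gain is shared across rows
    dx_hat = dy * gain
    # d(x_i / rms)/dx_j = delta_ij / rms - x_i * x_j / (d * rms^3)
    dot = np.sum(dx_hat * x, axis=-1, keepdims=True)
    dx = dx_hat / rms - x * dot / (d * rms ** 3)
    return dx, dgain

def numerical_grad(loss_fn, arr, h=1e-5):
    """Central finite differences; perturbs arr in place and restores it."""
    grad = np.zeros_like(arr)
    for idx in np.ndindex(*arr.shape):
        old = arr[idx]
        arr[idx] = old + h
        plus = loss_fn()
        arr[idx] = old - h
        minus = loss_fn()
        arr[idx] = old
        grad[idx] = (plus - minus) / (2 * h)
    return grad

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8))
gain = rng.normal(size=8)
w = rng.normal(size=(3, 8))                      # toy loss: sum(y * w), so dL/dy = w

def loss():
    return float(np.sum(rmsnorm_forward(x, gain)[0] * w))

_, cache = rmsnorm_forward(x, gain)
dx, dgain = rmsnorm_backward(w, cache)
max_err_x = np.max(np.abs(dx - numerical_grad(loss, x)))
max_err_gain = np.max(np.abs(dgain - numerical_grad(loss, gain)))
print("max abs error (x, gain):", max_err_x, max_err_gain)
```

A finite-difference check like this is a common way to validate manual derivations before building larger blocks on top of them.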

Volume 2: PyTorch Automaton

Scaling the architecture with GPU acceleration. We take the mathematical intuition proven in Volume 1 and re-implement the same architecture as PyTorch nn.Module classes. Here we introduce the Adam optimizer, batched data processing, and GPU tensors.
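For readers coming straight from Volume 1, it may help to see what an Adam update computes. The following is a minimal NumPy sketch of the standard Adam rule with bias correction; the function name `adam_step` and the toy quadratic objective are assumptions for illustration, and the repository itself is described as using PyTorch's optimizer.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. t is the 1-based step count used for bias correction."""
    m = beta1 * m + (1 - beta1) * grad           # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2      # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # correct the zero-initialization bias
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy objective (hypothetical): minimize ||w - target||^2.
target = np.array([1.0, -2.0, 0.5])
w = np.zeros(3)
m = np.zeros(3)
v = np.zeros(3)
for t in range(1, 2001):
    grad = 2.0 * (w - target)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)
print(w)
```

On the very first step the update has magnitude close to `lr` regardless of the gradient's scale, which is one reason Adam is less sensitive to gradient magnitude than plain gradient descent.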

Volume 3: The Inference Engine

Giving the Automaton a voice. We write the generation loop. We transition the model from "training mode" into a fully autonomous text generator, handling token decoding and context window sliding so the model can generate text autoregressively.
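The loop described above can be sketched as follows. This is a minimal illustration rather than the repository's code: `generate`, `toy_logits`, the four-character vocabulary, and the `block_size` value are hypothetical stand-ins, with `toy_logits` playing the role of the trained model.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)                 # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def generate(logits_fn, prompt_ids, max_new_tokens, block_size, temperature=1.0, rng=None):
    """Autoregressive sampling with a sliding context window."""
    if rng is None:
        rng = np.random.default_rng()
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        context = ids[-block_size:]   # keep only the most recent block_size tokens
        logits = np.asarray(logits_fn(context), dtype=np.float64)
        probs = softmax(logits / temperature)
        next_id = int(rng.choice(len(probs), p=probs))
        ids.append(next_id)           # the sampled token becomes part of the next input
    return ids

VOCAB = "abcd"                        # hypothetical toy vocabulary
BLOCK_SIZE = 3

def toy_logits(context):
    """Stand-in for a trained model: strongly prefers the next letter in VOCAB."""
    assert len(context) <= BLOCK_SIZE, "context was not cropped"
    logits = np.full(len(VOCAB), -30.0)
    logits[(context[-1] + 1) % len(VOCAB)] = 30.0
    return logits

out_ids = generate(toy_logits, [0], max_new_tokens=6, block_size=BLOCK_SIZE,
                   rng=np.random.default_rng(0))
text = "".join(VOCAB[i] for i in out_ids)
print(text)
```

Cropping the context on every step is the simplest way to respect a fixed window; a KV cache avoids recomputing attention for tokens that are already in the window, at the cost of extra bookkeeping.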

Volume 4: The Shakespeare Scale

Training the brain. We build a dynamic character-level vocabulary and a persistent training loop. The model is trained on a 1 MB dataset of William Shakespeare's works and learns, entirely from scratch, to spell English words and reproduce basic grammatical patterns.
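A character-level vocabulary of the kind described above can be built in a few lines. The sketch below is an assumption about the general approach, not the repository's implementation; the helper names and the short sample string are made up for the example.

```python
def build_vocab(text):
    """Map each distinct character in the corpus to an integer id and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos

def encode(text, stoi):
    return [stoi[ch] for ch in text]

def decode(ids, itos):
    return "".join(itos[i] for i in ids)

def next_char_pair(ids, start, block_size):
    """Input window and its target: the same window shifted one character ahead."""
    x = ids[start:start + block_size]
    y = ids[start + 1:start + block_size + 1]
    return x, y

corpus = "To be, or not to be"        # stand-in for the Shakespeare text
stoi, itos = build_vocab(corpus)
ids = encode(corpus, stoi)
x, y = next_char_pair(ids, start=0, block_size=4)
print(len(stoi), repr(decode(x, itos)), repr(decode(y, itos)))
```

The vocabulary is called dynamic because it is derived from whatever corpus is supplied, so its size follows the data instead of being fixed in advance.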

Volume 5: The Showcase Interface

Visualizing the neural network. A custom Flask API and glassmorphic Web UI. Instead of just printing text to a terminal, this interface graphs the model's softmax probabilities over candidate next characters in real time as it generates text.
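One way to prepare the probabilities for such a chart is to convert the raw logits into a short, JSON-friendly list of the most likely next characters. The sketch below assumes that shape; the function name `top_k_probs`, the payload keys, and the three-character vocabulary are hypothetical and may differ from what the repository's API returns.

```python
import numpy as np

def top_k_probs(logits, itos, k=5):
    """Return the k most likely next characters with their softmax probabilities."""
    logits = np.asarray(logits, dtype=np.float64)
    probs = np.exp(logits - logits.max())      # stable softmax
    probs /= probs.sum()
    top = np.argsort(probs)[::-1][:k]          # indices sorted by descending probability
    return [{"char": itos[int(i)], "prob": float(probs[i])} for i in top]

itos = {0: "a", 1: "b", 2: "c"}                # hypothetical tiny vocabulary
payload = top_k_probs([2.0, 1.0, 0.1], itos, k=2)
print(payload)
```

Because the result contains only plain strings and floats, it can be serialized directly and sent to a front end once per generated character.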

Volume 6: The Production API

Containerizing the engine. We wrap the neural network in a production-ready FastAPI server and containerize it using Docker. This demonstrates the ability to transition raw research math into a scalable, deployable cloud architecture.

How to Run the Showcase (Volume 5)

To watch the network compute next-character probabilities live in your browser:

Clone this repository.
Install the requirements:
```
pip install torch flask
```
Run the Flask server:
```
python volume_5_showcase/app.py
```
Open your web browser and navigate to http://127.0.0.1:5000.

(Note: The trained shakespeare_gpt.pth weights are not included in this repository due to GitHub file size limits. To use the UI, first train your own weights locally by running python volume_4_shakespeare_scale/training_loop.py.)
