Runnable, workload-first examples for Serverless AI Jobs and Endpoints on Nebius.
This repository shows how to run real AI/ML workloads without managing VM lifecycle directly.
Examples focus on practical use cases such as:
- model training
- fine-tuning
- batch inference
- LLM serving
- simulations and domain workloads
⚠️ This is a community-style repository maintained by Nebius engineers.
It is not official product documentation, but reflects real usage patterns and experiments.
APIs and behavior may evolve.For official documentation, see: https://docs.nebius.com/serverless
Pick the section that matches your goal — each links to runnable examples:
- 🚀 Quickstarts — lowest-friction first runs.
- 🏋️ Training — model training and fine-tuning workloads.
- ⚡ Inference — endpoint serving and batch inference workloads.
- 🔁 MLOps / Pipelines - orchestration, artifact handoffs, and multi-stage workflows.
- 🧬 Life Science — domain-specific simulation and analysis workloads.
- 🤖 Robotics — simulation, dataset generation, and robotics workflows.
- Install the Nebius CLI: Install guide
- Configure your CLI profile and project: Configure guide
- Pick an example from the sections below
- Follow the example README and verify the expected output
- Optional: shared setup helpers live in
scripts/README.md
- Optional: shared setup helpers live in
Lowest-friction first runs.
first-job.md— runnvidia-smiin a Serverless AI jobfirst-endpoint.md— deploy a quicknginxendpoint
Model training and fine-tuning workloads.
axolotl-finetuning— get started fine-tuning with Axolotltrain-and-serve— fine-tune TinyLlama in a Job and serve it with a vLLM Endpoint
Endpoint serving and batch inference workloads.
vllm-endpoint— serve Qwen with an OpenAI-compatible vLLM endpoint
Workflow orchestration and artifact handoff patterns.
video-transcription-pipeline- orchestrate Object Storage, CPU jobs, and GPU Whisper jobs with Prefect
Domain-specific simulation and analysis workloads.
openmm-simulation— run GPU-backed molecular dynamics simulations with OpenMM
Robotics and physical-AI experiment loops.
lerobot-finetune-job— fine-tune a LeRobot ACT or Diffusion policy on a robotics dataset in a serverless GPU jobsmolva-ft-norma-core— fine-tune SmolVLA for SO-101 with bundled trajectories
External examples and writeups from the community running serverless workloads on Nebius. Got something to add? Open a PR.
- 🤖 Positronic + Nebius serverless workflows — Convert datasets, train ACT/SmolVLA, and serve checkpoints as endpoints — all serverless on Nebius. — by vertix · 💻 code
- 🦾 norma-core SmolVLA — Nebius fine-tune recipe — Upstream recipe the
robotics/smolva-ft-norma-coreexample mirrors. — by norma-core · 💻 code
serverless-cookbook/
├─ README.md
├─ CONTRIBUTING.md
├─ DEVELOPER_GUIDE.md
├─ LICENSE
├─ quickstarts/
├─ training/
├─ robotics/
├─ inference/
├─ mlops/
├─ life-science/
├─ robotics/