255 changes: 255 additions & 0 deletions examples/notebooks/00_hello_flydsl.ipynb
@@ -0,0 +1,255 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a89f7d33",
"metadata": {},
"source": [
"<!-- SPDX-License-Identifier: Apache-2.0 -->\n",
"<!-- Copyright (c) 2025 FlyDSL Project Contributors -->\n",
"\n",
"# Hello, FlyDSL\n",
"\n",
"**FlyDSL** is a Python DSL and MLIR compiler stack for writing high-performance\n",
"AMD GPU kernels. You write ordinary-looking Python; FlyDSL *traces* it into the\n",
"`fly` / `fly_rocdl` MLIR dialects, lowers that through ROCDL/LLVM, and emits an\n",
"HSACO binary that runs on the GPU.\n",
"\n",
"This is notebook **0 of an onboarding series** that builds up the\n",
"`flydsl.expr` foundation one idea at a time:\n",
"\n",
"| # | Notebook | Topic |\n",
"|---|----------|-------|\n",
"| 00 | *this one* | the mental model: `@kernel` / `@jit`, and how to read the IR |\n",
"| 01 | `01_numeric_types` | the scalar type system (ints, floats, bf16, fp8) |\n",
"| 02 | `02_struct` | aggregate value types with `@fx.struct` |\n",
"| 03 | `03_universal_ops` | target-agnostic `Universal*` atoms + a vector-add capstone |\n",
"\n",
"Layout algebra (`make_layout`, `logical_divide`, tiled copy, MMA) is intentionally\n",
"**not** covered yet — it gets its own series once these primitives are familiar.\n",
"\n",
"**Prerequisites:** a built/installed `flydsl`, a ROCm GPU, and `wurlitzer`\n",
"(`pip install wurlitzer`) so the notebook can show GPU `printf` output — see below."
]
},
{
"cell_type": "markdown",
"id": "88791ade",
"metadata": {},
"source": [
"## 1. Setup & sanity check\n",
"\n",
"Two imports cover almost everything:\n",
"\n",
"- `flydsl.compiler as flyc` — the compiler entry points (`@flyc.kernel`, `@flyc.jit`, `flyc.from_dlpack`).\n",
"- `flydsl.expr as fx` — the DSL surface you write *inside* a kernel (types, ops, atoms, `printf`).\n",
"\n",
"If the import below fails, FlyDSL isn't on your path yet — build it and\n",
"`pip install -e .` (see the project README), then restart the kernel."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e5bf3709",
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"\n",
"import flydsl.compiler as flyc\n",
"import flydsl.expr as fx\n",
"from flydsl.runtime.device import get_rocm_arch\n",
"\n",
"print(\"torch sees GPU:\", torch.cuda.is_available())\n",
"print(\"ROCm arch :\", get_rocm_arch())"
]
},
{
"cell_type": "markdown",
"id": "dbaeeba1",
"metadata": {},
"source": [
"**A note on seeing GPU output.** `fx.printf` runs on the device and writes to the\n",
"process-level (C) stdout. Jupyter only captures Python's `sys.stdout`, so that\n",
"output does not reach the notebook on its own. The tiny helper below runs a\n",
"launcher and routes the output back into the notebook (via `wurlitzer`).\n",
"We'll use `show_gpu_output(...)` throughout the series. Outside Jupyter (running a\n",
"plain `.py`), you don't need any of this; `printf` just goes to your terminal."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dd67645c",
"metadata": {
"lines_to_next_cell": 1
},
"outputs": [],
"source": [
"from wurlitzer import pipes\n",
"\n",
"\n",
"def show_gpu_output(launcher, *args, **kwargs):\n",
" \"\"\"Run a @flyc.jit launcher and echo its GPU printf output into the notebook.\"\"\"\n",
" kwargs.setdefault(\"stream\", torch.cuda.Stream())\n",
" with pipes() as (out, _err):\n",
" launcher(*args, **kwargs)\n",
" torch.cuda.synchronize()\n",
" print(out.read(), end=\"\")"
]
},
{
"cell_type": "markdown",
"id": "3d3a00c3",
"metadata": {},
"source": [
"## 2. Two decorators: `@flyc.kernel` and `@flyc.jit`\n",
"\n",
"FlyDSL splits a launch into two traced functions:\n",
"\n",
"- **`@flyc.kernel`** marks **device** code — the body that runs on each GPU thread.\n",
" Inside it you have intrinsics like `fx.thread_idx.x` and `fx.block_idx.x`.\n",
"- **`@flyc.jit`** marks the **host launcher**. It calls a kernel and launches the\n",
" result with `.launch(...)`, passing a grid/block configuration.\n",
"\n",
"Both are *traced*, not interpreted: when first called, FlyDSL runs the Python once\n",
"to build MLIR, compiles it, and caches the result. So `block=(4, 1, 1)` is read at\n",
"**trace time**, while `fx.thread_idx.x` is a **runtime** value that differs per\n",
"thread.\n",
"\n",
"Here is the smallest possible kernel — it takes no tensors and just prints.\n",
"`fx.printf` uses `{}` placeholders (avoid a literal `%` in the format string — it\n",
"is consumed by the underlying device `printf`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6596d590",
"metadata": {},
"outputs": [],
"source": [
"@flyc.kernel\n",
"def hello_kernel():\n",
" bid = fx.block_idx.x\n",
" tid = fx.thread_idx.x\n",
" fx.printf(\"hello from block {} thread {}\", bid, tid)\n",
"\n",
"\n",
"@flyc.jit\n",
"def hello(stream: fx.Stream = fx.Stream(None)):\n",
" hello_kernel().launch(grid=(1, 1, 1), block=(4, 1, 1), stream=stream)\n",
"\n",
"\n",
"show_gpu_output(hello)"
]
},
{
"cell_type": "markdown",
"id": "f078729c",
"metadata": {},
"source": [
"Four threads, four lines. The launch built a one-block grid of four threads, and\n",
"each thread reached the `printf`."
]
},
{
"cell_type": "markdown",
"id": "f5547c20",
"metadata": {},
"source": [
"## 3. Looking at the generated IR\n",
"\n",
"The fastest way to build intuition for what FlyDSL *did* is to read the MLIR it\n",
"produced. Set `FLYDSL_DUMP_IR=1` and point `FLYDSL_DUMP_DIR` at a dump directory,\n",
"and FlyDSL writes one `.mlir` file per compiler pass, from `00_origin.mlir` (the\n",
"high-level `fly` IR) down to the final ISA. `FLYDSL_DUMP_IR` is read at\n",
"**compile time**, so we set it first, then compile a fresh kernel and read back\n",
"its first dump."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0c233269",
"metadata": {},
"outputs": [],
"source": [
"import contextlib\n",
"import glob\n",
"import io\n",
"import os\n",
"import tempfile\n",
"\n",
"dump_dir = tempfile.mkdtemp(prefix=\"flydsl_ir_\")\n",
"os.environ[\"FLYDSL_DUMP_IR\"] = \"1\"\n",
"os.environ[\"FLYDSL_DUMP_DIR\"] = dump_dir\n",
"\n",
"\n",
"@flyc.kernel\n",
"def add_one_kernel(x: fx.Int32):\n",
" fx.printf(\"x + 1 = {}\", x + fx.Int32(1))\n",
"\n",
"\n",
"@flyc.jit\n",
"def add_one(x: fx.Int32, stream: fx.Stream = fx.Stream(None)):\n",
" add_one_kernel(x).launch(grid=(1, 1, 1), block=(1, 1, 1), stream=stream)\n",
"\n",
"\n",
"# Compile + run once (silence the verbose per-pass dump log).\n",
"with contextlib.redirect_stdout(io.StringIO()):\n",
" add_one(fx.Int32(41), stream=torch.cuda.Stream())\n",
" torch.cuda.synchronize()\n",
"\n",
"os.environ.pop(\"FLYDSL_DUMP_IR\", None) # stop dumping...\n",
"os.environ.pop(\"FLYDSL_DUMP_DIR\", None) # ...and clear the dump dir we set\n",
"\n",
"origin = sorted(glob.glob(os.path.join(dump_dir, \"*\", \"00_origin.mlir\")))[0]\n",
"with open(origin) as f:\n",
" print(f.read())"
]
},
{
"cell_type": "markdown",
"id": "5c0b1b6e",
"metadata": {},
"source": [
"Things to notice in that high-level `fly` IR:\n",
"\n",
"- `gpu.func @add_one_kernel_0(...) kernel { ... }` — the device kernel.\n",
"- `arith.addi %arg0, %c1_i32` — the `x + 1` you wrote, in MLIR form.\n",
"- `fly.print(...) {format = \"...\"}` — your `fx.printf`.\n",
"- `gpu.launch_func ... blocks in (...) threads in (...)` — the host-side launch.\n",
"\n",
"The numbered files after `00_origin.mlir` show each lowering step (layout\n",
"lowering, `fly`→`rocdl`, `gpu`→`llvm`, …) down to `*_final_isa.s`. Whenever a\n",
"kernel misbehaves, dumping the IR is the first move.\n",
"\n",
"---\n",
"**Next:** [`01_numeric_types`](01_numeric_types.ipynb) — the scalar type system you\n",
"just used (`fx.Int32`) in full: integers, floats, `bf16`, `fp8`, casts, and the\n",
"difference between compile-time and runtime values."
]
}
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "-all",
"main_language": "python",
"notebook_metadata_filter": "-all"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
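
Section 2 of the notebook separates trace-time values (such as `block=(4, 1, 1)`) from runtime values (such as `fx.thread_idx.x`). The distinction can be sketched in plain Python without a GPU. This is only a toy model and makes no claim about how FlyDSL's tracer works internally: `Sym`, `trace`, and `body` are hypothetical names invented for this illustration, and the "IR" is just a list of strings.

```python
# Toy illustration only: NOT FlyDSL's tracer or API.
class Sym:
    """A symbolic runtime value; operations on it are recorded, not computed."""

    def __init__(self, name, ops):
        self.name = name
        self.ops = ops  # shared list that collects the recorded operations

    def __add__(self, other):
        # A Sym operand stays symbolic; a plain Python value becomes a literal.
        rhs = other.name if isinstance(other, Sym) else repr(other)
        result = Sym(f"%{len(self.ops)}", self.ops)
        self.ops.append(f"{result.name} = add {self.name}, {rhs}")
        return result


def trace(fn, **constants):
    """Run fn once with a symbolic thread index and return the recorded ops."""
    ops = []
    fn(Sym("%tid", ops), **constants)
    return ops


def body(tid, block):
    # `block` is an ordinary Python int: it is read while tracing and ends up
    # baked into the recorded op as a literal.
    # `tid` is symbolic: its per-thread value is not known during tracing.
    return tid + block


ir = trace(body, block=4)
print(ir)
```

Tracing again with `block=8` records a different literal. In this toy model that is why a trace-time value changes the generated program, while a runtime value only changes what the program computes.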