
docs: migrate documentation site from Sphinx to MkDocs #1586

Draft
timsaucer wants to merge 18 commits into apache:main from timsaucer:doc/switch-mkdocs

@timsaucer

Member

Which issue does this PR close?

N/A

Rationale for this change

The documentation site previously used Sphinx + MyST with a mix of reStructuredText and Markdown sources. This PR migrates the full site to MkDocs with the Material theme and mkdocstrings for API reference generation. Motivations:

  • Markdown is the format new contributors already know. RST directives (.. autoclass::, :ref:) are a friction point.
  • LLM-based contributors and reviewers parse Markdown reliably; RST is hit-or-miss.
  • Material for MkDocs ships a stronger default UX out of the box (search, navigation, dark mode, code copy) without piecing together Sphinx extensions.
  • mkdocs serve provides a faster, more reliable live-reload loop than sphinx-autobuild for this project's content set.

What changes are included in this PR?

  • Documentation sources converted from reStructuredText (and MyST) to plain Markdown, with the navigation tree expressed in mkdocs.yml.
  • API reference generation moved from Sphinx autodoc to mkdocstrings with Google-style docstrings.
  • User guide pages updated to use markdown-exec for executable code blocks (replacing the previous Jupyter-based pipeline), with shared setup centralized in a small hook.
  • Links, anchors, and admonitions swept for consistency; broken cross-references and duplicate TOC entries fixed.
  • Public API surface tightened so the generated reference matches what users are expected to import.
  • Cross-references inside Python docstrings use sphinx-style roles (:func:, :class:, :meth:, :attr:) so that IDE hovers render clickable links. A small griffe extension rewrites these roles into mkdocstrings autorefs at build time, so the published site continues to link correctly.
  • A test asserts that all public modules surface on the documentation site, guarding against future drift.

Are there any user-facing changes?

The published documentation site has a new look and navigation structure but the same hosting URL. There are no API or behavior changes to the datafusion Python package itself. Contributors writing new pages or docstrings should follow the MkDocs/Markdown conventions described in CLAUDE.md and the user guide.

timsaucer and others added 18 commits June 7, 2026 15:31
Phase 2 of the documentation-site refresh. Run `rst2myst convert` over
every human-authored .rst file under docs/source/ and remove the
originals. The result:

- 33 .rst files become 33 .md files (user guide, contributor guide,
  index, links).
- Headings, paragraphs, hyperlinks, code blocks, admonitions, and
  toctree directives all map cleanly to MyST syntax.
- Cross-reference anchors round-trip through MyST as `(label)=`
  blocks. The converter kebab-cased the labels (e.g. `(io-csv)=`),
  but every `{ref}` target in the corpus still uses the underscore
  form from the original RST (`{ref}\`CSV <io_csv>\``) and so do the
  Python docstrings that AutoAPI pulls in. Rewrite the anchors back
  to the underscore form so the existing references resolve.
- 86 `{eval-rst}` blocks remain — they all wrap `.. ipython::`
  directives, which have no first-class MyST equivalent. They render
  identically and don't block the build.
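
The anchor rewrite described above can be pictured as a single regex pass over each converted file. This is a minimal sketch rather than the script actually used in the commit: the function name `underscore_labels` and the label character set are assumptions made for illustration.

```python
import re

# Matches a MyST target on a line of its own, e.g. "(io-csv)=".
# Assumption: labels contain only letters, digits, hyphens, and underscores.
_LABEL_LINE = re.compile(r"^\(([A-Za-z0-9_-]+)\)=[ \t]*$", re.MULTILINE)


def underscore_labels(markdown: str) -> str:
    """Rewrite kebab-cased MyST labels back to the underscore form."""
    return _LABEL_LINE.sub(
        lambda m: f"({m.group(1).replace('-', '_')})=",
        markdown,
    )
```

Only standalone label lines are touched, so `{ref}` targets and ordinary prose containing hyphens pass through unchanged.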

conf.py changes:

- Enable `colon_fence` and `deflist` MyST extensions (rst-to-myst
  emits these on a few files, particularly execution-metrics.md).
- Keep `.rst` in `source_suffix` even though no human-authored RST
  remains: sphinx-autoapi generates RST under autoapi/ at build time
  and Sphinx needs the suffix registered to parse it.

AGENTS.md: update the two .rst paths called out under "Aggregate and
Window Function Documentation" to point at the .md equivalents.

Verified by building locally — `build succeeded`, no warnings, all
internal cross-references resolve, the ipython examples on the
landing page and basics page still execute.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The RST-to-MD conversion emitted MyST `%` comment syntax with a blank line
between each header line, which renders as visible text. Replace it with
the canonical `<!--- ... -->` HTML comment block matching upstream
apache/datafusion and this repo's existing markdown files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the Sphinx + MyST + sphinx-autoapi + pydata-sphinx-theme stack
with MkDocs + Material + mkdocstrings + mkdocs-jupyter. The MyST
conversion from apache#1579 forms the base; this commit removes the remaining
RST/MyST artifacts and rebuilds the docs around MkDocs.

Build tooling

* Replace `docs/source/conf.py` and the Sphinx Makefile with a root-level
  `mkdocs.yml` and a thin Makefile wrapper that runs `mkdocs build`.
* Swap the `[dependency-groups] docs` deps in `pyproject.toml`: out go
  sphinx / sphinx-autoapi / sphinx-reredirects / myst-parser /
  pydata-sphinx-theme; in come mkdocs<2, mkdocs-material, mkdocstrings,
  mkdocs-jupyter, mkdocs-redirects. The `mkdocs<2` cap reflects upstream
  uncertainty around the announced MkDocs 2.0 rewrite.
* Update the `build-docs` job in `.github/workflows/build.yml` to run
  `mkdocs build` and stage data files (pokemon.csv, taxi parquet) at
  `docs/source/` so notebooks can resolve relative paths during
  execution. The asf-staging / asf-site publish logic is preserved
  byte-for-byte.

API reference

* Replace sphinx-autoapi with one mkdocstrings page per top-level class
  or module under `docs/source/reference/`. The deprecated members
  previously hidden via `autoapi_skip_member_fn` are reproduced via
  mkdocstrings `filters:` / `members:` options.
* Add `dev/check_api_coverage.py`, a CI guard that fails if any
  `datafusion.__all__` entry lacks a heading in `docs/source/reference/`.

User-guide notebooks

* Convert the 14 `.md` pages that contained `{eval-rst}` `.. ipython::`
  blocks into executable Jupyter notebooks rendered by `mkdocs-jupyter`.
  Each notebook starts with a small setup cell that locates the
  docs root (so `pokemon.csv` etc. resolve regardless of nesting depth)
  and imports the common `SessionContext` / `col` / `lit` / `functions`
  symbols.
* Convert two remaining `{eval-rst}` `.. list-table::` blocks
  (distributing-work, execution-metrics) into Markdown tables.
* Rewrite MyST `:::{note}` / `:::{warning}` / `:::{tip}` admonitions to
  Material's `!!! note` syntax.

Cross-reference rewrite

* Rewrite Sphinx/MyST roles to Markdown links across user-guide pages,
  notebooks, and `python/datafusion/*.py` docstrings via
  `dev/rewrite_doc_roles.py`. Patterns covered:
  `:py:class:` / `:py:func:` / `:py:meth:` / `:py:mod:` /
  `{py:class}` / `{py:func}` / `{py:meth}` / `{py:mod}` /
  `{code}` / `{doc}` / `{ref}`, plus stray `(label)=` anchors.
* Add `inventories:` for python and pyarrow under the mkdocstrings
  python handler so links to stdlib and Arrow symbols resolve.
* Collapse intra-module and over-long markdown links to inline code in
  docstrings to keep lines within the 88-character ruff limit.

Theme

* Port the prior theme to a Material `custom_dir` overrides directory:
  Apache trademark footer (`_overrides/partials/copyright.html`) and
  a slimmed `theme_overrides.css` that keeps the `#D74633` accent
  applied to links and inline code while leaving the header palette
  white (light) / black (dark) to match `datafusion-comet`.

The result builds with all 14 notebooks executing successfully; remaining
cross-reference warnings are doc-quality follow-ups, not migration
blockers.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Tag the `nb-setup` cell in every notebook with `remove-input` and
`remove-output`, configure mkdocs-jupyter's `remove_tag_config` to honor
both tags, and hide the residual `<div>` container via a
`.celltag_nb-setup { display: none }` rule in `theme_overrides.css`.

The setup cell still executes — only the visible representation is
stripped. End users no longer see the `os.chdir(...)` boilerplate or
the convenience imports at the top of every user-guide page.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* `theme_overrides.css`: force `white-space: pre` + `overflow-x: auto`
  on `jp-OutputArea-output` and `jp-RenderedText` (and their `pre`
  children). Material's default `pre-wrap` rule was breaking wide
  `df.show()` ASCII tables mid-row; with these overrides they scroll
  horizontally inside the output box.
* Introduction page: replace the static `jupyter_lab_df_view.png`
  screenshot with a live `display(df)` code cell so mkdocs-jupyter
  captures DataFrame's `_repr_html_` output (styled, expandable) at
  build time. Delete the unreferenced PNG and now-empty
  `docs/source/images/` directory.
* `mkdocs.yml`: rename the nav entry for `basics.ipynb` from "Basics"
  to "Concepts" so the sidebar label matches the page H1.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* `dev/rewrite_doc_roles.py`: extend the role regex to handle bare
  `:class:`, `:func:`, `:meth:`, `:mod:`, `:attr:`, `:obj:`, `:data:`,
  `:exc:` (no `py:` prefix) and the matching `{class}` / `{func}` /
  `{meth}` / ... MyST variants. Also tolerate the `~.foo` leading-dot
  syntax. Re-run the script: 9 files updated, 0 sphinx-role artifacts
  left in `docs/source/` or `python/datafusion/`.
* Convert all `:ref:` invocations (e.g. `:ref:`user_guide_concepts``)
  to direct Markdown links pointing at the user guide URLs.
* Rename `docs/source/user-guide/basics.ipynb` -> `concepts.ipynb` so
  the URL slug matches the page H1. Update `mkdocs.yml` nav.
* Expand short autoref targets: `[X][datafusion.X.method]` ->
  `[X][datafusion.module.X.method]` across user-guide pages and
  `python/datafusion/*.py` docstrings (DataFrame, SessionContext,
  ExecutionPlan, Expr, RecordBatch, Catalog, ScalarUDF/AggregateUDF
  /WindowUDF/TableFunction, etc.).
* Bare leaf-name autorefs (`[X][SessionContext]` etc.) also expanded.
* Bare function-name autorefs (`[X][col]`, `[X][rank]`, ...) mapped to
  `datafusion.functions.X` based on `datafusion.functions` exports.
* Auto-link plain-code mentions in `concepts.ipynb` and
  `user-guide/dataframe/index.md` so prose references to `DataFrame`,
  `read_csv`, `SessionContext`, `col`, `lit`, etc. become real links.
* Add `CaseBuilder` and `GroupingSet` to `reference/expr.md` so their
  cross-refs resolve.
* `docs/hooks.py`: type the MkDocs hook signature, hoist the parts-count
  magic literal into a named constant. Add `INP001` to the `docs/*` ruff
  per-file-ignore so the hook file doesn't need an `__init__.py`.
* Shorten the few overlong `[user_guide_concepts](/python/...)` style
  links in `python/datafusion/*.py` module docstrings — drop the link,
  keep the prose — so the lines come back under the 88-char limit.

Build is green and cross-reference warnings are down from 253 to 101
(remaining are bare lowercase method names like `union`/`sort_by` whose
target class is ambiguous, plus a handful of `pa.RecordBatch` PyArrow
inventory alias misses).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* Replace stale Sphinx `{toctree}` blocks in every section index with
  proper Markdown lists or tables, and drop the hidden root toctree
  from `index.ipynb` (the sidebar nav covers it).
* `user-guide/io/index.md`: rewrite as a per-format table linking to
  each reader page and the corresponding `SessionContext.read_*` method.
* Rename `basics.ipynb` -> `concepts.ipynb` to match the page H1.
* Fix on-page `read_csv` mention that pointed at `datafusion.io.read_csv`;
  it now targets `SessionContext.read_csv` since the page calls it as
  `ctx.read_csv`.
* Auto-link plain-code mentions of `SessionContext` methods
  (`from_pydict`, `from_pylist`, `from_arrow`, `from_polars`,
  `create_dataframe`, `register_*`, ...) across user-guide pages and
  notebooks.
* Resolve bare-anchor links (`[Parquet](io_parquet)` and `:ref:`
  artifacts) to real relative paths from each source file's location.
* Rewrite `[X][datafusion.col]` / `[X][datafusion.column]` to canonical
  anchors `datafusion.col.col` / `datafusion.col.column`.
* Link the second `deltalake` mention on the data-sources page.
* `reference/catalog.md`: add the 8 catalog/schema base classes that
  were referenced by user-guide but missing from the reference tree.
* New `reference/formatter.md` covering the full
  `datafusion.dataframe_formatter` module. Update `dataframe.md` to
  cross-link instead of duplicating `configure_formatter`.
* `user-guide/dataframe/rendering.md`: link formatter symbols inline;
  remove the unrelated "Additional Resources" section.
* Notebook admonitions: rewrite `!!! note` / `!!! warning` / `!!! tip`
  in notebook markdown cells to `<div class="admonition KIND">` HTML
  so mistune passes them through and Material's CSS styles them.
* Reorder Common Operations nav so Basic Info comes before Views.
* Notebook setup cell now calls
  `configure_formatter(max_rows=10, show_truncation_message=False)` so
  rendered DataFrame output stays compact and free of "Data truncated"
  banners.
* Collapse remaining long markdown links inside docstrings to inline
  code so the 88-char ruff limit holds.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
User-guide

* Drop the obsolete `grouping().alias()` workaround in
  `common-operations/aggregations.ipynb` — verified that
  `apache/datafusion#21411` no longer reproduces against the current
  pinned DataFusion. Both the admonition note and the
  `with_column_renamed` post-hoc cleanup loops in cells 24 and 32 are
  gone; the examples now alias `grouping()` directly. Same `..
  warning::` block stripped from the `grouping()` docstring in
  `python/datafusion/functions.py`.
* `distributing-work.md`:
  - Point `datafusion-distributed` at the active
    `datafusion-contrib/datafusion-distributed` repo (the previous
    `apache/datafusion-distributed` 404s).
  - Point Ballista at `https://datafusion.apache.org/ballista/`
    instead of the GitHub source.
  - Shorten the "Implicit; access via" link label from
    `SessionContext.global_ctx` to just `global_ctx` so the
    session-slot reference table is easier to scan.
  - Replace the two file-level example links
    (`multiprocessing_pickle_expr.py` / `ray_pickle_expr.py`) with a
    single link to the `examples/` folder so the docs don't drift
    when specific scripts get renamed.
* `sql.ipynb`: fix the broken `[configuration options](configuration)`
  bare anchor — now `../configuration/`.
* `upgrade-guides.md`: split the `## DataFusion 54.0.0` section into
  two `###` subsections so the `Config` -> `SessionConfig` and the
  `distinct` argument on `sum`/`avg` changes are independently
  scannable.
* Rename the Common Operations nav entry from "UDF and UDAF" to
  "User-Defined Functions" (the page also covers UDWF and UDTF), and
  expand the index list bullet to enumerate all four flavors.

Contributor guide

* `introduction.md`: fix the `[PyO3 class mutability guidelines]`
  bare anchor — now `ffi.md#pyo3-class-mutability-guidelines`. Also
  correct `maturin develop -uv` -> `maturin develop --uv`.
* `ffi.md`:
  - Strip `{file}` MyST roles that mistune passed through as literal
    text. `dev/rewrite_doc_roles.py` extended to handle `{file}`,
    `{samp}`, and `{kbd}` for future passes.
  - Strip the stray double-backtick `` `` `datafusion-python` `` ``
    wrapping that rendered as literal extra backticks.
  - Fix the `[Data Sources](user_guide_data_sources)` bare anchor —
    now `../user-guide/data-sources.ipynb`.
  - Update the abi_stable mention: the FFI implementation switched
    to `stabby` in DataFusion 54.0.0; the page now points at
    `stabby` with a parenthetical note about `abi_stable`.
  - Refresh the `FFI_TableProvider` usage example to match
    `examples/datafusion-ffi-example/src/table_provider.rs`:
    `FFI_TableProvider::new_with_ffi_codec(...)` with a logical codec
    pulled out of the calling session via
    `ffi_logical_codec_from_pycapsule`. Bump the docs.rs link from
    `datafusion/45.0.0` to `datafusion/latest`.
  - Update the receiver-side snippet: `ffi_provider.into()` now
    produces an `Arc<dyn TableProvider>` directly, mirroring
    `crates/util/src/lib.rs::table_provider_from_pycapsule`. The
    `ForeignTableProvider` wrapper is no longer needed and is
    mentioned only as a historical parenthetical. PyCapsule
    construction modernized to the `cr"..."` literal +
    `PyCapsule::new` form used by the current example file.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Notebook (.ipynb) files were JSON-encoded and hostile to manual edits
and diffs. Convert every notebook page to plain Markdown that uses
markdown-exec fences, so authors edit straight `.md` files and reviewers
see real diffs.

Each former code cell becomes one of:

  ```python exec="1" source="material-block" result="text" session="<slug>"
  ...code...
  ```

Per-page sessions (`session="<slug>"`) keep kernel state across blocks
within a single page, mirroring how Jupyter cells shared globals.

Setup blocks (previously the `nb-setup`-tagged invisible cell) become

  ```python exec="1" session="<slug>"
  ...imports + chdir + configure_formatter(...)...
  ```

with no `source=` attribute so the source is not rendered. They still
execute, populating the kernel state for the rest of the page. The chdir
logic now probes both `docs/source` (mkdocs builds from the repo root)
and `..` (covers `mkdocs serve` runs from the docs directory).
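
As a rough illustration of the probing logic, the sketch below tries each candidate directory in order and returns the first one that holds a known data file. The helper name `find_docs_root` and the marker-file check are assumptions; the commit only states which two locations are probed, not how a candidate is validated.

```python
from pathlib import Path


def find_docs_root(start, candidates=("docs/source", ".."), marker="pokemon.csv"):
    """Return the first candidate under `start` that contains `marker`, else None.

    Hypothetical helper: using a marker file to validate a candidate is an
    assumption made for this sketch.
    """
    base = Path(start)
    for rel in candidates:
        candidate = (base / rel).resolve()
        if (candidate / marker).is_file():
            return candidate
    return None
```

A setup block could then call `os.chdir(...)` on the result when it is not `None`, leaving the working directory alone otherwise.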

Other changes

* `pyproject.toml` [dependency-groups] docs: swap `mkdocs-jupyter` and
  the `ipython` / `pickleshare` ipykernel deps for
  `markdown-exec[ansi]`.
* `mkdocs.yml`: drop the `mkdocs-jupyter` plugin block (with its
  `remove_tag_config`) in favor of a bare `- markdown-exec`. Update
  every `.ipynb` nav entry to `.md`. Remove the now-obsolete
  `hooks: docs/hooks.py` line — `mkdocs-autorefs` resolves
  `[X][datafusion.Y.Z]` cross-references directly in `.md` markdown,
  so the custom HTML-pass hook the notebooks needed is no longer
  required.
* `docs/hooks.py`: deleted.
* `docs/source/images/jupyter_lab_df_view.png`: restored. The previous
  attempt to replace the static screenshot with a live `display(df)`
  cell relied on Jupyter's rich `_repr_html_` rendering, which
  markdown-exec doesn't drive — and the Sphinx-era docs always used
  text `__repr__` output for the rest of the page anyway, so a single
  static screenshot for the `display(df)` demonstration is the right
  fit. `introduction.md` references it again.
* Fix the bulk extension rewrite's collateral: every cross-page link
  that used to point at a `.ipynb` URL (in user-guide index pages, the
  FFI guide, and others) now points at the corresponding `.md` source.

Build is green: 0 execution errors across all converted pages, ~4 s
wall-clock, every DataFrame `df.show()` materializes as text output in
the rendered HTML.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* Add `docs/hooks.py` with an `on_page_markdown` mkdocs hook that
  auto-prepends a shared `markdown-exec` setup block to every page
  containing `python exec="1"` fences. The block matches the first
  fence's `session="<slug>"` so kernel state carries over. Authors
  now write pages with no setup boilerplate at all — a single source
  of truth lives in the hook, so the setup can never drift across
  pages.
* Register the hook in `mkdocs.yml` (`hooks: - docs/hooks.py`).
* Strip the duplicated inline setup fence (imports + chdir +
  `configure_formatter`) from all 14 executable user-guide pages.
* `DataFrame.show()` writes through Rust's libc stdout (fd 1), which
  markdown-exec doesn't capture (it only redirects `sys.stdout`).
  The hook monkey-patches `DataFrame.show()` to call `print(self)`
  instead so the table appears in the rendered output.
* `DataFrame.__repr__()` returns a Rust-formatted ASCII table that
  hardcodes a trailing `"Data truncated."` line — the HTML-only
  `configure_formatter(show_truncation_message=False)` option does
  not affect this path. The hook monkey-patches `__repr__` to strip
  the suffix so example DataFrames don't advertise truncation in
  every block.
* Auto-wrap the final bare expression of each exec block in
  `print(...)`. Code cells in the original notebooks routinely
  ended with a bare `df` and relied on Jupyter's auto-display via
  `_repr_html_`. `exec()` doesn't echo last expressions, so without
  the wrap markdown-exec captures nothing. The AST-driven rewriter
  skips lines that are already `print(...)` / `display(...)` calls,
  and skips calls that return `None` (e.g. `df.show()`).
* Unwrap `print(<expr>.show())` back to `<expr>.show()` everywhere it
  slipped in — `df.show()` is now a side-effect call that prints via
  the monkey-patch, so wrapping it in `print(...)` would print the
  table once and then literal `None`.
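
The final-expression wrapping can be sketched with the standard `ast` module. This is an illustration under stated assumptions, not the hook's actual code: the function name `wrap_last_expression` is hypothetical, and recognizing `None`-returning calls by the method name `show` is a guess at how the real rewriter decides what to skip.

```python
import ast

# Calls that already produce output and must not be wrapped again.
_SKIP_CALLS = {"print", "display"}


def wrap_last_expression(source: str) -> str:
    """Wrap a trailing bare expression in print(...), leaving other code as-is."""
    tree = ast.parse(source)
    if not tree.body or not isinstance(tree.body[-1], ast.Expr):
        return source  # empty block, or last statement is not a bare expression
    last = tree.body[-1]
    value = last.value
    if isinstance(value, ast.Call):
        func = value.func
        if isinstance(func, ast.Name) and func.id in _SKIP_CALLS:
            return source
        # Assumption: `.show()` is the only None-returning call worth skipping.
        if isinstance(func, ast.Attribute) and func.attr == "show":
            return source
    call = ast.Call(func=ast.Name(id="print", ctx=ast.Load()), args=[value], keywords=[])
    tree.body[-1] = ast.copy_location(ast.Expr(value=call), last)
    ast.fix_missing_locations(tree)
    # Note: ast.unparse (Python 3.9+) drops comments and normalizes formatting.
    return ast.unparse(tree)
```

Because `ast.unparse` regenerates the source, a production rewriter would more likely splice `print(` and `)` around the original text of the last statement to preserve comments and layout.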

`dev/check_api_coverage.py`: index the reference page's `:::` directive
targets so the coverage check still validates documented symbols after
the move from notebook-based pages back to plain Markdown.
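
A coverage check of this kind might look like the sketch below, which collects `:::` directive targets from reference pages and reports public names that none of them document. The function names and the leaf-name matching rule are assumptions; the real `dev/check_api_coverage.py` may compare fully qualified paths instead.

```python
import re

# Matches an mkdocstrings directive line such as
# "::: datafusion.context.SessionContext".
_DIRECTIVE = re.compile(r"^:::[ \t]+([\w.]+)[ \t]*$", re.MULTILINE)


def documented_targets(pages):
    """Collect every `:::` directive target found in the given page texts."""
    targets = set()
    for text in pages:
        targets.update(_DIRECTIVE.findall(text))
    return targets


def missing_symbols(public_names, pages):
    """Return public names with no matching directive target, sorted.

    Assumption: a name counts as documented when it equals the last dotted
    component of some directive target.
    """
    leaves = {target.rsplit(".", 1)[-1] for target in documented_targets(pages)}
    return sorted(name for name in public_names if name not in leaves)
```

A CI guard would feed it the package's `__all__` and the contents of the reference directory, then exit non-zero when the returned list is non-empty.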

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Set `show_root_full_path: false` globally so mkdocstrings renders
short class names (e.g. `ScalarUDF`) rather than fully-qualified paths.
Drop redundant `## Name` headers preceding matching `:::` directives
across reference pages so each symbol now produces a single TOC entry.

Replace the broken `[user_defined][datafusion.user_defined]` module
cross-reference on the windows user-guide page with a direct link to
the reference page, eliminating the `mkdocs_autorefs` warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Griffe's Google-style parser only recognizes the plural `Examples:`
section header. With the singular form, the doctest body fell through
as free-form prose and the `>>>` prompts rendered as nested Markdown
blockquotes instead of a syntax-highlighted pycon block.
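
For reference, a Google-style docstring using the plural header looks like the following. The function `add_one` is a made-up example, not code from the `datafusion` package.

```python
def add_one(value: int) -> int:
    """Return ``value`` plus one.

    Args:
        value: The integer to increment.

    Returns:
        The incremented integer.

    Examples:
        >>> add_one(41)
        42
    """
    return value + 1
```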

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reorganize the mkdocs reference navigation to mirror the Python package
layout. The "API Reference" section is now a clickable landing that
expands into a `datafusion` subsection, which in turn expands into one
page per submodule (catalog, context, dataframe, expr, functions,
input, io, ipc, object_store, options, plan, record_batch, substrait,
unparser, user_defined). Submodule pages live under
`docs/source/reference/datafusion/` and the package-level landing uses
`index.md` so Material's `navigation.indexes` feature collapses each
section header onto its own landing page.

Move `col`, `column`, `lit`, `literal`, and related top-level
conveniences from `expr.md` onto the new package landing page where
they actually live, rename `formatter.md` to `dataframe_formatter.md`
to match the module name, and add `input.md` for the previously
undocumented `datafusion.input` subpackage.

Fix the package docstring's "Quick start" doctest (now rendered as
a syntax-highlighted pycon block via the `Examples:` Google section)
and replace a broken `LogicalExtensionCodec` cross-reference in
`Expr.to_bytes` with a working link to `with_logical_extension_codec`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bring the strict mkdocs build to zero actionable warnings.

Cross-reference fixes:
- Replace ~20 unqualified `[X][X]` refs with fully-qualified targets
  (sort/sort_by, cube/rollup/grouping_sets, array_* aliases, register_*,
  Volatility, Serde, Producer, max_rows, metrics, etc.).
- Fix typos: `Dataframe` -> `DataFrame`, `datafusion.Expr.to_bytes` ->
  `datafusion.expr.Expr.to_bytes`.
- Drop links to non-Python symbols: `SessionState`, `CreateExternalTable`,
  `multiprocessing.Pool`, `cloudpickle`; point the `ObjectStore`
  reference at the module page; link `LogicalExtensionCodec` to
  `with_logical_extension_codec`.
- Update relative link in `dataframe_formatter.md` for the subdirectory
  move.

Public surface and reference page coverage:
- Add `__all__` to `context`, `dataframe`, `dataframe_formatter`, `io`,
  `record_batch`, `user_defined`, `input/base`, `input/location` so the
  public surface is explicit.
- Document the newly-declared public symbols on the corresponding
  reference pages (Compression, Volatility, *Exportable Protocols,
  ArrowStreamExportable, etc.).
- Update `object_store.md` to render the PyO3 class aliases via explicit
  per-class directives (whole-module discovery skips re-assigned PyO3
  bindings).
- Allow `__next__`/`__anext__` through the formatter filter on
  `RecordBatchStream` so the iterator protocol is documented.

Docstring fixes:
- Convert overload-impl Args blocks (ScalarUDF.udf, AggregateUDF.udaf,
  WindowUDF.udwf) to free-form prose: their actual signatures are
  `*args, **kwargs`, which griffe was flagging.
- Fix Args continuation indent in the formatter, the `idx::` typo, and a
  `with_extension` Args entry whose continuation lost its indent.
- Re-wrap docstring lines that pushed past the 88-column limit after the
  cross-ref qualifications.

mkdocstrings config: add `docstring_options: warn_unknown_params: false`
so future overload-impl patterns don't trip the build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace trailing-slash URL-style links left over from the Sphinx site
with explicit `.md` paths so mkdocs can resolve and validate them. Also
point the DataFusion 52 upgrade-guide reference at the FFI contributor
page (the bare `[ffi](ffi)` link no longer pointed anywhere).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Convert ~50 `print(df.<chain>)` calls in the user-guide pages to
`df.<chain>.show()`, which is the idiomatic way to display a DataFusion
DataFrame and matches what users would actually write. Lines whose
chain returns a non-DataFrame (`.schema()`, `.to_pandas()`) keep the
explicit `print()`.

Also strip `result="text"` from executable blocks that produce no
output (variable assignments only) and hide any remaining empty
markdown-exec result containers via a CSS rule so the page no longer
shows distracting empty boxes after assignment-only examples.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JetBrains and other IDEs do not understand mkdocstrings autoref
syntax (`[name][path]`) and display it as literal text in docstring
hovers. Switch docstring cross-references to sphinx-style roles
(:func:, :class:, :meth:, :attr:, :mod:, :exc:) which IDEs render
natively as clickable links.

A new griffe extension (`docs/griffe_extensions.py`) rewrites the
sphinx roles back into mkdocstrings autorefs before mkdocstrings
parses each docstring, so the docs site continues to produce
working cross-reference links.
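
The string rewriting at the heart of such an extension can be sketched as a regex substitution from role syntax to the `[title][identifier]` form that mkdocs-autorefs resolves. This is not the code in `docs/griffe_extensions.py`: the function names are hypothetical, how the rewrite is wired into griffe is not shown, and the title conventions (`~` for a short name, `title <target>` for an explicit one) follow general Sphinx usage rather than anything stated in the commit.

```python
import re

# Matches the roles named in the commit, e.g. :func:`datafusion.functions.col`.
_ROLE = re.compile(r":(?:func|class|meth|attr|mod|exc):`([^`]+)`")
_EXPLICIT = re.compile(r"(.+?)\s*<([^<>]+)>")


def _role_to_autoref(match: re.Match) -> str:
    body = match.group(1).strip()
    explicit = _EXPLICIT.fullmatch(body)
    if explicit:  # :class:`title <pkg.mod.Name>`
        title, target = explicit.group(1), explicit.group(2)
    elif body.startswith("~"):  # :func:`~pkg.mod.name` shows only "name"
        target = body[1:]
        title = target.rsplit(".", 1)[-1]
    else:  # :func:`pkg.mod.name` shows the full path
        title = target = body
    return f"[`{title}`][{target}]"


def rewrite_roles(docstring: str) -> str:
    """Replace Sphinx-style roles with mkdocs-autorefs references."""
    return _ROLE.sub(_role_to_autoref, docstring)
```

Keeping the roles in the source and converting only at build time means the docstrings stay readable in IDE hovers while the published site still gets resolvable links.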

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@timsaucer timsaucer marked this pull request as draft June 8, 2026 11:43