feat: add DISTINCT ON support for PostgreSQL. #2211

Open
VladislavYar wants to merge 2 commits into tortoise:develop from VladislavYar:distinct-by-specific-fields

@VladislavYar

Description

Added DISTINCT ON (fields) support for PostgreSQL to QuerySet.distinct(*fields).

Previously, .distinct() only supported plain DISTINCT (no arguments). Now, passing field
names generates DISTINCT ON (fields) on PostgreSQL, keeping one row per unique combination
of the specified fields. The feature is fully propagated to .values(), .values_list(),
and .only().
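
To illustrate the semantics described here, the following is a small pure-Python sketch of what DISTINCT ON computes: sort the rows, then keep the first row for each combination of the chosen fields. It does not use Tortoise or PostgreSQL; the `distinct_on` helper and the sample rows are hypothetical and exist only for illustration.

```python
from operator import itemgetter


def distinct_on(rows, fields, order_by=()):
    """Emulate DISTINCT ON: keep the first row for each combination of `fields`.

    `order_by` is a sequence of field names, prefixed with "-" for descending.
    Hypothetical helper for illustration only, not part of Tortoise.
    """
    ordered = list(rows)
    # Apply sort keys from least to most significant; list.sort is stable.
    for key in reversed(order_by):
        ordered.sort(key=itemgetter(key.lstrip("-")), reverse=key.startswith("-"))

    seen = set()
    result = []
    for row in ordered:
        group = tuple(row[f] for f in fields)
        if group not in seen:
            seen.add(group)
            result.append(row)
    return result


events = [
    {"id": 1, "name": "cup", "year": 2021},
    {"id": 2, "name": "cup", "year": 2023},
    {"id": 3, "name": "open", "year": 2022},
]

# "Latest per group": order by name, then newest year first.
latest = distinct_on(events, ["name"], order_by=["name", "-year"])
```

Here `latest` holds one row per `name`, and the secondary sort key decides which row of each group survives.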

Also added:

  • skipCapability decorator to tortoise.contrib.test — the inverse of requireCapability,
    skips a test when the specified capabilities match.
  • _apply_db() helper in _ChooseDBMixin that consistently sets the DB connection and
    switches the query builder to PostgreSQLQueryBuilder across all query classes.
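
As a rough idea of how an inverse of requireCapability could behave, here is a standalone sketch. It is not the PR's implementation: the `skip_capability` name, the `CAPABILITIES` object, and the "skip only when all conditions match" rule are assumptions, and real Tortoise tests are async and read capabilities from a connection.

```python
import functools
import unittest
from types import SimpleNamespace

# Hypothetical stand-in for a connection's capabilities object.
CAPABILITIES = SimpleNamespace(dialect="sqlite", support_json=True)


def skip_capability(capabilities=CAPABILITIES, **conditions):
    """Skip the decorated test when every given capability matches.

    Illustrative sketch only; the PR's skipCapability may differ.
    """

    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            matches = conditions and all(
                getattr(capabilities, key, None) == value
                for key, value in conditions.items()
            )
            if matches:
                raise unittest.SkipTest(f"Skipped for capabilities: {conditions}")
            return test_func(*args, **kwargs)

        return wrapper

    return decorator


@skip_capability(dialect="postgres")
def case_skipped_on_postgres():
    return "ran"


@skip_capability(dialect="sqlite")
def case_skipped_on_sqlite():
    return "ran"
```

With the sample capabilities above, the first case runs and the second raises `unittest.SkipTest`.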

Motivation and Context

PostgreSQL's DISTINCT ON is a powerful feature that standard DISTINCT cannot replace —
it allows selecting one representative row per group without a GROUP BY, while retaining
full model objects. This is commonly needed for "latest per group" or "first per category"
queries.

How Has This Been Tested?

Added tests/test_distinct.py covering:

  • Basic DISTINCT ON with and without ORDER BY
  • Multiple DISTINCT ON fields
  • Combined with .values_list() — single field, multiple fields, field outside DISTINCT ON
  • Combined with .values() — same variations
  • Combined with .only()
  • ORDER BY respects the selected row within each group
  • OperationalError when ORDER BY doesn't start with DISTINCT ON fields
  • OperationalError when used on a non-PostgreSQL database
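
The ORDER BY rule exercised by these tests can be sketched as a small check. This is a simplified assumption about the rule (the leading ORDER BY fields must be the DISTINCT ON fields, in any order, and an empty ORDER BY is allowed); the PR's actual validation and PostgreSQL's own check may be stricter or looser. The local `OperationalError` class stands in for `tortoise.exceptions.OperationalError`, and `check_distinct_on_ordering` is a hypothetical name.

```python
class OperationalError(Exception):
    """Stand-in for tortoise.exceptions.OperationalError in this sketch."""


def check_distinct_on_ordering(distinct_on, orderings):
    """Raise if a non-empty ORDER BY does not start with the DISTINCT ON fields.

    `orderings` uses "-field" for descending order. Simplified rule: the
    leading ORDER BY fields must be exactly the DISTINCT ON fields.
    """
    if not distinct_on or not orderings:
        return
    leading = [o.lstrip("-") for o in orderings[: len(distinct_on)]]
    if set(leading) != set(distinct_on):
        raise OperationalError(
            f"ORDER BY must start with the DISTINCT ON fields {list(distinct_on)}, "
            f"got {list(orderings)}"
        )


check_distinct_on_ordering(["name"], ["name", "-id"])  # accepted
check_distinct_on_ordering(["name"], [])               # accepted: no ORDER BY
```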

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added the changelog accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@codspeed-hq

codspeed-hq Bot commented Jun 6, 2026

Merging this PR will not alter performance

✅ 24 untouched benchmarks


Comparing VladislavYar:distinct-by-specific-fields (fa341d0) with develop (403a4dc)


Comment thread tortoise/contrib/test/__init__.py Outdated
Comment thread tests/test_distinct.py Outdated
@waketzheng
Contributor

@VladislavYar, this PR is helpful; let's go ahead with it.

Said by AI (Claude/Codex):

Findings

  1. distinct("field") is rejected on valid PostgreSQL query chains

    In QuerySet.distinct() the backend check uses:

    isinstance(self.query, PostgreSQLQueryBuilder)

    However, many common query chains are not DB-bound yet when distinct() is called, for example:

    Tournament.filter(name="x").distinct("name")
    Tournament.annotate(...).distinct("name")

    These start from manager.get_queryset(), so self.query is still the generic placeholder builder. As a result, this
    raises OperationalError("DISTINCT ON is only supported by PostgreSQL") even when the actual model DB is PostgreSQL.

    This check should be based on the chosen/model DB capability or deferred until query construction.
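
One way to picture the "deferred until query construction" option is the sketch below. `LazyQuery`, its `build_sql` method, and the dialect strings are hypothetical and much simpler than Tortoise's QuerySet; the point is only that `distinct()` records the request and the backend check runs once the dialect is known.

```python
class OperationalError(Exception):
    """Stand-in for tortoise.exceptions.OperationalError in this sketch."""


class LazyQuery:
    """Hypothetical, minimal query object that records DISTINCT ON fields."""

    def __init__(self, table):
        self.table = table
        self._distinct_on = ()

    def distinct(self, *fields):
        # Only record the request; no backend check here because the
        # database may not have been chosen yet.
        clone = LazyQuery(self.table)
        clone._distinct_on = fields
        return clone

    def build_sql(self, dialect):
        # The dialect is known at this point, so validate now.
        if self._distinct_on and dialect != "postgres":
            raise OperationalError("DISTINCT ON is only supported by PostgreSQL")
        if self._distinct_on:
            cols = ", ".join(f'"{f}"' for f in self._distinct_on)
            return f'SELECT DISTINCT ON ({cols}) * FROM "{self.table}"'
        return f'SELECT * FROM "{self.table}"'


query = LazyQuery("tournament").distinct("name")  # no error at this point
sql = query.build_sql("postgres")
```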

  2. DISTINCT ON fields are not resolved to database columns

    The implementation passes ORM field names directly to PyPika:

    self.query.distinct_on(*self._distinct_on)

    That bypasses Tortoise’s normal field resolution. Fields with source_field, relation traversals, or fields needing
    joins will generate wrong SQL.

    Example: for a model field like:

    chars = fields.CharField(source_field="some_chars_table")

    distinct("chars") should use "some_chars_table", but this code emits "chars".

    The distinct-on fields need to be resolved the same way ordering/select fields are resolved.
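
    A minimal sketch of the resolution step, assuming a plain field-to-column mapping. `FIELD_TO_COLUMN`, `resolve_distinct_on`, and `render_distinct_on` are hypothetical names, not Tortoise APIs, and relation traversals or joins are deliberately left out.

```python
# Hypothetical mapping from ORM field name to database column,
# similar in spirit to what a model's metadata would provide.
FIELD_TO_COLUMN = {"id": "id", "name": "name", "chars": "some_chars_table"}


def resolve_distinct_on(fields, field_to_column=FIELD_TO_COLUMN):
    """Translate ORM field names into database column names."""
    columns = []
    for field in fields:
        if field not in field_to_column:
            raise ValueError(f"Unknown field for DISTINCT ON: {field!r}")
        columns.append(field_to_column[field])
    return columns


def render_distinct_on(fields):
    """Render the DISTINCT ON clause using resolved column names."""
    cols = ", ".join(f'"{c}"' for c in resolve_distinct_on(fields))
    return f"DISTINCT ON ({cols})"


clause = render_distinct_on(["chars"])
```

    With this mapping, `clause` refers to "some_chars_table" rather than "chars".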

  3. Default Meta.ordering is not included in the DISTINCT ON validation

    resolve_ordering() applies model default ordering when self._orderings is empty, but the DISTINCT ON validation
    only checks self._orderings.

    That means a model with default ordering not starting with the distinct-on fields can produce invalid PostgreSQL
    SQL without being caught by this validation.
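
    The gap can be shown with a short sketch that compares a check on the explicit orderings with a check on the ordering that would actually be applied. The helper names and the `["-created"]` default ordering are hypothetical, and the prefix rule is a simplified assumption.

```python
def effective_orderings(explicit, default):
    """Return the ordering that would actually be applied to the query."""
    return list(explicit) if explicit else list(default)


def ordering_starts_with(distinct_on, orderings):
    """True if ORDER BY is empty or its leading fields are the DISTINCT ON fields."""
    leading = {o.lstrip("-") for o in orderings[: len(distinct_on)]}
    return not orderings or leading == set(distinct_on)


# Model with a hypothetical default ordering and no explicit order_by().
explicit, default = [], ["-created"]

# Checking only the explicit orderings passes, because they are empty.
checked_explicit_only = ordering_starts_with(["name"], explicit)

# Checking the effective ordering exposes the mismatch.
checked_effective = ordering_starts_with(
    ["name"], effective_orderings(explicit, default)
)
```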

Test Coverage Gap

The new tests cover Tournament.all().distinct("name"), but they do not cover:

  • filter(...).distinct("field") on PostgreSQL
  • fields with source_field
  • relation / joined fields
  • models with default Meta.ordering

Those cases should be added before merging.
