feat: support Template Strings, eg t.select(doubled=t"{_.age} * 2") #11599

Open
NickCrews wants to merge 2 commits into ibis-project:main from NickCrews:template

@NickCrews NickCrews commented Sep 3, 2025

Description of changes

I am very excited about this PR: I think it really brings ibis into the future of Python and SQL engines. I don't know of any other Python dataframe library with an API like this. It removes a huge barrier for folks who want to write something closer to raw SQL, and it adds escape hatches for getting down into the SQL guts when folks need them.

In summary, this allows for my_table.select(doubled=t"{ibis._.my_col} * 2") in python 3.14+, and my_table.select(doubled=ibis.t("{ibis._.my_col} * 2")) in versions below that.

Fixes #11525. Inspired by https://orm.drizzle.team/docs/sql.

I still need to add more tests to be thorough, but at a high level I am quite happy with the API.

I think I managed to implement this without opening any real cans of worms. The semantics seem pretty consistent with what we already have, this isn't breaking at all, and I don't think I'm adding an ill-conceived data model that we will regret later.

ibis.t() as a backport of Python 3.14's t"hello {name}!" syntax

I vendored in the implementation from https://github.com/abilian/tstrings-backport, which has tests. I've been working on this and it seems worth depending on. I chose to vendor it in to avoid a pypi dependency. I didn't include tests, since the upstream package is tested. There are a few limitations, like it can't handle t("nested braces: {{1,2,3}}"), but is otherwise pretty robust.
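
To make the backport idea concrete, here is a minimal sketch of how a t() function could split a string into literal parts and evaluated interpolations using only the standard library. This is not the vendored tstrings-backport code: the Template and Interpolation classes and the optional namespace parameter are assumptions for illustration, and conversions and format specs are ignored.

```python
import sys
from dataclasses import dataclass
from string import Formatter
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class Interpolation:
    """Hypothetical stand-in: one evaluated `{...}` field."""

    value: Any  # the object the expression evaluated to
    expression: str  # the source text between the braces


@dataclass(frozen=True)
class Template:
    """Hypothetical stand-in: literal parts interleaved with interpolations."""

    strings: tuple[str, ...]
    interpolations: tuple[Interpolation, ...]


def t(source: str, namespace: Optional[Mapping[str, Any]] = None) -> Template:
    """Parse `source` and evaluate each `{expr}` field.

    If `namespace` is omitted, names are looked up in the caller's frame,
    which is how a function call can imitate t"..." syntax.
    """
    if namespace is None:
        frame = sys._getframe(1)
        namespace = {**frame.f_globals, **frame.f_locals}
    strings: list[str] = []
    interpolations: list[Interpolation] = []
    pending = ""
    for literal, expr, _spec, _conv in Formatter().parse(source):
        pending += literal
        if expr is not None:
            value = eval(expr, dict(namespace))
            strings.append(pending)
            interpolations.append(Interpolation(value, expr))
            pending = ""
    # There is always one more literal part than there are interpolations.
    strings.append(pending)
    return Template(tuple(strings), tuple(interpolations))


five = 5
template = t("{five} - 1")
print(template.strings)  # ('', ' - 1')
```

Because the sketch evaluates expressions with eval(), it should only ever be given trusted source strings, the same constraint that applies to writing the expression inline.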

I added a few other features on top, such as PTemplate, to help with typing to represent either our implementation's Template, or the builtin string.templatelib.Template, or any other implementation that ducktypes as one.

I chose to expose this so that users could actually use it as ibis.t(). I think they will go and import it from our private modules anyway, so we might as well get the API right from the beginning and actually be useful to people on Python <3.14.

I publicly exported ibis.t at the top level, but the rest you have to go through a submodule, eg

  • ibis.tstring.Template
  • ibis.tstring.Interpolation
  • ibis.tstring.PTemplate
  • ibis.tstring.PInterpolation

I'm open to adjusting this though.
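
The duck-typing idea behind PTemplate can be sketched with typing.Protocol. The attribute names below (strings, interpolations, value, expression) follow Python 3.14's string.templatelib; whether the PR's PTemplate declares exactly these members is an assumption, and ThirdPartyTemplate and is_template are hypothetical names used only for this example.

```python
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class PInterpolation(Protocol):
    """Anything exposing an evaluated value and its source expression."""

    @property
    def value(self) -> Any: ...

    @property
    def expression(self) -> str: ...


@runtime_checkable
class PTemplate(Protocol):
    """Anything exposing literal parts and interpolations."""

    @property
    def strings(self) -> tuple[str, ...]: ...

    @property
    def interpolations(self) -> tuple[PInterpolation, ...]: ...


@dataclass(frozen=True)
class ThirdPartyTemplate:
    """Hypothetical template class that knows nothing about the protocol."""

    strings: tuple[str, ...]
    interpolations: tuple[Any, ...]


def is_template(obj: object) -> bool:
    # Structural check: passes for any object that has the right attributes.
    return isinstance(obj, PTemplate)
```

A runtime isinstance() check against a protocol only verifies that the attributes exist, not their types, so a static type checker is still what enforces the full contract.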

ibis.sql_value(<PTemplate>) as a way of creating an ir.Value or an ibis.Deferred from a Template-ish

The exact signature is

def sql_value(
    template: PTemplate,
    /,
    *,
    dialect: str | sg.Dialect | None = None,
    type: dt.IntoDtype | None = None,
) -> ir.Value | Deferred

I considered allowing ibis.sql_value("{my_table.column} + 5"), but decided it would be better to be explicit that a template is getting created.

If any Interpolation is a Deferred, this returns a Deferred, because we can't infer a specific datatype without the concrete values.
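
The branching described here can be sketched with a toy model. ToyDeferred and bind_values are hypothetical stand-ins and do not use any ibis types; the point is only that a single deferred input makes the whole result deferred until a table is supplied.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence, Union


@dataclass(frozen=True)
class ToyDeferred:
    """Hypothetical stand-in for a deferred: a value that needs a table."""

    resolve: Callable[[dict], Any]


def bind_values(values: Sequence[Any]) -> Union[list, ToyDeferred]:
    """Return concrete values, or a deferred if any input is deferred."""
    if any(isinstance(v, ToyDeferred) for v in values):

        def resolve_all(table: dict) -> list:
            # Resolve deferred inputs against the table; pass others through.
            return [
                v.resolve(table) if isinstance(v, ToyDeferred) else v
                for v in values
            ]

        return ToyDeferred(resolve_all)
    return list(values)


age = ToyDeferred(lambda table: table["age"])
concrete = bind_values([1, 2])  # a plain list, usable immediately
pending = bind_values([age, 2])  # a ToyDeferred, resolved later
```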

You can also pass in a specific dialect. If you pass in one, that is used. If you don't pass in a dialect, then we look through all the Interpolation values and infer the dialect from them (if there are >1 backends, that's an error). If none of the Interpolations have any bound backends, then we fall back to the dialect of ibis.get_backend() at op creation time.
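
The resolution order above can be written out as a small function. This is a sketch of the described precedence only: resolve_dialect and its parameters are hypothetical, dialects are modelled as plain strings, and None stands for an interpolation with no bound backend.

```python
from typing import Iterable, Optional


def resolve_dialect(
    explicit: Optional[str],
    interpolation_dialects: Iterable[Optional[str]],
    default: str,
) -> str:
    """Pick a dialect: explicit, then inferred from values, then default."""
    # 1. An explicitly passed dialect always wins.
    if explicit is not None:
        return explicit
    # 2. Otherwise collect the dialects of interpolations bound to a backend.
    found = {d for d in interpolation_dialects if d is not None}
    if len(found) > 1:
        raise ValueError(f"template mixes backends: {sorted(found)}")
    if found:
        return next(iter(found))
    # 3. Nothing is bound: fall back to the default backend's dialect.
    return default
```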

You don't need to pass in a datatype: if you skip it, we infer it by compiling the ibis expression to a sqlglot expression and then using sqlglot's datatype inference utils.

Sometimes, this doesn't work though, for example the original motivating example from the linked issue:

import ibis
from ibis import t  # backport of t"..." for Python < 3.14
timestamp = ibis.timestamp("2024-08-01 21:44:00")  # noqa: F841
template = t("{timestamp} AT TIME ZONE 'UTC' AT TIME ZONE 'America/Anchorage'")
in_ak_time = ibis.sql_value(template)
in_ak_time.type() # gives dt.Unknown()

because sqlglot doesn't have complete coverage for introspecting the datatypes of this weird syntax.

To accommodate this, you can pass an explicit datatype, eg ibis.sql_value(template, type="timestamp"). It is also possible to do ibis.sql_value(template).cast("timestamp"), ie just cast away from the Unknown type.

The name is ibis.sql_value(). I went with this, instead of plain ibis.sql(), because I wanted it to be more explicit that this doesn't accept SELECT statements, and results in a single Value, not a relation. It also COULD be misleading because it can return a Deferred, which isn't technically a Value. So I'm open to other names.

Templates used directly are inferred as values

You don't need to go through ibis.sql_value() in many cases! For example:

my_table.select(doubled=ibis.t("{ibis._.my_col} * 2"))

In general, you need to use ibis.sql_value if:

  • You need a Deferred or Value, eg so you can do further method chaining on it, such as .upper().name("uppercased")
  • You need to specify a dialect or datatype

It is important to understand that this locks us into SQL as a first-class citizen. It prevents us from interpreting Templates with any other DSL, for example PRQL. If we didn't want to lock ourselves in, we could remove this feature and require the extra wrapping in ibis.sql_value, eg my_table.select(doubled=ibis.sql_value(ibis.t("{ibis._.my_col} * 2"))), because that would leave room for eg an ibis.prql_value() function. But I think we are committed enough to SQL at this point that the beauty/simplicity of the reduced syntax is worth the lock-in.

Relation-valued template

Currently this PR only supports single-Value SQL expressions, such as {table.column} + 5. But it wouldn't be hard to extend this to accept relation-valued SQL expressions, such as SELECT {table.column} + 5 as a, {ibis._.b - 3} as b. This would probably be implemented by re-using a lot of the same core types, but with a different op class and a slightly different method for resolving deferreds. I imagine the API looking something like

table = ibis.table({"column": int, "b": int})  # b is numeric so that `b - 3` is valid
five = 5
new_table = ibis.sql_table(ibis.t("SELECT {table.column} + {five} as a, {table.b - 3} as b"))
table.sql(ibis.t("SELECT {table.column} + {five} as a, {ibis._.b - 3} as b"))
# I don't think this is allowed, because it uses a deferred, and we can't infer the dtype.
# But perhaps this could return some sort of Deferred? But IDK how you would then consume this.
ibis.sql_table(ibis.t("SELECT {ibis._.column} + 5 as a")) # ERRORS

Note how this interpolation syntax allows us to avoid the messiness that is the current method of table.alias("a_name_i_hope_is_unique").sql("SELECT * from a_name_i_hope_is_unique")

Open question: type inference is a little inconsistent

sqlglot interprets any bare integers as int32. This is a little different from us, where we interpret eg ibis.literal(4) as int8. For example, this is a current test (slightly simplified)

def test_direct_select(con, alltypes, backend):
    """Test template with column interpolation."""
    five = 5  # noqa: F841
    selected = alltypes.select(scalar=t("{five} - 1"))
    expected = alltypes.select(scalar=ibis.literal(4).cast("int32"))
    actual_schema = selected.schema()
    expected_schema = expected.schema()
    assert expected_schema == actual_schema
    result = con.execute(selected)
    expected_result = con.execute(expected)
    backend.assert_frame_equal(result, expected_result)

That is, we need the explicit .cast("int32") in the expected = ... line for the schemas to match.

I think this is acceptable, but I haven't thought super deeply about whether there is a better way. Explicitly setting the dtype is a nice escape hatch.
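
The mismatch can be illustrated with a small sketch that contrasts a smallest-fitting-width rule (consistent with ibis.literal(4) being int8, as noted above) with a fixed int32 rule. Both functions are hypothetical illustrations written for this example; neither is ibis's or sqlglot's actual inference code.

```python
def smallest_int_type(value: int) -> str:
    """Return the narrowest signed integer type name that can hold `value`."""
    for bits in (8, 16, 32, 64):
        if -(2 ** (bits - 1)) <= value < 2 ** (bits - 1):
            return f"int{bits}"
    raise OverflowError(f"{value} does not fit in 64 bits")


def fixed_int_type(value: int) -> str:
    """Model of the 'any bare integer is int32' rule described above."""
    return "int32"


# The two rules disagree for the literal 4, hence the explicit cast in the test.
print(smallest_int_type(4), fixed_int_type(4))  # int8 int32
```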

Open question: overly loose parsing

Currently, if you pass t"CAST({x} AS REAL)" (eg using sqlite syntax) to sql_value() with dialect="duckdb", sqlglot is generous and still parses it. But perhaps we should be more strict and error?

@github-actions github-actions Bot added the tests and sql labels Sep 3, 2025
@NickCrews NickCrews requested a review from cpcloud September 3, 2025 19:05
@NickCrews NickCrews added the feature and compiler labels Sep 3, 2025
@NickCrews
Contributor Author

All of the failing tests look to be unrelated, some issue with pins, possibly due to GCS access issues?

@github-actions github-actions Bot added the datatypes label Jan 20, 2026
@NickCrews NickCrews force-pushed the template branch 2 times, most recently from a8687db to a62f475 January 21, 2026 04:00
@NickCrews NickCrews changed the title from feat: add ibis.sql_value() to feat: support Template Strings, eg t.select(doubled=t"{_.age} * 2") Jan 21, 2026
@github-actions github-actions Bot added the sqlite label Jan 21, 2026
Before, if you used this function when duckdb wasn't installed, you got an error. Now you can use it without duckdb installed.
@NickCrews
Contributor Author

NickCrews commented Jan 26, 2026

I'm realizing I want this to support multiple dialects better. Eg if you just have ibis.sql_value(t"...", type=int, dialect="duckdb"), then what happens when you try to compile this on eg bigquery? We currently have to rely on sqlglot to implement the translation rule. But they inevitably won't cover every case, and I want to provide an escape hatch for someone to write the SQL for various dialects.

I'm not sure exactly how this should look:

  • mapping as first arg: ibis.sql_value({"duckdb": t"...", "bigquery": t"..."}, type=int). But then what happens when you try to compile on an unspecified dialect, eg postgres? I guess take the first version, in this case duckdb, and transpile with sqlglot?
  • composing as cases:
duckdb = ibis.sql_value(t"...", type=int, dialect="duckdb")
bq = ibis.sql_value(t"...", type=int, dialect="bigquery")
ibis.sql_value([duckdb, bq])

I don't love this though, takes too much typing

  • Something else?
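
The fallback rule floated in the first option (use the exact dialect if present, otherwise take the first entry and transpile it) could look like the sketch below. pick_variant is a hypothetical helper, SQL is modelled as plain strings, and the transpilation step itself is left out.

```python
def pick_variant(variants: dict[str, str], target: str) -> tuple[str, str]:
    """Return (source_dialect, sql) to use when compiling for `target`.

    If the returned source dialect differs from `target`, the caller would
    still need to transpile the SQL, eg with sqlglot.
    """
    if not variants:
        raise ValueError("at least one dialect variant is required")
    if target in variants:
        return target, variants[target]
    # Dicts preserve insertion order, so "first" is the first one written.
    first = next(iter(variants))
    return first, variants[first]


variants = {"duckdb": "<duckdb sql>", "bigquery": "<bigquery sql>"}
print(pick_variant(variants, "postgres"))  # ('duckdb', '<duckdb sql>')
```

Relying on dict order makes the first entry an implicit default, which is compact but easy to change by accident; an explicit default= argument would be a more verbose alternative.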

My desire here is that this would be the key that enables a library author to expose a single function that is portable between backends.

Currently, there is no way to do this without actually writing an Op and then monkeypatching compile rules into the compiler. Of course, this SQL interface for writing different ways of compiling completely prevents you from writing a compilation rule for non-SQL backends, eg polars. So perhaps the Op/compile rule method should be the official way to get true portability (and we should make this an official API), and these sql_value strings are just quick and dirty ways to get something done on a particular backend or a small subset of backends.

@NickCrews
Contributor Author

@cpcloud can I schedule a video call with you to talk through this? I would love to get confirmation that this is the right direction before I sink more time into it.

@cpcloud
Member

cpcloud commented Jan 27, 2026

Yeah, let's do it.

@NickCrews
Contributor Author

@cpcloud can you book 60 minutes with me here? https://calendar.app.google/Ab3PFW7SFXHdb3b27

@NickCrews
Contributor Author

I have implemented a nice API of this for duckdb-python here, I am thinking of porting some of those learnings back to this PR.


Development

Successfully merging this pull request may close these issues.

feat: Value.sql(sql: str | string.Template, *, name: str | None = "{{value}}")
