Skip to content

Add arrayType, rowType, mapType and varbinary support to format()#29146

Open
gertjanal wants to merge 1 commit intotrinodb:masterfrom
gertjanal:format-row
Open

Add arrayType, rowType, mapType and varbinary support to format()#29146
gertjanal wants to merge 1 commit intotrinodb:masterfrom
gertjanal:format-row

Conversation

@gertjanal
Copy link
Copy Markdown
Contributor

@gertjanal gertjanal commented Apr 16, 2026

Description

Second attempt after #28234 was rejected. The goal is to create a readable string representation of row, array, map and varbinary, as suggested by @martint .

SELECT format('%s', cast(row('hello', 'world') AS row(greeting varchar, planet varchar)));
-- '{"greeting": "hello", "planet": "world"}'

SELECT format('%s', row('hello', 'world'));
-- '["hello", "world"]'

SELECT format('%s', ARRAY['hello', 'world']);
-- '["hello", "world"]'

SELECT format('%s', map(ARRAY['greeting', 'planet'], ARRAY['hello', 'world']));
-- '{"greeting": "hello", "planet": "world"}'

SELECT format('%s', from_base64('d29ybGQ='));
-- 'd29ybGQ='

Consideration: the strings in rows and arrays are using double quotes, to make them json compatible. However, we could consider to use single quotes as they would better fit the sql standard.

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( X ) Release notes are required, with the following suggested text:

## Section
* Add arrayType, rowType, mapType and varbinary support to format()

@cla-bot cla-bot Bot added the cla-signed label Apr 16, 2026
@github-actions github-actions Bot added the docs label Apr 16, 2026
@gertjanal gertjanal requested a review from martint April 16, 2026 20:12
@gertjanal gertjanal force-pushed the format-row branch 3 times, most recently from 9da794a to cf3da06 Compare April 16, 2026 20:34
@gertjanal gertjanal marked this pull request as ready for review April 16, 2026 20:35
@gertjanal gertjanal requested a review from wendigo April 16, 2026 20:37
@gertjanal
Copy link
Copy Markdown
Contributor Author

TODO: escape characters? Quotes?

@martint
Copy link
Copy Markdown
Member

martint commented Apr 16, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 16, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 16, 2026

📝 Walkthrough

Walkthrough

The format() function is extended to support nested complex types including arrays, rows, maps, and varbinary. These types now process through new converters that serialize them into string representations: rows format as JSON-like objects with field names, arrays as JSON-like arrays, maps as JSON-like objects, and varbinary as Base64. A new quotedValue helper handles quoting values for specific types. Changes include updated dependency handling and value conversion logic, new type-specific converters, expanded imports, corresponding test cases validating the new formatting behavior, and documentation examples.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
core/trino-main/src/main/java/io/trino/operator/scalar/FormatFunction.java (2)

119-148: ⚠️ Potential issue | 🔴 Critical

addDependencies must recurse into nested types to declare needed casts.

The new early-return adds ArrayType/RowType/MapType but does not recurse into element/field/key/value types. During specialize, the valueConverter recursion will call back into converter(functionDependencies, nestedType) for nested types, and any nested type that is not one of the early-returned primitives (e.g. UUID, IPADDRESS, HyperLogLog, user types, nested ROW containing such a type) will hit the fallback functionDependencies.getCastImplementation(type, VARCHAR, ...) at line 281. Because that cast was never declared in addDependencies, this throws at query time with a missing-dependency error.

Please walk the children:

 private static void addDependencies(FunctionDependencyDeclarationBuilder builder, Type type)
 {
     if (type.equals(UNKNOWN) || ... || type instanceof CharType ||
-            type instanceof ArrayType ||
-            type instanceof RowType ||
-            type instanceof MapType ||
             type instanceof VarbinaryType) {
         return;
     }
+    if (type instanceof ArrayType arrayType) {
+        addDependencies(builder, arrayType.getElementType());
+        return;
+    }
+    if (type instanceof MapType mapType) {
+        addDependencies(builder, mapType.getKeyType());
+        addDependencies(builder, mapType.getValueType());
+        return;
+    }
+    if (type instanceof RowType rowType) {
+        rowType.getTypeParameters().forEach(t -> addDependencies(builder, t));
+        return;
+    }
     if (type.equals(JSON)) {
         ...
     }
     builder.addCast(type, VARCHAR);
 }

Please also add a test covering a type that requires a cast inside a complex type (e.g. format('%s', ARRAY[uuid '...']) or format('%s', row(uuid '...'))).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/trino-main/src/main/java/io/trino/operator/scalar/FormatFunction.java`
around lines 119 - 148, The addDependencies(FunctionDependencyDeclarationBuilder
builder, Type type) currently returns early for ArrayType/RowType/MapType and
thus fails to declare casts for nested non-primitive types; modify
addDependencies to recurse into nested component types: for ArrayType call
addDependencies(builder, elementType), for MapType call addDependencies(builder,
keyType) and addDependencies(builder, valueType), and for RowType iterate
row.getFields() and call addDependencies(builder, field.getType()) for each
field, while preserving the existing JSON handling and primitive early returns;
also add a unit test that calls FormatFunction.format with a complex value that
contains a type requiring a cast (e.g., format('%s', ARRAY[uuid '...'] or
format('%s', ROW(uuid '...')) ) to assert no missing-dependency error during
specialize.

154-155: ⚠️ Potential issue | 🟠 Major

Anonymous rows rendered as {...} are not valid JSON.

Given the PR's stated goal of JSON-compatible output, an anonymous row without field names emits {"hello", "world"}, which is not valid JSON (an object requires key: value pairs). Consider rendering anonymous (or all-unnamed) rows as a JSON array ["hello", "world"], consistent with the array representation. This also sidesteps the asymmetry where ARRAY['hello','world']["hello", "world"] but row('hello','world'){"hello", "world"}.

Worth calling out for @martint given the syntax-needs-review label.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/trino-main/src/main/java/io/trino/operator/scalar/FormatFunction.java`
around lines 154 - 155, The code currently builds a JSON-like object for rows
using the field converters list (created from rowType.getFieldTypes()), which
produces invalid JSON for anonymous rows (e.g., {"hello","world"}); change the
row formatting code in FormatFunction to detect when a row has no field names
(all fields unnamed) and emit a JSON array instead of an object for that case —
use the converters list to serialize values in order into ["v1","v2",...] when
field names are absent, and keep the existing object serialization for rows with
at least one named field so named rows continue to emit {"key": "value"}.
🧹 Nitpick comments (4)
core/trino-main/src/test/java/io/trino/operator/scalar/TestFormatFunction.java (2)

151-183: Add coverage for escaping, non-trivial batches, and nested complex types.

The happy-path cases are covered, but several behaviors introduced by this change are not exercised and are exactly where the current implementation has issues flagged in FormatFunction.java:

  • Values containing ", \, and newlines (escaping gap in quotedValue): e.g. format('%s', ARRAY['he said "hi"']), format('%s', row('a\b')).
  • Multiple rows / non-zero map or row offsets — e.g. unnesting a VALUES table of rows/maps so that the SqlMap has a non-zero rawOffset and SqlRow a non-zero rawIndex; this would have caught the position/offset bugs.
  • Nested complex types: format('%s', ARRAY[ARRAY[1,2], ARRAY[3]]), format('%s', map(ARRAY['a'], ARRAY[ARRAY[1,2]])), format('%s', row(ARRAY[row(1)])).
  • Null handling: null row field, null map value, null array element (the outer converter null-check at line 192 only guards the top-level block).
  • Types requiring cast-to-VARCHAR inside a complex type, e.g. format('%s', ARRAY[uuid '...']), to cover the addDependencies recursion.
  • Empty row/array/map.

Also, consider asserting ARRAY[TRUE, FALSE] formats to [true, false] from a SELECT ... FROM VALUES to confirm block encoding doesn't matter.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/trino-main/src/test/java/io/trino/operator/scalar/TestFormatFunction.java`
around lines 151 - 183, Tests for FormatFunction are missing cases that reveal
bugs in escaping, offsets, nesting, nulls and dependency-addition; add targeted
unit tests invoking format("%s", ...) covering values with
quotes/backslashes/newlines (to exercise quotedValue escaping),
multi-row/non-zero map/row positions (to exercise SqlMap.rawOffset and
SqlRow.rawIndex via VALUES/UNNEST), nested complex types (arrays/maps/rows
within each other), nulls inside nested containers (null array element, null map
value, null row field to exercise the converter null-check), types requiring
cast-to-VARCHAR inside containers (to exercise addDependencies recursion), and
empty containers; ensure at least one test uses SELECT ... FROM VALUES to
confirm block encoding doesn't change ARRAY/BOOLEAN output. Reference
FormatFunction.java, quotedValue, converter, SqlMap, SqlRow, rawOffset/rawIndex,
and addDependencies when adding these tests.

166-170: Integer array formatting inherits Java's %s toString, which may surprise users.

For ARRAY[1, 2, 3] the test expects [1, 2, 3]. Under the current implementation, elements go through INTEGER::getInt then the value is Long ((long) INTEGER.getInt(...)) and ends up in StringBuilder.append(Object) via quotedValue. Worth confirming behavior for BIGINT, REAL/DOUBLE (e.g. ARRAY[1.5, NaN, infinity()]NaN/Infinity are not valid JSON), and DECIMAL (which becomes BigDecimal.toString, including scientific notation for some values). Adding explicit assertions pins down the contract and prevents drift.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/trino-main/src/test/java/io/trino/operator/scalar/TestFormatFunction.java`
around lines 166 - 170, The test currently relies on implicit Java toString
behavior for numeric array elements which can drift; update TestFormatFunction
to add explicit assertions that pin the expected JSON-like formatting for
numeric arrays: add cases for integer arrays (format("%s", "ARRAY[1, 2, 3]")
expecting "[1, 2, 3]"), BIGINT arrays, REAL/DOUBLE arrays (include values like
1.5, NaN, Infinity and assert the current serialized tokens so the contract is
clear), and DECIMAL arrays (assert BigDecimal string formatting, including
scientific notation cases); reference the existing format(...) test method and
the code paths that call quotedValue / INTEGER::getInt to locate where to add
these assertions and include short comments describing the expected contract for
each numeric type.
core/trino-main/src/main/java/io/trino/operator/scalar/FormatFunction.java (2)

291-339: Use primitive int for the position parameter.

The new rowToString/mapToString/arrayToString helpers declare Integer position, which forces auto-boxing on every call (these are invoked from per-row converters). All other converters in this file pass a primitive int. Prefer int for consistency and to avoid allocation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/trino-main/src/main/java/io/trino/operator/scalar/FormatFunction.java`
around lines 291 - 339, Change the three helper method signatures from Integer
to primitive int to avoid boxing: update rowToString(FunctionDependencies,
RowType, Block, Integer) to use int, mapToString(FunctionDependencies, MapType,
Block, Integer) to use int, and arrayToString(FunctionDependencies, ArrayType,
Block, Integer) to use int; adjust any internal uses (they already treat
position as a primitive) and update all callers to pass an int (they already do
in this file) so there is no autoboxing allocation at each call.

326-339: Use arrayType.getObject(block, position) for consistency and to handle non-ArrayBlock encodings.

The assumption in the code is correct: arrayBlock.getArray(position) returns a region-scoped block (via getRegion()), so iterating 0..getPositionCount() is safe. However, the current code is fragile because it only handles ArrayBlock instances directly. Consider switching to arrayType.getObject(block, position) to match patterns used elsewhere in the codebase and to gracefully handle alternative array encodings.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/trino-main/src/main/java/io/trino/operator/scalar/FormatFunction.java`
around lines 326 - 339, The arrayToString method should stop assuming block is
an ArrayBlock and instead obtain the element block via
arrayType.getObject(block, position); replace the ArrayBlock-specific branch
that calls arrayBlock.getArray(position) with a single call to
arrayType.getObject(block, position), then iterate
elementBlock.getPositionCount() and call quotedValue(functionDependencies,
arrayType.getElementType(), elementBlock.getSingleValueBlock(i), i) (or
otherwise pass the correct inner position) as before; update references to
ArrayBlock/getArray to use arrayType.getObject to handle alternative encodings
while keeping the rest of the logic in arrayToString unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/trino-main/src/main/java/io/trino/operator/scalar/FormatFunction.java`:
- Around line 291-305: The rowToString method is passing the outer position into
quotedValue for nested field reads, which is wrong; compute int rawIndex =
row.getRawIndex() from the SqlRow and use that rawIndex when calling quotedValue
(and any other field-level accessors) with row.getRawFieldBlock(i) and
fields.get(i).getType() so field reads use the row's raw index instead of the
outer position.
- Around line 285-289: The quotedValue method currently wraps values with
String.format("\"%s\"", value) which does not escape embedded quotes,
backslashes or control characters, and field names are formatted similarly
(String.format("\"%s\": ", name)); replace those naive wrappers by encoding the
string via a proper JSON string encoder (e.g., use
JsonStringEncoder.getInstance().quoteAsString(...) or ObjectMapper/JsonFactory
to produce a JSON-safe quoted string) when formatting both values in quotedValue
and field names in the name formatting site, keeping use of
FormatFunction.converter(...) for type conversion; add unit tests for
FormatFunction (or the relevant formatter tests) that assert proper JSON output
for values containing quotes, backslashes, newlines and for row fields whose
names contain those characters.
- Around line 307-324: The mapToString method is indexing raw key/value blocks
incorrectly; when using SqlMap from a MapBlock you must use
sqlMap.getRawOffset() to compute the real positions. Update mapToString (and its
loop over sqlMap.getSize()) to obtain keys = sqlMap.getRawKeyBlock() and values
= sqlMap.getRawValueBlock() and call quotedValue(functionDependencies,
mapType.getKeyType(), keys, sqlMap.getRawOffset() + i) and
quotedValue(functionDependencies, mapType.getValueType(), values,
sqlMap.getRawOffset() + i) (mirror the offset usage pattern from ModelUtils) so
the raw blocks are indexed with sqlMap.getRawOffset() rather than using i or
getSingleValueBlock.
- Around line 251-262: The instanceof checks in FormatFunction produce wrong
results for wrapped blocks and the helpers pass wrong positions to single-value
blocks; replace the block+position lambdas to extract objects via the Type API
(use rowType.getObject(block, position) cast to SqlRow, mapType.getObject(... )
to SqlMap, and arrayType.getObject(... ) to a Block) in the lambdas, then update
rowToString, mapToString, and arrayToString signatures to accept those extracted
objects (SqlRow, SqlMap, Block) and operate on them directly (treat any
single-value blocks with position 0 when calling quotedValue inside those
helpers) so position-translation and wrapped-block handling are correct.

In `@docs/src/main/sphinx/functions/conversion.md`:
- Around line 49-61: The SQL examples have syntax and formatting issues: close
the missing parenthesis on the map(...) call inside the format(...) invocation
(the expression involving map(ARRAY['greeting', 'planet'], ARRAY['hello',
'world']) should be fully enclosed in format(...)), put each SELECT statement on
a single line (combine the row(... AS row(greeting varchar, planet varchar))
cast into one line) to match other examples, and append a trailing semicolon to
every SELECT (including the row/cast example, the ARRAY example, the map
example, and the from_base64('d29ybGQ=') example) so all snippets are consistent
and syntactically correct.

---

Outside diff comments:
In `@core/trino-main/src/main/java/io/trino/operator/scalar/FormatFunction.java`:
- Around line 119-148: The addDependencies(FunctionDependencyDeclarationBuilder
builder, Type type) currently returns early for ArrayType/RowType/MapType and
thus fails to declare casts for nested non-primitive types; modify
addDependencies to recurse into nested component types: for ArrayType call
addDependencies(builder, elementType), for MapType call addDependencies(builder,
keyType) and addDependencies(builder, valueType), and for RowType iterate
row.getFields() and call addDependencies(builder, field.getType()) for each
field, while preserving the existing JSON handling and primitive early returns;
also add a unit test that calls FormatFunction.format with a complex value that
contains a type requiring a cast (e.g., format('%s', ARRAY[uuid '...'] or
format('%s', ROW(uuid '...')) ) to assert no missing-dependency error during
specialize.
- Around line 154-155: The code currently builds a JSON-like object for rows
using the field converters list (created from rowType.getFieldTypes()), which
produces invalid JSON for anonymous rows (e.g., {"hello","world"}); change the
row formatting code in FormatFunction to detect when a row has no field names
(all fields unnamed) and emit a JSON array instead of an object for that case —
use the converters list to serialize values in order into ["v1","v2",...] when
field names are absent, and keep the existing object serialization for rows with
at least one named field so named rows continue to emit {"key": "value"}.

---

Nitpick comments:
In `@core/trino-main/src/main/java/io/trino/operator/scalar/FormatFunction.java`:
- Around line 291-339: Change the three helper method signatures from Integer to
primitive int to avoid boxing: update rowToString(FunctionDependencies, RowType,
Block, Integer) to use int, mapToString(FunctionDependencies, MapType, Block,
Integer) to use int, and arrayToString(FunctionDependencies, ArrayType, Block,
Integer) to use int; adjust any internal uses (they already treat position as a
primitive) and update all callers to pass an int (they already do in this file)
so there is no autoboxing allocation at each call.
- Around line 326-339: The arrayToString method should stop assuming block is an
ArrayBlock and instead obtain the element block via arrayType.getObject(block,
position); replace the ArrayBlock-specific branch that calls
arrayBlock.getArray(position) with a single call to arrayType.getObject(block,
position), then iterate elementBlock.getPositionCount() and call
quotedValue(functionDependencies, arrayType.getElementType(),
elementBlock.getSingleValueBlock(i), i) (or otherwise pass the correct inner
position) as before; update references to ArrayBlock/getArray to use
arrayType.getObject to handle alternative encodings while keeping the rest of
the logic in arrayToString unchanged.

In
`@core/trino-main/src/test/java/io/trino/operator/scalar/TestFormatFunction.java`:
- Around line 151-183: Tests for FormatFunction are missing cases that reveal
bugs in escaping, offsets, nesting, nulls and dependency-addition; add targeted
unit tests invoking format("%s", ...) covering values with
quotes/backslashes/newlines (to exercise quotedValue escaping),
multi-row/non-zero map/row positions (to exercise SqlMap.rawOffset and
SqlRow.rawIndex via VALUES/UNNEST), nested complex types (arrays/maps/rows
within each other), nulls inside nested containers (null array element, null map
value, null row field to exercise the converter null-check), types requiring
cast-to-VARCHAR inside containers (to exercise addDependencies recursion), and
empty containers; ensure at least one test uses SELECT ... FROM VALUES to
confirm block encoding doesn't change ARRAY/BOOLEAN output. Reference
FormatFunction.java, quotedValue, converter, SqlMap, SqlRow, rawOffset/rawIndex,
and addDependencies when adding these tests.
- Around line 166-170: The test currently relies on implicit Java toString
behavior for numeric array elements which can drift; update TestFormatFunction
to add explicit assertions that pin the expected JSON-like formatting for
numeric arrays: add cases for integer arrays (format("%s", "ARRAY[1, 2, 3]")
expecting "[1, 2, 3]"), BIGINT arrays, REAL/DOUBLE arrays (include values like
1.5, NaN, Infinity and assert the current serialized tokens so the contract is
clear), and DECIMAL arrays (assert BigDecimal string formatting, including
scientific notation cases); reference the existing format(...) test method and
the code paths that call quotedValue / INTEGER::getInt to locate where to add
these assertions and include short comments describing the expected contract for
each numeric type.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: be2def72-4194-4b9f-a72b-ae41ca84b425

📥 Commits

Reviewing files that changed from the base of the PR and between 12f5177 and cf3da06.

📒 Files selected for processing (3)
  • core/trino-main/src/main/java/io/trino/operator/scalar/FormatFunction.java
  • core/trino-main/src/test/java/io/trino/operator/scalar/TestFormatFunction.java
  • docs/src/main/sphinx/functions/conversion.md

Comment thread core/trino-main/src/main/java/io/trino/operator/scalar/FormatFunction.java Outdated
Comment thread core/trino-main/src/main/java/io/trino/operator/scalar/FormatFunction.java Outdated
Comment thread docs/src/main/sphinx/functions/conversion.md
@gertjanal gertjanal force-pushed the format-row branch 8 times, most recently from 33da2b1 to db479d9 Compare April 20, 2026 17:03
@gertjanal
Copy link
Copy Markdown
Contributor Author

@martint ready

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

2 participants