Add arrayType, rowType, mapType and varbinary support to format()#29146
Add arrayType, rowType, mapType and varbinary support to format()#29146gertjanal wants to merge 1 commit intotrinodb:masterfrom
Conversation
9da794a to
cf3da06
Compare
|
TODO: escape characters? Quotes? |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
📝 WalkthroughWalkthroughThe Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
core/trino-main/src/main/java/io/trino/operator/scalar/FormatFunction.java (2)
119-148:⚠️ Potential issue | 🔴 Critical
addDependenciesmust recurse into nested types to declare needed casts.The new early-return adds
ArrayType/RowType/MapTypebut does not recurse into element/field/key/value types. Duringspecialize, thevalueConverterrecursion will call back intoconverter(functionDependencies, nestedType)for nested types, and any nested type that is not one of the early-returned primitives (e.g.UUID,IPADDRESS,HyperLogLog, user types, nestedROWcontaining such a type) will hit the fallbackfunctionDependencies.getCastImplementation(type, VARCHAR, ...)at line 281. Because that cast was never declared inaddDependencies, this throws at query time with a missing-dependency error.Please walk the children:
private static void addDependencies(FunctionDependencyDeclarationBuilder builder, Type type) { if (type.equals(UNKNOWN) || ... || type instanceof CharType || - type instanceof ArrayType || - type instanceof RowType || - type instanceof MapType || type instanceof VarbinaryType) { return; } + if (type instanceof ArrayType arrayType) { + addDependencies(builder, arrayType.getElementType()); + return; + } + if (type instanceof MapType mapType) { + addDependencies(builder, mapType.getKeyType()); + addDependencies(builder, mapType.getValueType()); + return; + } + if (type instanceof RowType rowType) { + rowType.getTypeParameters().forEach(t -> addDependencies(builder, t)); + return; + } if (type.equals(JSON)) { ... } builder.addCast(type, VARCHAR); }Please also add a test covering a type that requires a cast inside a complex type (e.g.
format('%s', ARRAY[uuid '...'])orformat('%s', row(uuid '...'))).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/trino-main/src/main/java/io/trino/operator/scalar/FormatFunction.java` around lines 119 - 148, The addDependencies(FunctionDependencyDeclarationBuilder builder, Type type) currently returns early for ArrayType/RowType/MapType and thus fails to declare casts for nested non-primitive types; modify addDependencies to recurse into nested component types: for ArrayType call addDependencies(builder, elementType), for MapType call addDependencies(builder, keyType) and addDependencies(builder, valueType), and for RowType iterate row.getFields() and call addDependencies(builder, field.getType()) for each field, while preserving the existing JSON handling and primitive early returns; also add a unit test that calls FormatFunction.format with a complex value that contains a type requiring a cast (e.g., format('%s', ARRAY[uuid '...'] or format('%s', ROW(uuid '...')) ) to assert no missing-dependency error during specialize.
154-155:⚠️ Potential issue | 🟠 MajorAnonymous rows rendered as
{...}are not valid JSON.Given the PR's stated goal of JSON-compatible output, an anonymous row without field names emits
{"hello", "world"}, which is not valid JSON (an object requireskey: valuepairs). Consider rendering anonymous (or all-unnamed) rows as a JSON array["hello", "world"], consistent with the array representation. This also sidesteps the asymmetry whereARRAY['hello','world']→["hello", "world"]butrow('hello','world')→{"hello", "world"}.Worth calling out for
@martintgiven thesyntax-needs-reviewlabel.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/trino-main/src/main/java/io/trino/operator/scalar/FormatFunction.java` around lines 154 - 155, The code currently builds a JSON-like object for rows using the field converters list (created from rowType.getFieldTypes()), which produces invalid JSON for anonymous rows (e.g., {"hello","world"}); change the row formatting code in FormatFunction to detect when a row has no field names (all fields unnamed) and emit a JSON array instead of an object for that case — use the converters list to serialize values in order into ["v1","v2",...] when field names are absent, and keep the existing object serialization for rows with at least one named field so named rows continue to emit {"key": "value"}.
🧹 Nitpick comments (4)
core/trino-main/src/test/java/io/trino/operator/scalar/TestFormatFunction.java (2)
151-183: Add coverage for escaping, non-trivial batches, and nested complex types.The happy-path cases are covered, but several behaviors introduced by this change are not exercised and are exactly where the current implementation has issues flagged in
FormatFunction.java:
- Values containing
",\, and newlines (escaping gap inquotedValue): e.g.format('%s', ARRAY['he said "hi"']),format('%s', row('a\b')).- Multiple rows / non-zero map or row offsets — e.g. unnesting a
VALUEStable of rows/maps so that the SqlMap has a non-zerorawOffsetand SqlRow a non-zerorawIndex; this would have caught the position/offset bugs.- Nested complex types:
format('%s', ARRAY[ARRAY[1,2], ARRAY[3]]),format('%s', map(ARRAY['a'], ARRAY[ARRAY[1,2]])),format('%s', row(ARRAY[row(1)])).- Null handling: null row field, null map value, null array element (the outer
converternull-check at line 192 only guards the top-level block).- Types requiring cast-to-VARCHAR inside a complex type, e.g.
format('%s', ARRAY[uuid '...']), to cover theaddDependenciesrecursion.- Empty row/array/map.
Also, consider asserting
ARRAY[TRUE, FALSE]formats to[true, false]from aSELECT ... FROM VALUESto confirm block encoding doesn't matter.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/trino-main/src/test/java/io/trino/operator/scalar/TestFormatFunction.java` around lines 151 - 183, Tests for FormatFunction are missing cases that reveal bugs in escaping, offsets, nesting, nulls and dependency-addition; add targeted unit tests invoking format("%s", ...) covering values with quotes/backslashes/newlines (to exercise quotedValue escaping), multi-row/non-zero map/row positions (to exercise SqlMap.rawOffset and SqlRow.rawIndex via VALUES/UNNEST), nested complex types (arrays/maps/rows within each other), nulls inside nested containers (null array element, null map value, null row field to exercise the converter null-check), types requiring cast-to-VARCHAR inside containers (to exercise addDependencies recursion), and empty containers; ensure at least one test uses SELECT ... FROM VALUES to confirm block encoding doesn't change ARRAY/BOOLEAN output. Reference FormatFunction.java, quotedValue, converter, SqlMap, SqlRow, rawOffset/rawIndex, and addDependencies when adding these tests.
166-170: Integer array formatting inherits Java's%stoString, which may surprise users.For
ARRAY[1, 2, 3]the test expects[1, 2, 3]. Under the current implementation, elements go throughINTEGER::getIntthen the value isLong((long) INTEGER.getInt(...)) and ends up inStringBuilder.append(Object)viaquotedValue. Worth confirming behavior forBIGINT,REAL/DOUBLE(e.g.ARRAY[1.5, NaN, infinity()]—NaN/Infinityare not valid JSON), andDECIMAL(which becomesBigDecimal.toString, including scientific notation for some values). Adding explicit assertions pins down the contract and prevents drift.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/trino-main/src/test/java/io/trino/operator/scalar/TestFormatFunction.java` around lines 166 - 170, The test currently relies on implicit Java toString behavior for numeric array elements which can drift; update TestFormatFunction to add explicit assertions that pin the expected JSON-like formatting for numeric arrays: add cases for integer arrays (format("%s", "ARRAY[1, 2, 3]") expecting "[1, 2, 3]"), BIGINT arrays, REAL/DOUBLE arrays (include values like 1.5, NaN, Infinity and assert the current serialized tokens so the contract is clear), and DECIMAL arrays (assert BigDecimal string formatting, including scientific notation cases); reference the existing format(...) test method and the code paths that call quotedValue / INTEGER::getInt to locate where to add these assertions and include short comments describing the expected contract for each numeric type.core/trino-main/src/main/java/io/trino/operator/scalar/FormatFunction.java (2)
291-339: Use primitiveintfor thepositionparameter.The new
rowToString/mapToString/arrayToStringhelpers declareInteger position, which forces auto-boxing on every call (these are invoked from per-row converters). All other converters in this file pass a primitiveint. Preferintfor consistency and to avoid allocation.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/trino-main/src/main/java/io/trino/operator/scalar/FormatFunction.java` around lines 291 - 339, Change the three helper method signatures from Integer to primitive int to avoid boxing: update rowToString(FunctionDependencies, RowType, Block, Integer) to use int, mapToString(FunctionDependencies, MapType, Block, Integer) to use int, and arrayToString(FunctionDependencies, ArrayType, Block, Integer) to use int; adjust any internal uses (they already treat position as a primitive) and update all callers to pass an int (they already do in this file) so there is no autoboxing allocation at each call.
326-339: UsearrayType.getObject(block, position)for consistency and to handle non-ArrayBlockencodings.The assumption in the code is correct:
arrayBlock.getArray(position)returns a region-scoped block (viagetRegion()), so iterating0..getPositionCount()is safe. However, the current code is fragile because it only handlesArrayBlockinstances directly. Consider switching toarrayType.getObject(block, position)to match patterns used elsewhere in the codebase and to gracefully handle alternative array encodings.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/trino-main/src/main/java/io/trino/operator/scalar/FormatFunction.java` around lines 326 - 339, The arrayToString method should stop assuming block is an ArrayBlock and instead obtain the element block via arrayType.getObject(block, position); replace the ArrayBlock-specific branch that calls arrayBlock.getArray(position) with a single call to arrayType.getObject(block, position), then iterate elementBlock.getPositionCount() and call quotedValue(functionDependencies, arrayType.getElementType(), elementBlock.getSingleValueBlock(i), i) (or otherwise pass the correct inner position) as before; update references to ArrayBlock/getArray to use arrayType.getObject to handle alternative encodings while keeping the rest of the logic in arrayToString unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@core/trino-main/src/main/java/io/trino/operator/scalar/FormatFunction.java`:
- Around line 291-305: The rowToString method is passing the outer position into
quotedValue for nested field reads, which is wrong; compute int rawIndex =
row.getRawIndex() from the SqlRow and use that rawIndex when calling quotedValue
(and any other field-level accessors) with row.getRawFieldBlock(i) and
fields.get(i).getType() so field reads use the row's raw index instead of the
outer position.
- Around line 285-289: The quotedValue method currently wraps values with
String.format("\"%s\"", value) which does not escape embedded quotes,
backslashes or control characters, and field names are formatted similarly
(String.format("\"%s\": ", name)); replace those naive wrappers by encoding the
string via a proper JSON string encoder (e.g., use
JsonStringEncoder.getInstance().quoteAsString(...) or ObjectMapper/JsonFactory
to produce a JSON-safe quoted string) when formatting both values in quotedValue
and field names in the name formatting site, keeping use of
FormatFunction.converter(...) for type conversion; add unit tests for
FormatFunction (or the relevant formatter tests) that assert proper JSON output
for values containing quotes, backslashes, newlines and for row fields whose
names contain those characters.
- Around line 307-324: The mapToString method is indexing raw key/value blocks
incorrectly; when using SqlMap from a MapBlock you must use
sqlMap.getRawOffset() to compute the real positions. Update mapToString (and its
loop over sqlMap.getSize()) to obtain keys = sqlMap.getRawKeyBlock() and values
= sqlMap.getRawValueBlock() and call quotedValue(functionDependencies,
mapType.getKeyType(), keys, sqlMap.getRawOffset() + i) and
quotedValue(functionDependencies, mapType.getValueType(), values,
sqlMap.getRawOffset() + i) (mirror the offset usage pattern from ModelUtils) so
the raw blocks are indexed with sqlMap.getRawOffset() rather than using i or
getSingleValueBlock.
- Around line 251-262: The instanceof checks in FormatFunction produce wrong
results for wrapped blocks and the helpers pass wrong positions to single-value
blocks; replace the block+position lambdas to extract objects via the Type API
(use rowType.getObject(block, position) cast to SqlRow, mapType.getObject(... )
to SqlMap, and arrayType.getObject(... ) to a Block) in the lambdas, then update
rowToString, mapToString, and arrayToString signatures to accept those extracted
objects (SqlRow, SqlMap, Block) and operate on them directly (treat any
single-value blocks with position 0 when calling quotedValue inside those
helpers) so position-translation and wrapped-block handling are correct.
In `@docs/src/main/sphinx/functions/conversion.md`:
- Around line 49-61: The SQL examples have syntax and formatting issues: close
the missing parenthesis on the map(...) call inside the format(...) invocation
(the expression involving map(ARRAY['greeting', 'planet'], ARRAY['hello',
'world']) should be fully enclosed in format(...)), put each SELECT statement on
a single line (combine the row(... AS row(greeting varchar, planet varchar))
cast into one line) to match other examples, and append a trailing semicolon to
every SELECT (including the row/cast example, the ARRAY example, the map
example, and the from_base64('d29ybGQ=') example) so all snippets are consistent
and syntactically correct.
---
Outside diff comments:
In `@core/trino-main/src/main/java/io/trino/operator/scalar/FormatFunction.java`:
- Around line 119-148: The addDependencies(FunctionDependencyDeclarationBuilder
builder, Type type) currently returns early for ArrayType/RowType/MapType and
thus fails to declare casts for nested non-primitive types; modify
addDependencies to recurse into nested component types: for ArrayType call
addDependencies(builder, elementType), for MapType call addDependencies(builder,
keyType) and addDependencies(builder, valueType), and for RowType iterate
row.getFields() and call addDependencies(builder, field.getType()) for each
field, while preserving the existing JSON handling and primitive early returns;
also add a unit test that calls FormatFunction.format with a complex value that
contains a type requiring a cast (e.g., format('%s', ARRAY[uuid '...'] or
format('%s', ROW(uuid '...')) ) to assert no missing-dependency error during
specialize.
- Around line 154-155: The code currently builds a JSON-like object for rows
using the field converters list (created from rowType.getFieldTypes()), which
produces invalid JSON for anonymous rows (e.g., {"hello","world"}); change the
row formatting code in FormatFunction to detect when a row has no field names
(all fields unnamed) and emit a JSON array instead of an object for that case —
use the converters list to serialize values in order into ["v1","v2",...] when
field names are absent, and keep the existing object serialization for rows with
at least one named field so named rows continue to emit {"key": "value"}.
---
Nitpick comments:
In `@core/trino-main/src/main/java/io/trino/operator/scalar/FormatFunction.java`:
- Around line 291-339: Change the three helper method signatures from Integer to
primitive int to avoid boxing: update rowToString(FunctionDependencies, RowType,
Block, Integer) to use int, mapToString(FunctionDependencies, MapType, Block,
Integer) to use int, and arrayToString(FunctionDependencies, ArrayType, Block,
Integer) to use int; adjust any internal uses (they already treat position as a
primitive) and update all callers to pass an int (they already do in this file)
so there is no autoboxing allocation at each call.
- Around line 326-339: The arrayToString method should stop assuming block is an
ArrayBlock and instead obtain the element block via arrayType.getObject(block,
position); replace the ArrayBlock-specific branch that calls
arrayBlock.getArray(position) with a single call to arrayType.getObject(block,
position), then iterate elementBlock.getPositionCount() and call
quotedValue(functionDependencies, arrayType.getElementType(),
elementBlock.getSingleValueBlock(i), i) (or otherwise pass the correct inner
position) as before; update references to ArrayBlock/getArray to use
arrayType.getObject to handle alternative encodings while keeping the rest of
the logic in arrayToString unchanged.
In
`@core/trino-main/src/test/java/io/trino/operator/scalar/TestFormatFunction.java`:
- Around line 151-183: Tests for FormatFunction are missing cases that reveal
bugs in escaping, offsets, nesting, nulls and dependency-addition; add targeted
unit tests invoking format("%s", ...) covering values with
quotes/backslashes/newlines (to exercise quotedValue escaping),
multi-row/non-zero map/row positions (to exercise SqlMap.rawOffset and
SqlRow.rawIndex via VALUES/UNNEST), nested complex types (arrays/maps/rows
within each other), nulls inside nested containers (null array element, null map
value, null row field to exercise the converter null-check), types requiring
cast-to-VARCHAR inside containers (to exercise addDependencies recursion), and
empty containers; ensure at least one test uses SELECT ... FROM VALUES to
confirm block encoding doesn't change ARRAY/BOOLEAN output. Reference
FormatFunction.java, quotedValue, converter, SqlMap, SqlRow, rawOffset/rawIndex,
and addDependencies when adding these tests.
- Around line 166-170: The test currently relies on implicit Java toString
behavior for numeric array elements which can drift; update TestFormatFunction
to add explicit assertions that pin the expected JSON-like formatting for
numeric arrays: add cases for integer arrays (format("%s", "ARRAY[1, 2, 3]")
expecting "[1, 2, 3]"), BIGINT arrays, REAL/DOUBLE arrays (include values like
1.5, NaN, Infinity and assert the current serialized tokens so the contract is
clear), and DECIMAL arrays (assert BigDecimal string formatting, including
scientific notation cases); reference the existing format(...) test method and
the code paths that call quotedValue / INTEGER::getInt to locate where to add
these assertions and include short comments describing the expected contract for
each numeric type.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: be2def72-4194-4b9f-a72b-ae41ca84b425
📒 Files selected for processing (3)
core/trino-main/src/main/java/io/trino/operator/scalar/FormatFunction.javacore/trino-main/src/test/java/io/trino/operator/scalar/TestFormatFunction.javadocs/src/main/sphinx/functions/conversion.md
33da2b1 to
db479d9
Compare
|
@martint ready |
Description
Second attempt after #28234 was rejected. The goal is to create a readable string representation of row, array, map and varbinary, as suggested by @martint .
Consideration: the strings in rows and arrays are using double quotes, to make them json compatible. However, we could consider to use single quotes as they would better fit the sql standard.
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( X ) Release notes are required, with the following suggested text: