materialize-s3-iceberg: s/python/iceberg-go/#4347
Conversation
1927e86 to
93f26fc
Compare
time.Parse(time.DateOnly, ...) rejects 5-digit year strings such as "10000-01-01". Fall back to a regex parse so Parquet DATE columns can encode dates outside the year-1–9999 band that Python datetime imposed.
The clamping to year 1-9999 was required because pyarrow materialised Parquet column min/max stats through Python datetime, which overflows outside that band. iceberg-go reads the same stats as plain int64/int32, so values like "9999-12-31T23:59:59-14:00" (UTC year 10000) and "10000-01-01" now pass through without modification.
Remove stale pyarrow/pyiceberg references from the overflow regression tests and migration-test skip comment now that the Python path is gone. Regression tests now write the raw overflow values directly instead of pre-clamping them.
93f26fc to
a645fa8
Compare
|
@jacobmarble do we still want to do this? |
I still want to do it, sooner is better IMHO. The Python Iceberg implementation is more mature than the Golang impl, but we don't push it very hard. The PR hasn't been updated recently because no one had reviewed it yet. If you support the effort (with caveat to mitigate regression risks) then I'll rebase and squash to make the review easier. |
|
@jacobmarble yeah I think this is worth doing if we can mitigate the risk 👍🏽 |
Description:
Replaces the embedded Python
iceberg-ctlsubprocess with the native Goiceberg-golibrary. The connector now creates and writes to Iceberg tables in-process.New tables are created with metadata retention defaults (gzip metadata, drop prior versions on commit, expire snapshots older than 24h) to keep metadata files under object-store size limits.
As a nice side-effect, this also works around a date range issue in Python. When Iceberg metadata contains dates outside of the years 1 through 9999, it fails to parse because the corresponding Python type has this limit. Now that Python is removed from the code path, these dates are parsed without issue.
Workflow steps:
No user-visible changes.
Documentation links affected:
None.
Notes for reviewers:
iceberg-ctl/Python tree is removed wholesale.go/schema-gen/generate.gonow traversesoneOf/anyOfbranches so fixups apply to nullable-wrapped fields emitted by invopop.