fix: [SNOW-3066557] read SQL files as UTF-8 regardless of system locale #2982
Draft
sfc-gh-olorek wants to merge 1 commit into
`snow sql -f <file>` and the `!source <file>` directive relied on Python's default text encoding when opening SQL files, which on Japanese Windows resolves to `cp932`. Any UTF-8 file containing a non-ASCII character (including comments like `-- コメント`) would crash with `UnicodeDecodeError` before a single statement was sent to Snowflake.

Explicitly pass `encoding="utf-8"` at both call sites (`files_reader` and `ParsedStatement.from_file`) so the reader always decodes UTF-8, matching the encoding the CLI already uses when writing files.

Adds two regression tests that simulate a non-UTF-8 default encoding by monkeypatching `pathlib.Path.open` and verify that both the `files_reader` (top-level `-f`) path and the `!source` include path read the file successfully.

Fixes #2759
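For reviewers who want to see the failure mode in isolation, here is a minimal sketch. It is not the code from `statement_reader.py`; `read_sql` and `decodes_under` are hypothetical helpers introduced only for illustration. It writes a UTF-8 SQL file with a Japanese comment, checks whether the raw bytes decode under `cp932`, and reads the file back with an explicit `encoding="utf-8"`.

```python
import tempfile
from pathlib import Path

# Sample SQL with a non-ASCII comment, as in the bug report.
SQL_TEXT = "-- コメント\nSELECT 1;\n"


def read_sql(path: Path) -> str:
    """Hypothetical reader: always decode as UTF-8, ignoring the locale default."""
    with path.open("r", encoding="utf-8") as fh:
        return fh.read()


def decodes_under(raw: bytes, encoding: str) -> bool:
    """Return True if `raw` decodes cleanly under `encoding`, False otherwise."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sql_file = Path(tmp) / "query.sql"
        sql_file.write_bytes(SQL_TEXT.encode("utf-8"))
        raw = sql_file.read_bytes()
        # Compare the locale-style decode with the explicit UTF-8 decode.
        print("decodes as cp932:", decodes_under(raw, "cp932"))
        print("decodes as utf-8:", decodes_under(raw, "utf-8"))
        print(read_sql(sql_file))
```

Because the encoding is passed explicitly, the result of `read_sql` does not depend on the platform's default text encoding.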
Pre-review checklist
Changes description
Fixes #2759 / SNOW-3066557.
`snow sql -f <file>` (and the `!source <file>` include directive) opened SQL files without specifying an encoding, so Python fell back to the platform default text encoding. On Japanese Windows that resolves to `cp932`: any UTF-8 SQL file containing a non-ASCII character, even a single `-- コメント` comment, crashes the command with `UnicodeDecodeError` before the first statement ever reaches Snowflake.

This change passes `encoding="utf-8"` explicitly at both call sites in `src/snowflake/cli/_plugins/sql/statement_reader.py`:

- `files_reader`, the top-level entry point for `snow sql -f`
- `ParsedStatement.from_file`, the loader used for `!source <file>` includes

The CLI already writes files as UTF-8 elsewhere (`SecurePath.write_text`), so this aligns the read path with the write path and removes the dependency on the platform locale.

Tests

Two regression tests were added to `tests/sql/test_statement_reader.py`:

- `test_read_utf8_file_on_non_utf8_locale`
- `test_source_utf8_file_on_non_utf8_locale`

Both write Japanese UTF-8 bytes to a `.sql` file, monkeypatch `pathlib.Path.open` to default to `cp932` (faithfully simulating the customer environment; just monkeypatching `locale.getpreferredencoding` is not enough on modern Python), and assert the reader succeeds. Verified that both tests fail on `main` and pass with this fix. Full `tests/sql/test_statement_reader.py` suite: 52/52 passing.