
fix: [SNOW-3066557] read SQL files as UTF-8 regardless of system locale#2982

Draft
sfc-gh-olorek wants to merge 1 commit into main from proactive/SNOW-3066557-utf8-sql-read


Conversation

@sfc-gh-olorek
Contributor

Pre-review checklist

  • I've confirmed that instructions included in README.md are still correct after my changes in the codebase.
  • I've added or updated automated unit tests to verify correctness of my new code.
  • I've added or updated integration tests to verify correctness of my new code.
  • I've confirmed that my changes are working by executing CLI's commands manually on macOS.
  • I've confirmed that my changes are working by executing CLI's commands manually on Windows.
  • I've confirmed that my changes are up-to-date with the target branch.
  • I've described my changes in the release notes.
  • I've described my changes in the section below.
  • I've described my changes in the documentation.

Changes description

Fixes #2759 / SNOW-3066557.

snow sql -f <file> (and the !source <file> include directive) opened SQL files without specifying an encoding, so Python fell back to the platform default text encoding. On Japanese Windows that resolves to cp932: any UTF-8 SQL file containing a non-ASCII character — even a single -- コメント comment — crashes the command with UnicodeDecodeError before the first statement ever reaches Snowflake.

This change passes encoding="utf-8" explicitly at both call sites in src/snowflake/cli/_plugins/sql/statement_reader.py:

  • files_reader, the top-level entry point for snow sql -f
  • ParsedStatement.from_file, the loader used for !source <file> includes
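In spirit, the change at each call site is just supplying the encoding argument when the file is opened. A minimal sketch (the helper name `read_sql_file` is illustrative, not the actual statement_reader.py code):

```python
import tempfile
from pathlib import Path

def read_sql_file(path: Path) -> str:
    # Explicit encoding: without it, open() falls back to the platform
    # default (locale.getpreferredencoding), which resolves to cp932 on
    # Japanese Windows and rejects many UTF-8 byte sequences.
    with path.open("r", encoding="utf-8") as f:
        return f.read()

# Demo: a UTF-8 SQL file with a Japanese comment reads back intact.
sql_file = Path(tempfile.mkdtemp()) / "query.sql"
sql_file.write_text("-- コメント\nSELECT 1;\n", encoding="utf-8")
content = read_sql_file(sql_file)
```

With the explicit argument, the result no longer depends on the machine's locale.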

The CLI already writes files as UTF-8 elsewhere (SecurePath.write_text), so this aligns the read path with the write path and removes the dependency on the platform locale.

Tests

Two regression tests were added to tests/sql/test_statement_reader.py:

  • test_read_utf8_file_on_non_utf8_locale
  • test_source_utf8_file_on_non_utf8_locale

Both write Japanese UTF-8 bytes to a .sql file, monkeypatch pathlib.Path.open to default to cp932 (faithfully simulating the customer environment — just monkeypatching locale.getpreferredencoding is not enough on modern Python), and assert the reader succeeds. Verified that both tests fail on main and pass with this fix. Full tests/sql/test_statement_reader.py suite: 52/52 passing.
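The test approach can be illustrated with a standalone sketch (not the actual pytest fixtures from tests/sql/test_statement_reader.py): patch `pathlib.Path.open` so that text-mode opens without an explicit encoding decode as cp932, reproduce the crash, then show that passing `encoding="utf-8"` avoids it.

```python
import pathlib
import tempfile

# Simulate Japanese Windows: any text-mode open that omits an encoding
# decodes as cp932.  (Patching locale.getpreferredencoding alone is not
# enough on modern Python, hence patching Path.open itself.)
real_open = pathlib.Path.open

def cp932_open(self, mode="r", buffering=-1, encoding=None, *args, **kwargs):
    if "b" not in mode and encoding is None:
        encoding = "cp932"
    return real_open(self, mode, buffering, encoding, *args, **kwargs)

pathlib.Path.open = cp932_open
try:
    sql_file = pathlib.Path(tempfile.mkdtemp()) / "query.sql"
    sql_file.write_bytes("-- コメント\nSELECT 1;\n".encode("utf-8"))

    # Before the fix: no encoding argument, so the cp932 default applies
    # and the UTF-8 bytes fail to decode.
    try:
        sql_file.open().read()
        crashed = False
    except UnicodeDecodeError:
        crashed = True

    # After the fix: encoding="utf-8" is passed explicitly, so the
    # simulated locale default never comes into play.
    fixed = sql_file.open(encoding="utf-8").read()
finally:
    pathlib.Path.open = real_open
```

The real tests use pytest's monkeypatch instead of the manual try/finally, but the mechanism is the same.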

`snow sql -f <file>` and the `!source <file>` directive relied on
Python's default text encoding when opening SQL files, which on
Japanese Windows resolves to cp932.  Any UTF-8 file containing a
non-ASCII character (including comments like `-- コメント`) would
crash with UnicodeDecodeError before a single statement was sent to
Snowflake.

Explicitly pass `encoding="utf-8"` at both call sites
(`files_reader` and `ParsedStatement.from_file`) so the reader
always decodes UTF-8, matching the encoding the CLI already uses
when writing files.

Adds two regression tests that simulate a non-UTF-8 default encoding
by monkeypatching `pathlib.Path.open` and verify both the
`files_reader` (top-level `-f` path) and the `!source` include path
read the file successfully.

Fixes #2759

Development

Successfully merging this pull request may close these issues.

SNOW-3066557: UnicodeDecodeError when executing SQL files with UTF-8 encoding on Japanese Windows
