Skip to content

vir: sanitize non-SMT identifiers at AIR ident construction time (fix #2221)#2325

Open
sunmy2019 wants to merge 1 commit into
verus-lang:mainfrom
sunmy2019:gh-2221-2
Open

vir: sanitize non-SMT identifiers at AIR ident construction time (fix #2221)#2325
sunmy2019 wants to merge 1 commit into
verus-lang:mainfrom
sunmy2019:gh-2221-2

Conversation

@sunmy2019
Copy link
Copy Markdown
Contributor

One line version:
This PR tries to mangle invalid identifier during VIR -> AIR translation.


Rust identifiers can contain characters that are invalid in SMT-LIB simple symbols (raw-identifier prefix #, Unicode code points, etc.). This change adds a centralized sanitize_to_air_ident helper in vir/src/def.rs that encodes any such character as $<hex>$ and prefixes the result with $V$, leaving already-valid identifiers unchanged.

The helper is applied at every site that constructs an AIR Ident from a path or function name:

  • fun_to_air_ident / path_to_air_ident in sst_to_air.rs
  • prefix_dcr_id, prefix_type_id, prefix_dyn_id, prefix_fndef_type_id, prefix_box, prefix_unbox, projection, trait_bound, to_dyn, static_name, variant_ident, variant_field_ident_internal, unique_var_name, new_user_qid_name, new_internal_qid in def.rs

NodeWriter::write_atom in printer.rs now asserts that every atom it emits already satisfies sise::check_atom, turning any missed call site into an immediate panic rather than a silent malformed query.

Regression tests cover the full set of Rust strict keywords as raw identifiers (r#match, r#for, …) and Unicode identifiers in types, fields, enums, traits, generics, functions, parameters, locals, constants, and spec/proof functions. #![allow(mixed_script_confusables)] is added to suppress the unrelated Rust lint in the test file.

Assisted by: Kimi K2.6-code-preview
Assisted by: Claude 4.6 Sonnet

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

Rust identifiers can contain characters that are invalid in SMT-LIB simple
symbols (raw-identifier prefix `#`, Unicode code points, etc.).  This change
adds a centralized `sanitize_to_air_ident` helper in `vir/src/def.rs` that
encodes any such character as `$<hex>$` and prefixes the result with `$V$`,
leaving already-valid identifiers unchanged.

The helper is applied at every site that constructs an AIR `Ident` from a
path or function name:
- `fun_to_air_ident` / `path_to_air_ident` in `sst_to_air.rs`
- `prefix_dcr_id`, `prefix_type_id`, `prefix_dyn_id`, `prefix_fndef_type_id`,
  `prefix_box`, `prefix_unbox`, `projection`, `trait_bound`, `to_dyn`,
  `static_name`, `variant_ident`, `variant_field_ident_internal`,
  `unique_var_name`, `new_user_qid_name`, `new_internal_qid` in `def.rs`

`NodeWriter::write_atom` in `printer.rs` now asserts that every atom it
emits already satisfies `sise::check_atom`, turning any missed call site
into an immediate panic rather than a silent malformed query.

Regression tests cover the full set of Rust strict keywords as raw
identifiers (`r#match`, `r#for`, …) and Unicode identifiers in types,
fields, enums, traits, generics, functions, parameters, locals, constants,
and spec/proof functions.  `#![allow(mixed_script_confusables)]` is added to
suppress the unrelated Rust lint in the test file.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant