air: mangle invalid SMT atoms (fixes #2221)#2312
Conversation
Use a dedicated AIR-side mangling scheme for invalid SMT atoms rather than emitting quoted SMT symbols. This keeps the output in plain-symbol form, which is a better fit for backend compatibility and long-term stability across solvers and versions. Also add: - a regression test covering raw identifiers that reach SMT printing - a debug assertion in the printer to catch invalid SMT atoms in `Node::Atom` before they are emitted
|
AIR should follow the same rules for identifiers as SMTLIB format, so the VIR-to-AIR translation should fix any non-SMTLIB identifiers, rather than the AIR printer trying to fix this. |
I will open another PR in 1 or 2 days using this idea. Let's compare at that time. |
|
I think this issue is complex enough that it needs some design discussion. Between Rust identifiers, VIR identifiers, AIR identifiers, Z3 identifiers, the various disambiguators and encoding schemes, there are a lot of facets to consider. I attempted to start some discussion at this old issue: #1778 |
AIR symbols may now contain Unicode and other non-SMT characters (e.g. raw identifiers like `r#match`, Greek letters). Instead of mangling at node-creation time in `str_to_node`, mangling is now applied lazily in `NodeWriter::write_atom` when actually emitting expressions to the solver. - `is_symbol` in `parser.rs` is relaxed to accept any non-empty string; the printer comment explains the mangling contract. - `str_to_node` wraps its argument verbatim; `NodeWriter` handles mangling via `is_valid_smt_atom` / `mangle_smt_atom`. - New public helper `sanitize_ident` is exposed so callers that build `Ident` values (e.g. `variant_ident`, `variant_field_ident_internal` in vir/def.rs) can pre-sanitize when needed. - Tests are updated: `node_writer_mangles_invalid_atoms` checks mangling through `NodeWriter`; the debug-panic test is removed. - Regression tests expanded: raw-identifier test now covers the full set of Rust strict keywords as names, parameters, locals, struct fields, and enum variants; new `unicode_identifiers_in_smt_2221` test covers non-ASCII identifiers end-to-end.
I have implemented the VIR->AIR fix in #2325.
I checked this. I have 2 ideas. Idea 1: Let VIR/AIR accepts all Rust identifiers, then mangle before SMT emission. Idea 2: Let AIR follow the same rules for identifiers as SMTLIB format. I did this in #2325. |
resolves #2221
Use a dedicated AIR-side mangling scheme for invalid SMT atoms rather than emitting quoted SMT symbols. This keeps the output in plain-symbol form, which is a better fit for backend compatibility and long-term stability across solvers and versions.
Also add:
Node::Atombefore they are emittedAssisted by: Claude 4.6 Sonnet
Assisted by: Kimi K2.6-code-preview
By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.