Skip to content

air: mangle invalid SMT atoms (fixes #2221)#2312

Open
sunmy2019 wants to merge 2 commits into
verus-lang:mainfrom
sunmy2019:gh-2221-1
Open

air: mangle invalid SMT atoms (fixes #2221)#2312
sunmy2019 wants to merge 2 commits into
verus-lang:mainfrom
sunmy2019:gh-2221-1

Conversation

@sunmy2019
Copy link
Copy Markdown
Contributor

@sunmy2019 sunmy2019 commented Apr 8, 2026

resolves #2221

Use a dedicated AIR-side mangling scheme for invalid SMT atoms rather than emitting quoted SMT symbols. This keeps the output in plain-symbol form, which is a better fit for backend compatibility and long-term stability across solvers and versions.

Also add:

  • a regression test covering raw identifiers that reach SMT printing
  • a debug assertion in the printer to catch invalid SMT atoms in Node::Atom before they are emitted

Assisted by: Claude 4.6 Sonnet
Assisted by: Kimi K2.6-code-preview

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

Use a dedicated AIR-side mangling scheme for invalid SMT atoms rather than
emitting quoted SMT symbols. This keeps the output in plain-symbol form, which
is a better fit for backend compatibility and long-term stability across
solvers and versions.

Also add:
- a regression test covering raw identifiers that reach SMT printing
- a debug assertion in the printer to catch invalid SMT atoms in `Node::Atom`
	before they are emitted
@Chris-Hawblitzel
Copy link
Copy Markdown
Collaborator

AIR should follow the same rules for identifiers as SMTLIB format, so the VIR-to-AIR translation should fix any non-SMTLIB identifiers, rather than the AIR printer trying to fix this.

@Chris-Hawblitzel Chris-Hawblitzel self-requested a review April 8, 2026 15:23
@sunmy2019
Copy link
Copy Markdown
Contributor Author

AIR should follow the same rules for identifiers as SMTLIB format, so the VIR-to-AIR translation should fix any non-SMTLIB identifiers, rather than the AIR printer trying to fix this.

I will open another PR in 1 or 2 days using this idea. Let's compare at that time.

@tjhance
Copy link
Copy Markdown
Collaborator

tjhance commented Apr 8, 2026

I think this issue is complex enough that it needs some design discussion. Between Rust identifiers, VIR identifiers, AIR identifiers, Z3 identifiers, the various disambiguators and encoding schemes, there are a lot of facets to consider. I attempted to start some discussion at this old issue: #1778

AIR symbols may now contain Unicode and other non-SMT characters (e.g.
raw identifiers like `r#match`, Greek letters).  Instead of mangling at
node-creation time in `str_to_node`, mangling is now applied lazily in
`NodeWriter::write_atom` when actually emitting expressions to the
solver.

- `is_symbol` in `parser.rs` is relaxed to accept any non-empty string;
  the printer comment explains the mangling contract.
- `str_to_node` wraps its argument verbatim; `NodeWriter` handles
  mangling via `is_valid_smt_atom` / `mangle_smt_atom`.
- New public helper `sanitize_ident` is exposed so callers that build
  `Ident` values (e.g. `variant_ident`, `variant_field_ident_internal`
  in vir/def.rs) can pre-sanitize when needed.
- Tests are updated: `node_writer_mangles_invalid_atoms` checks mangling
  through `NodeWriter`; the debug-panic test is removed.
- Regression tests expanded: raw-identifier test now covers the full set
  of Rust strict keywords as names, parameters, locals, struct fields,
  and enum variants; new `unicode_identifiers_in_smt_2221` test covers
  non-ASCII identifiers end-to-end.
@sunmy2019
Copy link
Copy Markdown
Contributor Author

AIR should follow the same rules for identifiers as SMTLIB format, so the VIR-to-AIR translation should fix any non-SMTLIB identifiers, rather than the AIR printer trying to fix this.

I have implemented the VIR->AIR fix in #2325.
As for this PR, I postpone the mangling to the printer of AIR.
We can discuss which fix is better.

I think this issue is complex enough that it needs some design discussion. Between Rust identifiers, VIR identifiers, AIR identifiers, Z3 identifiers, the various disambiguators and encoding schemes, there are a lot of facets to consider. I attempted to start some discussion at this old issue: #1778

I checked this. I have 2 ideas.

Idea 1: Let VIR/AIR accepts all Rust identifiers, then mangle before SMT emission.
This is what this PR trying to do. It is more easy and foolproof.

Idea 2: Let AIR follow the same rules for identifiers as SMTLIB format. I did this in #2325.
This plan involves many changes, making it error-prone and easy to overlook things when making future changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Panic when using raw identifiers

3 participants