fix(parser): recover from unexpected tokens in expression position by ghaith · Pull Request #1709 · PLC-lang/rusty

ghaith · 2026-04-30T14:53:19Z

Summary

parse_atomic_leaf_expression's fallback emitted an E007 unexpected_token_found diagnostic but did not advance the lexer, leaving the bad token in the stream for binary-op parsers higher in the cascade to re-consume. For prog(p := &g) from #1306 this meant the AST became BinaryExpression { And, EmptyStatement, g }; the empty LHS reached codegen and produced a confusing no type hint available for EmptyStatement while compilation reported success. The same shape bit any other binary-only operator misused as a unary prefix (MOD x, OR x, …).
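A minimal sketch of the failure mode described above, using a toy lexer and AST rather than rusty's real types. `Token`, `Ast`, `Lexer`, `parse_leaf_without_advance` and `parse_and` are hypothetical names chosen for illustration. It only shows how a leaf fallback that reports an error without advancing lets a binary-AND loop pick up the same `&` with an empty left-hand side:

```rust
// Toy model only: none of these types are the real rusty lexer or AST.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Amp,
    Ident(&'static str),
    End,
}

#[derive(Debug, PartialEq)]
enum Ast {
    Empty,
    Ident(&'static str),
    And(Box<Ast>, Box<Ast>),
}

struct Lexer {
    tokens: Vec<Token>,
    pos: usize,
}

impl Lexer {
    fn new(tokens: Vec<Token>) -> Self {
        Lexer { tokens, pos: 0 }
    }

    // Current token, or `End` once the input is exhausted.
    fn token(&self) -> Token {
        self.tokens.get(self.pos).copied().unwrap_or(Token::End)
    }

    fn advance(&mut self) {
        self.pos += 1;
    }
}

// Leaf parser with the old-style fallback: it records a diagnostic but
// leaves the offending token in the stream.
fn parse_leaf_without_advance(lexer: &mut Lexer, diagnostics: &mut Vec<String>) -> Ast {
    match lexer.token() {
        Token::Ident(name) => {
            lexer.advance();
            Ast::Ident(name)
        }
        other => {
            diagnostics.push(format!("unexpected token {:?}", other));
            Ast::Empty
        }
    }
}

// Binary-AND level of the cascade: it sees the `&` the leaf left behind
// and treats it as an infix operator.
fn parse_and(lexer: &mut Lexer, diagnostics: &mut Vec<String>) -> Ast {
    let mut left = parse_leaf_without_advance(lexer, diagnostics);
    while lexer.token() == Token::Amp {
        lexer.advance();
        let right = parse_leaf_without_advance(lexer, diagnostics);
        left = Ast::And(Box::new(left), Box::new(right));
    }
    left
}
```

Feeding the two tokens `&` and `g` through `parse_and` yields `And(Empty, Ident("g"))`, the same shape as the `BinaryExpression { And, EmptyStatement, g }` quoted above.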

Recovery now:

emits a clearer "expected expression" diagnostic (was "Literal");
when the bad token is itself an operator, advances past it and retries the leaf so the operand the user wrote is captured (&g → g);
otherwise leaves the token in place so outer parsers can use it for synchronization (e.g. END_CASE, END_VAR).
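The recovery policy in the list above can be sketched as a small decision function. This is an illustration under assumptions, not the PR's code: the `Token` variants, `Recovery`, `has_operator_reading`, `recovery_for` and `diagnostic_for` are hypothetical stand-ins, and the real predicate in the parser is `to_operator(&lexer.token).is_some()`.

```rust
// Hypothetical token subset; the real lexer has many more variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Amp,
    Mod,
    Or,
    EndCase,
    EndVar,
}

#[derive(Debug, PartialEq)]
enum Recovery {
    // Advance past the bad operator and retry the leaf (`&g` becomes `g`).
    SkipAndRetryLeaf,
    // Leave the token for outer parsers to synchronize on.
    LeaveForSynchronization,
}

// Stand-in for `to_operator(..).is_some()`: does the token have an
// operator interpretation at all?
fn has_operator_reading(token: Token) -> bool {
    matches!(token, Token::Amp | Token::Mod | Token::Or)
}

fn recovery_for(bad_token: Token) -> Recovery {
    if has_operator_reading(bad_token) {
        Recovery::SkipAndRetryLeaf
    } else {
        Recovery::LeaveForSynchronization
    }
}

// The reworded message: "expected expression" instead of "Literal".
// The token is rendered with its Debug name here, not its source text.
fn diagnostic_for(bad_token: Token) -> String {
    format!("Unexpected token: expected expression but found {:?}", bad_token)
}
```

In this sketch `recovery_for(Token::Amp)` selects the skip-and-retry path, while `recovery_for(Token::EndVar)` leaves the token in place for synchronization.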

The missing-parameter-assignment carve-out (foo(p := )) is narrowed to fire only when the next token actually closes the parameter list (`)` or `,`); previously any unrecognised token after := was silently treated as an empty parameter, masking real errors like the one in #1306.
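A hedged before/after sketch of the carve-out condition. The token names and both functions are hypothetical, and the "before" predicate is only an approximation of "any unrecognised token after := counts as empty" as described in this PR text:

```rust
// Hypothetical token subset for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    ParensClose,
    Comma,
    Amp,
    Ident,
}

// Assumption: in this toy model only an identifier can start an expression.
fn starts_expression(token: Token) -> bool {
    matches!(token, Token::Ident)
}

// Approximation of the old behaviour: anything that does not start an
// expression after `:=` is silently treated as an empty parameter.
fn is_empty_parameter_before(next: Token) -> bool {
    !starts_expression(next)
}

// Narrowed behaviour: only a token that closes the parameter
// (`)` or `,`) counts as an empty parameter.
fn is_empty_parameter_after(next: Token) -> bool {
    matches!(next, Token::ParensClose | Token::Comma)
}
```

With the narrowed predicate, `foo(p := )` and `foo(p := , q := 1)` still take the empty-parameter path, while `foo(p := &g)` falls through to normal expression parsing and its diagnostic.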

Resolver tests that exercised pointer arithmetic via the dead-code &x path are migrated to REF(x).

Fixes #1306

Behaviour change for users

Before:

$ plc bug.st --check
$ echo $?
0

After:

$ plc bug.st --check
error[E007]: Unexpected token: expected expression but found &
   ┌─ bug.st:13:14
   │
13 │     foo(p := &g);
   │              ^ Unexpected token: expected expression but found &

Compilation aborted due to critical parse errors.
$ echo $?
1

Test plan

New inline-snapshot tests in expressions_parser_tests.rs covering prog(p := &g) (the #1306 regression), bare &y, and MOD y (generality witness).
Defensive amp_as_and_test (binary b & c) stays green.
16 existing parser-error snapshots updated to reflect the "Literal" → "expression" wording. Locations and column markers are unchanged across all of them.
qualified_reference_location_test no longer covers &aaa.bbb.ccc since that case was anchoring the buggy EmptyStatement AND expr shape.
Two resolver tests migrated from &x to REF(x) (the & path was dead code that fell through the buggy parser recovery).
cargo test --workspace clean (2466/2466).
cargo xtask lit clean (347/347).
cargo fmt --all + cargo clippy --workspace -- -Dwarnings clean.
Manual repro: plc <file> --check on the issue's input now exits 1 with E007; full compile aborts before codegen with the same diagnostic.

🤖 Generated with Claude Code

github-actions · 2026-04-30T14:59:31Z

Build Artifacts

🐧 Linux

Artifact	Link	Size
`deb-x86_64`	Download	38.5 MB
`schema`	Download	0.0 MB
`stdlib`	Download	32.4 MB
`plc-x86_64`	Download	43.7 MB
`deb-aarch64`	Download	31.0 MB
`plc-aarch64`	Download	43.6 MB

From workflow run

🪟 Windows

Artifact	Link	Size
`stdlib.lib`	Download	4.3 MB
`stdlib.dll`	Download	0.1 MB
`plc.exe`	Download	38.5 MB

From workflow run

mhasel

Note: this review was generated by Claude (via Claude Code) and posted on my behalf. Treat the suggestions as machine-generated input — verify before acting. — Michael

Overview

Fixes #1306: parse_atomic_leaf_expression's fallback diagnosed E007 but didn't advance the lexer, so binary-op parsers higher in the cascade re-consumed the bad token (e.g. & re-parsed as binary-AND with EmptyStatement LHS). The malformed AST then flowed into codegen, producing a confusing EmptyStatement type-hint error while compilation reported success. The fix advances past binary-only operators used as prefixes and narrows the foo(p := ) empty-parameter carve-out to only fire when the next token is actually ) or ,.

What I like

The diagnosis is sharp. The PR body correctly identifies why the AST was malformed: silent token re-consumption, plus an over-broad empty-parameter carve-out. Both root causes are addressed.
The carve-out tightening is the cleaner of the two changes. Adding && matches!(lexer.token, KeywordParensClose | KeywordComma) is a one-line invariant that prevents a recovery path from masking real errors. This is a strict improvement.
Diagnostic wording. "Literal" → "expression" is more accurate. The 16 mechanical snapshot updates all preserve location/column markers and only change the wording, which is exactly the right blast radius.
Test coverage. Three new tests pin: the #1306 regression (&g in a call arg), bare &y in an assignment, and MOD y (generality witness — proves the fix isn't &-specific). The retained amp_as_and_test confirms b & c still parses correctly. The combination is convincing.
Resolver-test migration to REF(x) is the right call given that &x was never first-class syntax — only working via the buggy path. The migrated tests still exercise the underlying pointer-arithmetic resolver logic.

Suggestions

Recursion after advance. return parse_atomic_leaf_expression(lexer) is a tail call that recurses for each consecutive bad operator. Pathological input like & & & g would emit one diagnostic per & and recurse N times. The lexer is bounded so this won't OOM, but converting to a loop would (a) avoid burning stack on adversarial input and (b) make the bound visible:
```
while to_operator(&lexer.token).is_some() {
    lexer.advance();
}
// then fall through to a single retry
```
Optional — current form is fine in practice.
What counts as "binary-only" is implicit. to_operator(&lexer.token).is_some() returns true for unary-eligible tokens too (-, +, NOT). Those shouldn't reach this fallback because the prefix-operator parser handles them earlier — but the comment frames this as recovery for "binary-only operators misused as a prefix," which doesn't quite match what the predicate actually does. A one-liner clarifying "any token that has an operator interpretation; unary-eligible operators normally don't reach here" would close the gap.
A positive test for the narrowed carve-out would help. The fix tightens foo(p := ) recovery to only fire for ) or ,. The new tests cover the case where the carve-out should not fire (foo(p := &g) now produces E007 instead of silent empty-param), but there's no positive test that the carve-out still works for the original foo(p := ) shape. If one already exists, no action; otherwise consider adding one to anchor that the legitimate recovery path is preserved.
Behavior-change call-out. The PR body lists the user-visible change (plc --check now exits 1 on previously-silent input). This belongs in release notes too — projects with broken-but-compiling code will see a new build failure, and they should know it's the parser becoming honest, not a regression.

Risks

Loud failure for previously-silent code. Any project that had &x (or other binary-only-as-prefix) buried in code now gets E007 and fails compilation. Strictly an improvement, but worth a heads-up in release notes.
The resolver-test migration is the most fragile part. Two tests changed semantics from "parse &x and resolve through the buggy AST shape" to "parse REF(x) cleanly." The behavior being asserted (pointer arithmetic resolves to a pointer) is preserved, but if anyone has external lit/integration tests that relied on &x parsing, they'll break — please grep for & followed by an identifier in tests/lit/ and tests/integration/ to confirm no other coverage was leaning on the dead path.
to_operator coverage. If to_operator returns None for any token that should be recovered (e.g., a punctuation-only token that intuitively reads as binary), the recovery won't fire and we fall back to leaving the token in the stream. That's the conservative direction (no false advance), so I don't think it's a real risk, just worth noting.
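The grep suggested in the risks above could look roughly like the following line scanner. It is a heuristic sketch only: `prefix_amp_lines` is a hypothetical helper, it ignores comments and string literals, and it treats `&` as a prefix whenever the previous non-blank character cannot end an operand.

```rust
// Returns the 1-based numbers of lines where `&` appears to be used as a
// prefix (followed by an identifier, not preceded by an operand).
fn prefix_amp_lines(source: &str) -> Vec<usize> {
    let mut hits = Vec::new();
    for (idx, line) in source.lines().enumerate() {
        let chars: Vec<char> = line.chars().collect();
        for (i, &c) in chars.iter().enumerate() {
            if c != '&' {
                continue;
            }
            // The next non-blank character must start an identifier.
            let next_is_ident = chars[i + 1..]
                .iter()
                .find(|ch| !ch.is_whitespace())
                .map_or(false, |ch| ch.is_alphabetic() || *ch == '_');
            // The previous non-blank character must not end an operand,
            // otherwise this is an ordinary binary `a & b`.
            let prev_is_operand = chars[..i]
                .iter()
                .rev()
                .find(|ch| !ch.is_whitespace())
                .map_or(false, |ch| {
                    ch.is_alphanumeric() || *ch == '_' || *ch == ')' || *ch == ']'
                });
            if next_is_ident && !prev_is_operand {
                hits.push(idx + 1);
                break; // one hit per line is enough
            }
        }
    }
    hits
}
```

Run over the contents of each `.st` file, a non-empty result marks lines worth checking by hand; binary uses such as `a & b` are not reported.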

Verdict

LGTM. Well-scoped fix, good test coverage, appropriate snapshot churn, and the carve-out tightening is genuinely valuable on its own. The recursion-vs-loop point is a nit; the comment-clarification and release-notes points are small. Approve once the lit-test grep confirms no external &x usage.

…1306)

`parse_atomic_leaf_expression`'s fallback emitted an `E007` diagnostic but left the bad token in the stream, so binary-op parsers higher in the cascade re-consumed it (e.g. `&` taken as binary AND with an `EmptyStatement` LHS). The malformed AST flowed into codegen, producing a confusing "no type hint available for EmptyStatement" while compilation reported success. `prog(p := &g)` from #1306 was the salient case; any binary-only operator misused as a unary prefix had the same shape.

Recovery now:

- emits a clearer "expected expression" diagnostic (was "Literal");
- when the bad token is itself an operator, advances past it and retries the leaf so the operand the user wrote is captured (`&g` -> `g`);
- otherwise leaves the token in place so outer parsers can use it for synchronization (e.g. `END_CASE`, `END_VAR`).

The missing-parameter-assignment carve-out (`foo(p := )`) is narrowed to fire only when the next token is actually `)` or `,`; previously any unrecognised token after `:=` was silently treated as an empty parameter, masking real errors.

Resolver tests that exercised pointer arithmetic via the dead-code `&x` path are migrated to `REF(x)`.

Fixes #1306

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…-out test

Review followups on PR #1709:

- Replace the tail-recursing recovery with a loop drain. On adversarial input like `& & & g` the previous form emitted one diagnostic per `&` and recursed N times; the loop drains the operator run and retries once, producing exactly one diagnostic. Guards against infinite retry on non-operator bad tokens via an `advanced` flag; the reviewer's bare-while suggestion would have looped forever on inputs like `END_VAR` directly.
- Sharpen the recovery comment: `to_operator()` also matches unary-eligible operators (`-`, `+`, `NOT`), but those don't reach this fallback because the prefix-operator parser handles them earlier in the cascade.
- Add `consecutive_bad_operators_emit_a_single_diagnostic` to lock the N→1 reduction.
- Add `empty_parameter_assignment_carve_out_still_fires` to anchor the positive complement of the narrowed `foo(p := )` carve-out.

Verified during followup that no `.st` file under tests/lit, tests/integration, or libs relied on the previously-tolerated `&<ident>` shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
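One way the loop drain with an `advanced` guard might look, written against a toy lexer. This is a sketch under assumptions, not the committed code: `Token`, `Ast`, `Lexer`, `is_operator`, `try_leaf` and `parse_leaf` are hypothetical, and `is_operator` stands in for `to_operator(..).is_some()`.

```rust
// Toy model only: none of these types are the real rusty lexer or AST.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Amp,
    Mod,
    Ident(&'static str),
    EndVar,
    End,
}

#[derive(Debug, PartialEq)]
enum Ast {
    Empty,
    Ident(&'static str),
}

struct Lexer {
    tokens: Vec<Token>,
    pos: usize,
}

impl Lexer {
    fn new(tokens: Vec<Token>) -> Self {
        Lexer { tokens, pos: 0 }
    }

    // Current token, or `End` once the input is exhausted.
    fn token(&self) -> Token {
        self.tokens.get(self.pos).copied().unwrap_or(Token::End)
    }

    fn advance(&mut self) {
        self.pos += 1;
    }
}

fn is_operator(token: Token) -> bool {
    matches!(token, Token::Amp | Token::Mod)
}

// Consumes an identifier leaf if one is present; otherwise consumes nothing.
fn try_leaf(lexer: &mut Lexer) -> Option<Ast> {
    if let Token::Ident(name) = lexer.token() {
        lexer.advance();
        Some(Ast::Ident(name))
    } else {
        None
    }
}

fn parse_leaf(lexer: &mut Lexer, diagnostics: &mut Vec<String>) -> Ast {
    if let Some(ast) = try_leaf(lexer) {
        return ast;
    }
    // Exactly one diagnostic, reported at the first bad token.
    diagnostics.push(format!("expected expression but found {:?}", lexer.token()));

    // Drain the whole run of misplaced operators in a loop, not by recursion.
    let mut advanced = false;
    while is_operator(lexer.token()) {
        lexer.advance();
        advanced = true;
    }

    // Retry once, and only if something was consumed; a non-operator token
    // such as END_VAR is left in place and never retried.
    if advanced {
        if let Some(ast) = try_leaf(lexer) {
            return ast;
        }
    }
    Ast::Empty
}
```

In this sketch the token sequence `& & & g` yields `Ident("g")` with a single diagnostic, while a lone `END_VAR` yields `Empty`, one diagnostic, and an untouched lexer position.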

…test

Spell out that `p := )` and `p := ,` are explicitly allowed by the parser (land as `Assignment { right: EmptyStatement }`, no diagnostic) and that the test only locks in the carve-out, i.e. the new `parse_atomic_leaf_expression` recovery must not consume the trailing `)` or `,`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mhasel reviewed May 4, 2026

volsa reviewed May 5, 2026

Comment thread src/parser/tests/expressions_parser_tests.rs

ghaith and others added 3 commits May 6, 2026 08:59

ghaith force-pushed the fix/parser-recovery-1306 branch from 0bba529 to 4b3fd5c on May 6, 2026 07:13

ghaith added 2 commits May 11, 2026 16:19

Merge remote-tracking branch 'origin/master' into fix/parser-recovery-1306

67214dd

Merge branch 'master' into fix/parser-recovery-1306

7b34ab0

volsa approved these changes May 28, 2026

ghaith added this pull request to the merge queue May 28, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 28, 2026

ghaith added this pull request to the merge queue May 29, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 29, 2026

ghaith added this pull request to the merge queue May 29, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 29, 2026

Merge branch 'master' into fix/parser-recovery-1306

f559ddb

ghaith enabled auto-merge May 29, 2026 09:13

ghaith wants to merge 6 commits into master from fix/parser-recovery-1306

3 participants
