Replace stack-unwinding error handling with error-flag returns #5002

Open
SeanTAllen wants to merge 1 commit into main from sean/error-flag-returns

@SeanTAllen
Member

@SeanTAllen SeanTAllen commented Mar 10, 2026

Pony's error keyword propagated errors by unwinding the stack (libunwind on POSIX, SEH on Windows), which forced an LLVM invoke at every partial call site and pulled in platform-specific landing pad, personality, and LSDA machinery.

Partial functions now return an error flag instead: {T, i1} for value-returning functions, i1 for void-returning class constructors. Callers check the flag and either branch to their handler or propagate. try becomes an ordinary branch.

With Pony-to-Pony propagation no longer riding on stack unwinding, the unwinding infrastructure is dead code and is removed: pony_error(), pony_try(), ponyint_personality_v0(), LSDA scanning, except_try_catch.ll, and all landing pad / invoke / personality machinery in codegen.

Removing pony_error() has user-facing consequences:

  • Bare functions that hit error outside a try now call abort(). A bare partial lambda can no longer carry an error across a C frame. The serialise package was the only stdlib user of that pattern and has been removed along with its tests (RFC #83). The C-level serialise runtime used internally by the compiler is untouched.
  • pony_error() was the only way a partial FFI function could signal failure, so a ? on an FFI declaration or call no longer does anything and is now a compile error from the syntax pass. The dead check_partial_ffi_call verifier logic is removed. Dropping the ? from the grammar entirely is left for a follow-up.
  • C runtime code that called pony_error() reports failure through return values now. The five PONY_API socket functions return a three-state result (PONY_SOCKET_OK/RETRY/ERROR) plus a size_t* count_out out-parameter (RFC #84). ponyint_formattime returns NULL on a bad format string.
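The NULL-on-failure convention from the last bullet can be sketched in plain C. This is a minimal illustration, not the runtime's code: `demo_formattime` and `DEMO_MAX` are hypothetical names, the fixed buffer size is an assumption, and treating every zero return from `strftime` as a failure is a simplification.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for a formatter that reports failure by returning
   NULL instead of raising an error. The caller frees the result. */
static char* demo_formattime(const struct tm* date, const char* fmt)
{
  enum { DEMO_MAX = 64 };
  char* buffer = malloc(DEMO_MAX);

  if(buffer == NULL)
    return NULL;

  /* strftime returns 0 when it wrote nothing, which includes output that
     does not fit in the buffer. This sketch treats that as failure. */
  if(strftime(buffer, DEMO_MAX, fmt, date) == 0)
  {
    free(buffer);
    return NULL;
  }

  return buffer;
}
```

The caller then has to test for NULL before using the string, which is the role the commit message assigns to `PosixDate.format`.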

Closes #4443. Closes #5280. Closes #5325. Design: #4997.

@ponylang-main ponylang-main added the discuss during sync Should be discussed during an upcoming sync label Mar 10, 2026
@SeanTAllen SeanTAllen marked this pull request as ready for review March 10, 2026 11:06
@SeanTAllen
Member Author

This isn't actually ready for review. I switched it to that to get CI to run.

@SeanTAllen SeanTAllen removed the discuss during sync Should be discussed during an upcoming sync label Mar 10, 2026
@ponylang-main ponylang-main added the discuss during sync Should be discussed during an upcoming sync label Mar 10, 2026
@SeanTAllen SeanTAllen marked this pull request as draft March 10, 2026 13:01
@SeanTAllen
Member Author

Did an ensemble review of this (security, API, performance, correctness — four independent passes). The core codegen is correct across all paths I traced. A few things to address:

Partial FFI declarations are silently accepted. The plan called for a compiler error when users write use @foo[T](...) ? (step 3i), but the implementation just removes the err variable and invoke path from gen_ffi without adding the rejection. The ? compiles fine and does nothing — the call is always a plain LLVMBuildCall2. That means third-party code with partial FFI declarations that depended on C-side pony_error() will silently lose error propagation. The stdlib is clean (Phases 1-2 fixed it), but user code gets no signal that something changed. Needs at least a compiler error, ideally before this ships.

Brace style in genfun.c. The } else { in genfun_fun (the partial/non-partial split around the gen_assign_cast call) doesn't match the project's Allman style.

unwrap_result is fragile. Calling it on an i1 (partial class constructor return) would be LLVMBuildExtractValue on a non-aggregate type — LLVM API UB. The current callers are careful (callee_ret != c->i1 guard in gen_call), but a pony_assert inside unwrap_result itself would catch future misuse.
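A guard of the kind suggested here can be sketched without LLVM. Everything in this block is hypothetical: `demo_ret_t` models the two return shapes (flag only, or value plus flag), and the standard `assert` stands in for `pony_assert`. It shows only where the check would live, not how `unwrap_result` is actually written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two partial return shapes. */
typedef enum
{
  DEMO_FLAG_ONLY,      /* models a bare i1 return */
  DEMO_VALUE_AND_FLAG  /* models a {T, i1} return */
} demo_kind_t;

typedef struct
{
  demo_kind_t kind;
  int32_t value; /* meaningful only for DEMO_VALUE_AND_FLAG */
  bool error;
} demo_ret_t;

/* The guard sits inside the helper, so a future caller cannot skip it. */
static int32_t demo_unwrap_result(demo_ret_t ret)
{
  assert(ret.kind == DEMO_VALUE_AND_FLAG);
  return ret.value;
}
```

With the check inside the helper, a call on the flag-only shape fails loudly in a debug build instead of depending on every call site remembering its own guard.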

nullable_pointer_apply doesn't set c->frame->is_partial. It sets c_m->is_partial (correct for callers) and constructs returns manually via wrap_result/genfun_build_ret, so the missing frame flag isn't a bug today. But if someone refactors the body to use gen_error, it'll hit the pony_assert(c->frame->bare_function) and crash. Setting the frame flag after start_function would make it resilient.

Comment thread packages/net/tcp_connection.pony Outdated
@SeanTAllen SeanTAllen removed the discuss during sync Should be discussed during an upcoming sync label Apr 8, 2026
@SeanTAllen SeanTAllen force-pushed the sean/error-flag-returns branch from ba3e87a to 858af9a Compare April 30, 2026 23:23
@SeanTAllen SeanTAllen force-pushed the sean/error-flag-returns branch from 99bc832 to 860f182 Compare May 9, 2026 12:18
@ponylang-main ponylang-main added the discuss during sync Should be discussed during an upcoming sync label May 14, 2026
@SeanTAllen SeanTAllen force-pushed the sean/error-flag-returns branch from 860f182 to 6eb3d96 Compare May 14, 2026 13:25
@SeanTAllen SeanTAllen changed the title Experimental: Replace pony_error() with error-flag returns Replace stack-unwinding error handling with error-flag returns May 14, 2026
@SeanTAllen
Member Author

With pony_error() gone, a ? on an FFI declaration no longer does anything. The C side has no way to raise a Pony error anymore, and codegen no longer generates a branch for one. The compiler still accepts the ?, so the declaration compiles, but it's meaningless. FFI partiality is over in practice.

This PR is itself the statement that FFI partiality goes away. The question I want to raise is whether that change also needs to go through the RFC process on its own.

I don't think it does. This is a breaking change, but it's one that falls out of an implementation change rather than a deliberate piece of language design. We didn't set out to remove partial FFI as a feature. We removed stack unwinding, and partial FFI stopped meaning anything as a consequence.

If folks want it through an RFC anyway, there's a clean way to split it. We leave it legal to write a meaningless ? on an FFI declaration, merge this PR as it stands, and send the final step, making that ? an actual compile error, through an RFC of its own.

@SeanTAllen SeanTAllen marked this pull request as ready for review May 14, 2026 13:27
@SeanTAllen SeanTAllen added the do not merge This PR should not be merged at this time label May 14, 2026
@redvers
Contributor

redvers commented May 15, 2026

> We leave it legal to write a meaningless ?

I would prefer to remove it, and here's my reasoning:

As we are also removing pony_error from pony.h, anyone compiling anything that relies on it will be broken by this PR anyway, with a linking error.

If there is a case out there with a partial C-FFI function that never had a path to pony_error, then by definition it was wrong already.

@jemc
Member

jemc commented May 20, 2026

I agree that we should remove the silent acceptance of ? on FFI calls, because leaving it in would be misleading and arguably a "least surprise" bug.

Probably the nicest thing we could do here is continue to accept the syntax at the parser/lexer level, but catch it in some early pass and give an informative error about this change. We could leave that in place for some period of time to help people migrate.

If that seems like too much trouble, I'm okay with skipping that "nice to have" error.

@SeanTAllen SeanTAllen force-pushed the sean/error-flag-returns branch from 6eb3d96 to 32d1106 Compare May 21, 2026 01:31
@SeanTAllen
Member Author

Note for whoever merges this: open a follow-up issue to make ? on FFI a hard syntax error — call it "Approach B."

This PR (Approach A) rejects a ? on an FFI declaration or call in the syntax pass with a friendly migration error, but deliberately leaves the grammar and AST shape untouched. The long-term resting state is a plain syntax error, and that step is its own change:

  • Drop OPT TOKEN(NULL, TK_QUESTION) from the use_ffi and ffi rules in parser.c.
  • Remove CHILD(question, none) from ffidecl and ffi_call in treecheckdef.h.
  • Trim the remaining unused child-4 (question) bindings for the 5→4 change: expr/ffi.c (expr_ffi's question, declared_ffi's call_error/decl_error) and codegen/gencall.c (gen_ffi's can_err/decl_err). These are pre-existing dead bindings left over from the pony_error() removal. (verify/call.c's check_partial_ffi_call was the only live partiality logic, and it was removed in this PR.)
  • Remove the Approach-A syntax-pass error and its tests; replace with a parser-rejection test.
  • The error message degrades to a generic syntax error, which is acceptable once the migration window has passed.

This is the AST-shape change the repo's "Known Couplings" warns about, so run the full coupled set (test-pony-compiler, test-pony-lint, test-pony-lsp, test-pony-doc) alongside the libponyc and full-program tests.

@SeanTAllen SeanTAllen force-pushed the sean/error-flag-returns branch from 5950967 to 2d3e8ee Compare May 21, 2026 11:35
@SeanTAllen
Member Author

@jemc this has the "friendly error" warning in place. It's ready for review.

@SeanTAllen SeanTAllen force-pushed the sean/error-flag-returns branch from 2d3e8ee to 1b106e6 Compare May 21, 2026 11:54
@SeanTAllen SeanTAllen requested a review from jemc May 21, 2026 13:21
Pony's `error` keyword propagated errors by unwinding the stack, with
libunwind on POSIX and SEH on Windows doing the work. That forced an
LLVM `invoke` at every partial call site instead of a plain `call`,
which blocks optimization across those sites, and it pulled in a pile
of platform-specific landing pad, personality, and LSDA machinery.

Partial functions now return an error flag instead. A value-returning
partial function returns `{T, i1}`; a void-returning class constructor
returns `i1`. The flag is `false` on a normal return and `true` on an
error. Callers check it and either branch to their handler or
propagate. A `try` block becomes an ordinary branch.
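As a rough C analogue of that convention (the real change is in the LLVM IR the compiler emits, not in C), the sketch below models `{T, i1}` as a struct. All names here (`result_i32_t`, `checked_div`, `div_then_add`, `div_or_default`) are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C model of the `{T, i1}` return shape: the value plus an
   error flag that is false on a normal return and true on an error. */
typedef struct
{
  int32_t value;
  bool error;
} result_i32_t;

/* A "partial" function: division that errors on a zero divisor. */
static result_i32_t checked_div(int32_t a, int32_t b)
{
  if(b == 0)
    return (result_i32_t){0, true};

  return (result_i32_t){a / b, false};
}

/* Propagation: a partial caller checks the flag and hands it upward. */
static result_i32_t div_then_add(int32_t a, int32_t b, int32_t c)
{
  result_i32_t r = checked_div(a, b);

  if(r.error)
    return r;

  return (result_i32_t){r.value + c, false};
}

/* Handling: the analogue of a `try ... else ... end` is a plain branch. */
static int32_t div_or_default(int32_t a, int32_t b, int32_t fallback)
{
  result_i32_t r = checked_div(a, b);
  return r.error ? fallback : r.value;
}
```

Every call in the sketch is an ordinary call followed by a compare and branch, which is the property the change is after: no unwind edges for the optimizer to work around.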

With Pony-to-Pony propagation no longer riding on stack unwinding, the
unwinding infrastructure is dead code, so it goes: `pony_error()`,
`pony_try()`, `ponyint_personality_v0()`, LSDA scanning, the
`except_try_catch.ll` IR, and every landing pad, invoke, and
personality reference in codegen.

Removing `pony_error()` reaches past the compiler internals:

- Bare functions that hit `error` outside a `try` now call `abort()`.
  A bare partial lambda can no longer carry an error across a C frame,
  because the stack unwind that used to carry it is gone. The
  `serialise` package relied on that pattern, and RFC #83 already calls
  for its removal as a security footgun, so it and its tests go with
  this change. The C-level serialise runtime the compiler uses
  internally is untouched.

- `pony_error()` was also the only way a partial FFI function could
  signal failure, so a `?` on an FFI declaration or call no longer does
  anything. Codegen generates no error path for it. The compiler now
  rejects that `?` in the syntax pass with a message explaining the
  change, instead of accepting one that does nothing. The verifier's
  `check_partial_ffi_call`, which matched declaration partiality against
  call-site partiality, is dead and removed too. Dropping the `?` from
  the grammar so it becomes a plain syntax error is left for a
  follow-up.

- C runtime code that called `pony_error()` on failure needs another
  way to report it. The five `PONY_API` socket functions
  (`pony_os_writev`, `pony_os_send`, `pony_os_recv`, `pony_os_sendto`,
  `pony_os_recvfrom`) now return a three-state result
  (`PONY_SOCKET_OK`, `PONY_SOCKET_RETRY`, `PONY_SOCKET_ERROR`) and
  write the byte count to a `size_t* count_out` out-parameter. The old
  `size_t` return overloaded three meanings onto one value: bytes
  moved, would-block, and error. Each channel means one thing now. The
  C return type is `uint8_t` rather than a C `enum`, so the FFI return
  width is pinned at one byte. `ponyint_formattime` returns `NULL` on a
  bad format string, and `PosixDate.format` checks for it.
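The socket return shape from the last bullet might look like the following. This is a sketch under stated assumptions, not the runtime's API: `demo_send` and `demo_sock_t` are hypothetical, the `DEMO_SOCKET_*` constants stand in for `PONY_SOCKET_*` with assumed numeric values, and an in-memory buffer plays the role of the socket.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for PONY_SOCKET_OK/RETRY/ERROR; the values are assumptions. */
#define DEMO_SOCKET_OK    ((uint8_t)0)
#define DEMO_SOCKET_RETRY ((uint8_t)1)
#define DEMO_SOCKET_ERROR ((uint8_t)2)

/* Hypothetical in-memory "socket" with a small send buffer. */
typedef struct
{
  char buf[8];
  size_t used;
  int closed;
} demo_sock_t;

/* Status goes in the uint8_t return; the byte count goes in count_out. */
static uint8_t demo_send(demo_sock_t* s, const char* data, size_t len,
  size_t* count_out)
{
  *count_out = 0;

  if(s->closed)
    return DEMO_SOCKET_ERROR;

  size_t room = sizeof(s->buf) - s->used;

  if(room == 0)
    return DEMO_SOCKET_RETRY; /* would block: nothing moved, try later */

  size_t n = (len < room) ? len : room;
  memcpy(s->buf + s->used, data, n);
  s->used += n;
  *count_out = n;
  return DEMO_SOCKET_OK;
}
```

Because the count travels separately, a short write, a would-block, and a hard failure are three distinct outcomes rather than three readings of one `size_t`.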

Two implementation notes. When a non-partial concrete method shares a
vtable index with a partial dispatch method (`Cons.head` and
`List.head`, say), a thin wrapper is generated that calls the concrete
method and wraps its result as `{value, false}`, so the vtable slot
has one return type. And `invoke_target` in `compile_frame_t` becomes
`error_target`, since it is now a branch target for an error-flag
check rather than an invoke unwind destination.
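The wrapper idea can be modeled in C with a function-pointer slot. The types and functions below (`cons_t`, `list_t`, `cons_head_wrapper`, `dispatch_head`) are hypothetical; they borrow the `Cons.head` and `List.head` names from the text only to show why a slot needs a single return type.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
  int32_t value;
  bool error;
} result_i32_t;

/* One slot type: every implementation in the slot returns {value, flag}. */
typedef result_i32_t (*head_fn)(const void* self);

typedef struct { int32_t first; } cons_t;
typedef struct { const cons_t* cell; } list_t; /* cell == NULL: empty */

/* Non-partial concrete method: cannot fail, returns a bare value. */
static int32_t cons_head(const cons_t* self)
{
  return self->first;
}

/* Thin wrapper: calls the concrete method and wraps as {value, false}. */
static result_i32_t cons_head_wrapper(const void* self)
{
  return (result_i32_t){cons_head((const cons_t*)self), false};
}

/* Partial dispatch method: errors on an empty list. */
static result_i32_t list_head(const void* self)
{
  const list_t* list = (const list_t*)self;

  if(list->cell == NULL)
    return (result_i32_t){0, true};

  return (result_i32_t){list->cell->first, false};
}

/* A dynamic call site sees one signature regardless of the receiver. */
static result_i32_t dispatch_head(head_fn slot, const void* self)
{
  return slot(self);
}
```

Direct callers of `cons_head` still get the bare value with no flag to check; only dispatch through the shared slot pays for the wrapper.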

Design: #4997
Closes #4443
Closes #5280
Closes #5325
@SeanTAllen SeanTAllen force-pushed the sean/error-flag-returns branch from 1b106e6 to 32cfb7c Compare May 22, 2026 18:31
@redvers
Contributor

redvers commented May 22, 2026

It occurs to me, when this merges we can probably remove this section: https://www.ponylang.io/use/performance/pony-performance-cheat-sheet/#avoid-error

@SeanTAllen
Member Author

> It occurs to me, when this merges we can probably remove this section: https://www.ponylang.io/use/performance/pony-performance-cheat-sheet/#avoid-error

Yes, I am planning on doing that.

@SeanTAllen
Member Author

I ran this against tier2 tests and they all passed.

@SeanTAllen
Member Author

I ran this against tier3 tests and they all passed. Same for the weekly tests.

@SeanTAllen
Member Author

My expectation is that if #3874 is still reproducible immediately prior to this commit, it won't be after it.

Labels

discuss during sync (Should be discussed during an upcoming sync)
do not merge (This PR should not be merged at this time)

4 participants