Skip to content

fix: resolve intermittent NullPointerException in LetExprNode during concurrent execution#1484

Open
thiago-lopes-dev wants to merge 1 commit intoapple:mainfrom
thiago-lopes-dev:patch-1
Open

fix: resolve intermittent NullPointerException in LetExprNode during concurrent execution#1484
thiago-lopes-dev wants to merge 1 commit intoapple:mainfrom
thiago-lopes-dev:patch-1

Conversation

@thiago-lopes-dev
Copy link
Copy Markdown

This PR fixes an intermittent NullPointerException occurring in LetExprNode.java when running Gradle tasks with high parallelism (e.g., :load:configClassesGatherImports).

The root cause was a race condition in the lazy initialization of Truffle nodes. Specifically, the functionNode was being assigned before the callNode was fully initialized and inserted into the AST. In a multi-threaded environment, a second thread could pass the functionNode == null check while callNode was still null, leading to a crash during the .call() invocation.

Changes
Thread-safe Initialization: Added a synchronized block to the initialization logic in LetExprNode.

Atomic Assignment Pattern: Refactored the code to use callNode as the primary initialization flag.

Guaranteed Visibility: Ensured that functionNode and customThisSlot are fully assigned before callNode is inserted, preventing other threads from accessing a partially initialized state.

Fixes #1470

Refactor initialization logic for callNode and functionNode to ensure thread safety and proper resolution.
@odenix
Copy link
Copy Markdown
Contributor

odenix commented Mar 31, 2026

This doesn’t look right to me. Pkl’s Truffle nodes aren’t meant to be executed concurrently.

I’m not sure, but perhaps the correct fix is to ensure that stdlib modules/classes shared between evaluators are implemented entirely in thread-safe Java code and don’t contain any Pkl code. I believe the only stdlib module that must be shared is pkl.base.

As a side note, LetExprNode could be made more efficient by swapping frames rather than hoisting its body into a FunctionNode. This would also allow merging immediately nested let-expressions into a single node/frame. I’ve successfully used frame swapping in past PRs and, at one point, had a nearly complete PR to reimplement LetExprNode accordingly.

@bioball
Copy link
Copy Markdown
Member

bioball commented Apr 7, 2026

Agree, we don't want to add any locks within a truffle node.

As a side note, LetExprNode could be made more efficient by swapping frames rather than hoisting its body into a FunctionNode. This would also allow merging immediately nested let-expressions into a single node/frame. I’ve successfully used frame swapping in past PRs and, at one point, had a nearly complete PR to reimplement LetExprNode accordingly.

Sounds pretty interesting; what do you mean by frame swapping? What frame would we swap this with?

I've been wanting to explore this at some point too; another idea I had is to rewrite the let binding name at parse time; should be doable once we've implemented #1154. With that, we don't need to introduce a new scope.

@odenix
Copy link
Copy Markdown
Contributor

odenix commented Apr 7, 2026

Sounds pretty interesting; what do you mean by frame swapping? What frame would we swap this with?

I mean creating a new frame and calling the let-body with that frame, without going through a root node. I already use this technique in GeneratorForNode#executeIteration. I also had it working for let-expressions, aside from some parsing and stack trace quirks when merging immediately nested let-expressions, which is a tempting but optional optimization.

another idea I had is to rewrite the let binding name at parse time; should be doable once we've implemented #1154. With that, we don't need to introduce a new scope

Sounds good, but might be tricky to get right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

NullPointerException: Cannot invoke "com.oracle.truffle.api.nodes.DirectCallNode.call(Object[])" because "this.callNode" is null

3 participants