fix(dynamic): handle PID/TID reuse in address identity by Vib3ss · Pull Request #2933 · mandiant/capa

Vib3ss · 2026-03-16T22:35:45Z

Implemented a fix for #2619 by preventing PID/TID reuse collisions in dynamic analysis.
Process/thread identities now support optional lifecycle IDs, and dynamic layout aggregation no longer drops matched calls when a thread address is reused.

The change is backward-compatible across freeze/proto/render address handling and includes new regression tests (6 passed) plus related dynamic/proto/render tests (50 passed).

No CHANGELOG update needed

Closes #2619

google-cla · 2026-03-16T22:35:56Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

gemini-code-assist · 2026-03-16T22:36:11Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the robustness of dynamic analysis by addressing the challenge of PID/TID reuse. It introduces optional lifecycle identifiers to process and thread addresses, ensuring that each instance is uniquely identifiable throughout its existence. This enhancement prevents data loss and aggregation issues that previously occurred when the operating system recycled process or thread identifiers. The changes are integrated across the address definition, data extraction, serialization, and rendering components, maintaining backward compatibility and including new tests to verify the fix.

Highlights

Enhanced Address Identity: Introduced an optional id field to ProcessAddress and ThreadAddress classes to uniquely identify processes and threads, even when PIDs or TIDs are reused. This ensures distinct identities across their lifecycles.
Updated Address Comparison and Hashing: Modified the __hash__, __eq__, and __lt__ methods for ProcessAddress and ThreadAddress to incorporate the new id field, guaranteeing correct identity and ordering when ids are present.
Improved Dynamic Analysis Extraction: Updated CAPE and VMRay extractors to assign unique lifecycle ids to processes and threads, preventing collisions during dynamic analysis when PIDs/TIDs are recycled by the operating system.
Backward-Compatible Serialization/Deserialization: Adjusted the freeze and proto modules to handle the new optional id fields in addresses, ensuring backward compatibility for existing serialized data while supporting the enhanced address structures.
Fix for Dynamic Layout Aggregation: Resolved an issue in capa/loader.py where matched calls were incorrectly dropped during dynamic layout aggregation when a thread address was reused, ensuring all relevant calls are preserved.
New Regression Tests: Added comprehensive regression tests to validate the correct behavior of address identity with lifecycle ids, round-trip serialization, and the fix for recycled TIDs in dynamic layout aggregation.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog

capa/features/address.py
- Added Optional type hint.
- Introduced an optional id field to ProcessAddress and ThreadAddress constructors.
- Updated __repr__, __hash__, __eq__, and __lt__ methods to include the id for identity.
capa/features/extractors/cape/file.py
- Modified process extraction to assign unique ids to ProcessAddress instances when PID/PPID reuse is detected, preventing collisions.
capa/features/extractors/cape/process.py
- Updated thread extraction to assign unique ids to ThreadAddress instances when TID reuse is detected.
capa/features/extractors/vmray/extractor.py
- Integrated monitor_id into ProcessAddress and ThreadAddress creation for VMRay extractions to ensure unique identification.
capa/features/freeze/init.py
- Extended from_capa and to_capa methods to handle the new optional id fields in ProcessAddress, ThreadAddress, and DynamicCallAddress for serialization and deserialization.
capa/loader.py
- Corrected the dynamic layout aggregation logic to properly handle thread address reuse by using setdefault and checking matched_threads to prevent dropping matched calls.
capa/render/proto/init.py
- Updated addr_to_pb2 function to parse and serialize ProcessAddress, ThreadAddress, and DynamicCallAddress with their new optional id fields for protobuf rendering.
capa/render/verbose.py
- Enhanced format_address, render_process, render_thread, and render_span_of_calls functions to display the optional id fields in verbose output.
tests/test_address_uniqueness.py
- Added a new test file containing regression tests to verify the correct behavior of ProcessAddress and ThreadAddress identity with ids, freeze/thaw roundtrips, and the fix for recycled TIDs in dynamic layout aggregation.

Activity

The pull request implements a fix for issue dynamic: render: ValueError "name not found for call" #2619, addressing PID/TID reuse collisions.
New regression tests (6 passed) were added to validate the changes.
Related dynamic, proto, and render tests (50 passed) were executed to ensure compatibility and correctness.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

github-actions

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

gemini-code-assist

Code Review

This pull request effectively addresses PID/TID reuse by introducing optional lifecycle IDs to ProcessAddress and ThreadAddress. The changes are well-integrated across feature extraction, serialization (freeze), and rendering, ensuring backward compatibility. The addition of regression tests is also a great inclusion. I have a couple of suggestions to improve consistency in logging and output formatting.

gemini-code-assist · 2026-03-16T22:39:13Z

capa/features/extractors/cape/process.py

+    seq: dict[int, int] = {}
+    for tid in threads:
+        seq[tid] = seq.get(tid, 0) + 1
+        thread_id = seq[tid] - 1 if counts[tid] > 1 else None
+
+        address: ThreadAddress = ThreadAddress(process=ph.address, tid=tid, id=thread_id)
        yield ThreadHandle(address=address, inner={})


This function handles TID reuse by assigning a lifecycle ID, which is great. However, unlike the corresponding get_processes function which warns about PID/PPID reuse, there's no warning here for TID reuse. For consistency and better diagnostics, consider adding a warning when a TID is reused within a process.

seq: dict[int, int] = {} warned_tids: set[int] = set() for tid in threads: if counts[tid] > 1 and tid not in warned_tids: logger.warning("TID reuse detected for tid %d in process %s", tid, ph.address) warned_tids.add(tid) seq[tid] = seq.get(tid, 0) + 1 thread_id = seq[tid] - 1 if counts[tid] > 1 else None address: ThreadAddress = ThreadAddress(process=ph.address, tid=tid, id=thread_id) yield ThreadHandle(address=address, inner={})

gemini-code-assist · 2026-03-16T22:39:14Z

capa/render/verbose.py

+        if len(address.value) == 2:
+            ppid, pid = address.value
+            assert isinstance(ppid, int)
+            assert isinstance(pid, int)
+            return f"process{{pid:{pid}}}"
+        ppid, pid, process_id = address.value
        assert isinstance(ppid, int)
        assert isinstance(pid, int)
+        assert isinstance(process_id, int)
+        if process_id >= 0:
+            return f"process{{pid:{pid},id:{process_id}}}"
        return f"process{{pid:{pid}}}"
    elif address.type == frz.AddressType.THREAD:
        assert isinstance(address.value, tuple)
-        ppid, pid, tid = address.value
+        if len(address.value) == 3:
+            ppid, pid, tid = address.value
+            assert isinstance(ppid, int)
+            assert isinstance(pid, int)
+            assert isinstance(tid, int)
+            return f"process{{pid:{pid},tid:{tid}}}"
+        ppid, pid, tid, process_id, thread_id = address.value
        assert isinstance(ppid, int)
        assert isinstance(pid, int)
        assert isinstance(tid, int)
-        return f"process{{pid:{pid},tid:{tid}}}"
+        assert isinstance(process_id, int)
+        assert isinstance(thread_id, int)
+        s = f"process{{pid:{pid},tid:{tid}"
+        if process_id >= 0:
+            s += f",pid_id:{process_id}"
+        if thread_id >= 0:
+            s += f",id:{thread_id}"
+        return s + "}"
    elif address.type == frz.AddressType.CALL:
        assert isinstance(address.value, tuple)
-        ppid, pid, tid, id_ = address.value
-        return f"process{{pid:{pid},tid:{tid},call:{id_}}}"
+        if len(address.value) == 4:
+            ppid, pid, tid, id_ = address.value
+            return f"process{{pid:{pid},tid:{tid},call:{id_}}}"
+        ppid, pid, tid, id_, process_id, thread_id = address.value
+        s = f"process{{pid:{pid},tid:{tid}"
+        if process_id >= 0:
+            s += f",pid_id:{process_id}"
+        if thread_id >= 0:
+            s += f",id:{thread_id}"
+        return s + f",call:{id_}}}"


The keys used for formatting process and thread lifecycle IDs are inconsistent and can be ambiguous.

For PROCESS addresses, the process lifecycle ID is keyed as id.

For THREAD and CALL addresses, the process lifecycle ID is keyed as pid_id, while the thread lifecycle ID is keyed as id.

This is inconsistent (id vs pid_id for the same concept) and ambiguous (id can mean process or thread lifecycle ID depending on context).

To improve clarity and consistency, I suggest using p_id for the process lifecycle ID and t_id for the thread lifecycle ID across all address types.

if len(address.value) == 2: ppid, pid = address.value assert isinstance(ppid, int) assert isinstance(pid, int) return f"process{{pid:{pid}}}" ppid, pid, process_id = address.value assert isinstance(ppid, int) assert isinstance(pid, int) assert isinstance(process_id, int) if process_id >= 0: return f"process{{pid:{pid},p_id:{process_id}}}" return f"process{{pid:{pid}}}" elif address.type == frz.AddressType.THREAD: assert isinstance(address.value, tuple) if len(address.value) == 3: ppid, pid, tid = address.value assert isinstance(ppid, int) assert isinstance(pid, int) assert isinstance(tid, int) return f"process{{pid:{pid},tid:{tid}}}" ppid, pid, tid, process_id, thread_id = address.value assert isinstance(ppid, int) assert isinstance(pid, int) assert isinstance(tid, int) assert isinstance(process_id, int) assert isinstance(thread_id, int) s = f"process{{pid:{pid},tid:{tid}" if process_id >= 0: s += f",p_id:{process_id}" if thread_id >= 0: s += f",t_id:{thread_id}" return s + "}" elif address.type == frz.AddressType.CALL: assert isinstance(address.value, tuple) if len(address.value) == 4: ppid, pid, tid, id_ = address.value return f"process{{pid:{pid},tid:{tid},call:{id_}}}" ppid, pid, tid, id_, process_id, thread_id = address.value s = f"process{{pid:{pid},tid:{tid}" if process_id >= 0: s += f",p_id:{process_id}" if thread_id >= 0: s += f",t_id:{thread_id}" return s + f",call:{id_}}}"

CHANGELOG updated or no update needed, thanks! 😄

mike-hunhoff · 2026-03-19T15:43:24Z

dup of #2882

fix(dynamic): handle PID/TID reuse in address identity

32f67ee

github-actions bot previously requested changes Mar 16, 2026

View reviewed changes

gemini-code-assist bot reviewed Mar 16, 2026

View reviewed changes

chore: align formatting and add TID reuse warning in CAPE threads

c824c2a

Merge branch 'master' into issue-2619-dynamic-address-identity

42017bc

mike-hunhoff closed this Mar 19, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(dynamic): handle PID/TID reuse in address identity#2933

fix(dynamic): handle PID/TID reuse in address identity#2933
Vib3ss wants to merge 3 commits intomandiant:masterfrom
Vib3ss:issue-2619-dynamic-address-identity

Vib3ss commented Mar 16, 2026 •

edited

Loading

Uh oh!

google-cla bot commented Mar 16, 2026

Uh oh!

gemini-code-assist bot commented Mar 16, 2026

Uh oh!

github-actions bot left a comment

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Mar 16, 2026

Uh oh!

gemini-code-assist bot Mar 16, 2026

Uh oh!

mike-hunhoff commented Mar 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Vib3ss commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

google-cla bot commented Mar 16, 2026

Uh oh!

gemini-code-assist bot commented Mar 16, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

mike-hunhoff commented Mar 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Vib3ss commented Mar 16, 2026 •

edited

Loading