tests: add data-driven test fixtures for rule matcher by williballenthin · Pull Request #2987 · mandiant/capa

williballenthin · 2026-04-02T12:12:11Z

Checklist

No CHANGELOG update needed

No new tests needed

No documentation update needed

This submission includes AI-generated code and I have provided details in the description.

github-actions

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

gemini-code-assist

Code Review

This pull request introduces a data-driven testing framework for the capa rule matcher, enabling the definition of test cases in YAML using a specialized DSL for synthetic features. The changes include a new documentation guide, a parser for both static and dynamic features, and a set of initial test fixtures. Review feedback identifies a bug in the _parse_number_literal function where hexadecimal values containing the letter 'e' are incorrectly parsed as floats, and notes that the instruction line parsing logic is fragile when feature text contains colons.

gemini-code-assist · 2026-04-02T12:14:02Z

tests/match_fixtures.py

+def _parse_number_literal(value: str) -> int | float:
+    if any(character in value for character in ".eE"):
+        return float(value)
+    return int(value, 0)


The current implementation of _parse_number_literal incorrectly attempts to parse hexadecimal numbers containing 'e' or 'E' (e.g., 0x1e, 0xDEADBEEF) as floats, which will raise a ValueError. Hexadecimal literals should always be parsed as integers.

Suggested change

def _parse_number_literal(value: str) -> int | float:

if any(character in value for character in ".eE"):

return float(value)

return int(value, 0)

def _parse_number_literal(value: str) -> int | float:

if "0x" in value.lower():

return int(value, 0)

if any(character in value for character in ".eE"):

return float(value)

return int(value, 0)

gemini-code-assist · 2026-04-02T12:14:02Z

tests/match_fixtures.py

+        parts = rest.split(": ", 2)
+        if len(parts) == 2:
+            instruction_address = _parse_static_address(parts[0])
+            feature_text = parts[1]
+        elif len(parts) == 3:
+            function_address = _parse_static_address(parts[0])
+            if function_address != self.current_function:
+                raise ValueError(
+                    f"instruction line changed function without a function header: {line}"
+                )
+            instruction_address = _parse_static_address(parts[1])
+            feature_text = parts[2]
+        else:
+            raise ValueError(f"unsupported instruction feature line: {line}")


The logic for splitting instruction lines is fragile when the feature text contains : (e.g., string(key: value)). In such cases, split(": ", 2) will produce 3 parts, and the code will incorrectly attempt to parse the first part of the feature as an instruction address. A more robust approach is to check if the potential second address part is actually a valid address before treating it as the 3-part form.

addr1_text, sep1, remainder = rest.partition(": ") if not sep1: raise ValueError(f"unsupported instruction feature line: {line}") addr2_text, sep2, feature_text = remainder.partition(": ") if sep2: try: # Try to parse the second part as an address to distinguish between # <func-addr>: <insn-addr>: <feature> and <insn-addr>: <feature-with-colon> instruction_address = _parse_static_address(addr2_text) function_address = _parse_static_address(addr1_text) if function_address != self.current_function: raise ValueError( f"instruction line changed function without a function header: {line}" ) except ValueError: # Second part is not an address, so it must be part of the feature instruction_address = _parse_static_address(addr1_text) feature_text = remainder else: instruction_address = _parse_static_address(addr1_text) feature_text = remainder

williballenthin · 2026-04-02T12:15:55Z

tests/fixtures/matcher/dynamic/core.yml

test fixture files should be moved under:

tests/fixtures/static/dynamic/*
tests/fixtures/matcher/dynamic/*

williballenthin · 2026-04-02T12:16:17Z

tests/matcher-fixtures/dynamic/core.yml

@@ -0,0 +1,36 @@
+- name: dynamic-span-window
+  description: span-of-calls matching honors the configured window size
+  flavor: dynamic


maybe flavor comes from the fixture file location?

williballenthin · 2026-04-02T12:17:34Z

tests/fixtures/matcher/dynamic/core.yml

+        scopes:
+          static: unsupported
+          dynamic: span of calls


would be nice if we didn't have to specify this

at least the unsupported ones

williballenthin · 2026-04-02T12:43:10Z

tests/fixtures/matcher/dynamic/core.yml

+    span size: 2
+  rules:
+    - meta:
+        name: span-resolve-add-veh


need "description" here, too, to explain why the test exists

williballenthin · 2026-04-02T12:44:54Z

tests/matcher-fixtures/dynamic/core.yml

+  expect:
+    matches:
+      span-resolve-add-veh:
+        - [call, 2456, 3052, 3064, 11]


this is painful. maybe we can assert that the call IDs are unique within the test features/events, and then we can just refer to the call IDs, not to the process/thread/etc.

williballenthin · 2026-04-02T12:45:10Z

tests/matcher-fixtures/static/core.yml

+- name: scope-boundary
+  description: function scope aggregates across basic blocks, but basic block scope does not
+  flavor: static
+  base address: 0x401000


this doesn't seem necessary in the common case

github-actions bot requested changes Apr 2, 2026

View reviewed changes

gemini-code-assist bot reviewed Apr 2, 2026

View reviewed changes

williballenthin commented Apr 2, 2026

View reviewed changes

tests: add data-driven test fixtures for rule matcher

beae5ef

williballenthin force-pushed the capa-matcher-fixtures branch from e28743f to beae5ef Compare April 2, 2026 14:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tests: add data-driven test fixtures for rule matcher#2987

tests: add data-driven test fixtures for rule matcher#2987
williballenthin wants to merge 1 commit intomasterfrom
capa-matcher-fixtures

williballenthin commented Apr 2, 2026 •

edited

Loading

Uh oh!

github-actions bot left a comment

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Apr 2, 2026

Uh oh!

gemini-code-assist bot Apr 2, 2026

Uh oh!

williballenthin Apr 2, 2026

Uh oh!

williballenthin Apr 2, 2026

Uh oh!

williballenthin Apr 2, 2026

Uh oh!

williballenthin Apr 2, 2026

Uh oh!

williballenthin Apr 2, 2026

Uh oh!

williballenthin Apr 2, 2026

Uh oh!

williballenthin Apr 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

williballenthin commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Checklist

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

williballenthin Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

williballenthin Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

williballenthin Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

williballenthin Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

williballenthin Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

williballenthin Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

williballenthin Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

williballenthin commented Apr 2, 2026 •

edited

Loading