Commit e28743f

tests: add data-driven test fixtures for rule matcher
1 parent 7b23834 commit e28743f

6 files changed, +1252 -0 lines changed
Lines changed: 261 additions & 0 deletions
@@ -0,0 +1,261 @@
1. Purpose

These fixtures provide small, data-driven matcher tests. Each test pairs:
- a rule fragment,
- a synthetic feature listing,
- and the exact matches that capa should report.

They are intended for matcher behavior, not end-to-end binary analysis.

2. Where the tests live and how they run

2a. Fixture files live under `tests/matcher-fixtures/`.

2b. The pytest entrypoint is `tests/test_match_fixtures.py`.

2c. The loader and DSL parser live in `tests/match_fixtures.py`.

2d. Files are loaded in lexicographic path order. Tests inside a file are loaded in YAML order.

2e. Run the suite with:

```sh
pytest -q tests/test_match_fixtures.py
```

2f. Run a subset with:

```sh
pytest -q tests/test_match_fixtures.py -k <term>
```
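
The load order described in 2d can be sketched as follows. This is a minimal illustration, not the actual loader in `tests/match_fixtures.py`: the helper name `iter_fixture_files` and the `.yml` suffix are assumptions made for the example.

```python
from pathlib import Path
from typing import List


def iter_fixture_files(root: Path, suffix: str = ".yml") -> List[Path]:
    """Return fixture files under `root` in lexicographic path order.

    Hypothetical helper: the real loader may use a different suffix or layout.
    """
    # Sort on the POSIX string form so the order is the same on every platform.
    return sorted(root.rglob(f"*{suffix}"), key=lambda p: p.as_posix())
```

A deterministic file order keeps pytest ids stable between runs, which is what makes `-k <term>` selection predictable.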

3. Canonical file format

Use a top-level YAML list. Each list element is one test case.

Example:

```yaml
- name: scope-boundary
  description: function scope aggregates across basic blocks
  flavor: static
  base address: 0x401000
  rules:
    - meta:
        name: function-cross-basic-block
        scopes:
          static: function
          dynamic: unsupported
      features:
        - and:
            - mnemonic: mov
            - mnemonic: add
  features: |
    func: 0x401000
    bb: 0x401000: basic block
    insn: 0x401000: mnemonic(mov)
    bb: 0x401010: basic block
    insn: 0x401010: mnemonic(add)
  expect:
    matches:
      function-cross-basic-block:
        - 0x401000
```

4. Per-test fields

4a. `name`
A stable human-readable identifier. Pytest ids include this value.

4b. `description`
A short explanation of the behavior under test.

4c. `flavor`
Either `static` or `dynamic`.

4d. `base address`
Used only for static tests. Defaults to `0` if omitted.

4e. `rules`
A list of rule fragments in normal capa rule syntax. These are wrapped and passed through `capa.rules.Rule.from_dict()`.

4f. `features`
A block string or list of strings containing the show-features-like DSL described below.

4g. `expect.matches`
Maps authored rule names to the exact match locations that should be returned.

4h. `options.span size`
Optional. If present, patches `capa.capabilities.dynamic.SPAN_SIZE` for that one test.
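
The field rules above can be illustrated with a small normalizer that works on an already-parsed YAML entry. It is a sketch only: `normalize_test_case` is a hypothetical name, and treating `description` as optional is an assumption that the field list does not confirm.

```python
from typing import Any, Dict

VALID_FLAVORS = {"static", "dynamic"}


def normalize_test_case(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Validate one fixture entry and fill in the documented defaults (illustrative)."""
    flavor = raw["flavor"]
    if flavor not in VALID_FLAVORS:
        raise ValueError(f"unknown flavor: {flavor!r}")

    features = raw["features"]
    # `features` may be a block string or a list of strings; normalize to lines.
    lines = features.splitlines() if isinstance(features, str) else list(features)

    return {
        "name": raw["name"],
        # Assumption: a missing description is tolerated and becomes "".
        "description": raw.get("description", ""),
        "flavor": flavor,
        # `base address` only applies to static tests and defaults to 0.
        "base address": raw.get("base address", 0) if flavor == "static" else None,
        "rules": raw["rules"],
        "features": [line for line in lines if line.strip()],
        "matches": raw["expect"]["matches"],
        # `options.span size` is optional; None means "leave SPAN_SIZE alone".
        "span size": raw.get("options", {}).get("span size"),
    }
```

Required keys are read with plain indexing, so a malformed entry fails with a `KeyError` that names the missing field.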

5. Match semantics

5a. Expectations are exact.
The test asserts the exact authored rule names that matched and the exact list of locations for each rule.

5b. Generated subscope helper rules are ignored.
Only authored rules are compared in `expect.matches`.

5c. Match order matters.
This is especially relevant for dynamic span-of-calls behavior.
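
The three rules above amount to an order-sensitive comparison restricted to authored rules. The sketch below shows one way to express that; `check_matches` and its arguments are hypothetical, and locations are simplified to plain integers rather than capa address objects.

```python
from typing import Dict, List, Set


def check_matches(
    expected: Dict[str, List[int]],
    actual: Dict[str, List[int]],
    authored: Set[str],
) -> None:
    """Assert exact, order-sensitive equality for authored rules only."""
    # 5b: drop generated subscope helper rules, i.e. anything not authored in the fixture.
    filtered = {name: locs for name, locs in actual.items() if name in authored}
    # 5a: the set of matching rule names must be identical.
    assert set(filtered) == set(expected), (sorted(filtered), sorted(expected))
    # 5c: each rule's locations are compared as an ordered list, not as a set.
    for name, locations in expected.items():
        assert filtered[name] == locations, (name, filtered[name], locations)
```

Comparing lists rather than sets is what makes a reordered match sequence fail, even when the same locations are present.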

6. Feature DSL

The DSL is intentionally close to `scripts/show-features.py`. Each line describes one feature or one scope header.

6a. Static scope lines

Accepted line prefixes:
- `global:`
- `file:`
- `func:`
- `bb:`
- `insn:`

Examples:

```text
global: global: os(windows)
file: 0x402345: characteristic(embedded pe)
func: 0x401000
func: 0x401000: string(hello world)
bb: 0x401000: basic block
bb: 0x401000: characteristic(tight loop)
insn: 0x401000: mnemonic(mov)
insn: 0x401000: offset(0x402000) -> 0x402000
insn: 0x401000: 0x401002: number(0x10)
```

Notes:
- `func: <addr>` is a function header. It sets the current function.
- `bb:` lines attach to the current function and also set the current basic block.
- `insn:` lines attach to the current basic block.
- `insn:` accepts either `insn: <insn-addr>: <feature>` or `insn: <func-addr>: <insn-addr>: <feature>`.
- `-> <addr>` overrides the feature location. Without it, the location defaults to the current scope address.
- `file:` lines require an explicit address and do not support `->`.
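
The state tracking in the notes above can be sketched with a small line parser. This is not the `StaticFeatureParser` from `tests/match_fixtures.py`: the names `StaticFeature` and `parse_static_lines` are hypothetical, `global:` and `file:` lines are left out, feature text is kept as an unparsed string, and a `func: <addr>: <feature>` line is assumed not to change the current function.

```python
import re
from typing import List, NamedTuple, Optional

_HEX = r"0x[0-9a-fA-F]+"
_LINE = re.compile(
    rf"^(?P<prefix>func|bb|insn):\s*(?P<a1>{_HEX})"
    rf"(?::\s*(?P<a2>{_HEX}))?"        # optional second address (insn only)
    r"(?::\s*(?P<feature>.+?))?"       # optional feature text
    rf"(?:\s*->\s*(?P<loc>{_HEX}))?$"  # optional location override
)


class StaticFeature(NamedTuple):
    scope: str               # "func", "bb", or "insn"
    function: Optional[int]  # current function when the line was read
    address: int             # address of the scope the feature attaches to
    feature: str             # unparsed feature text, e.g. "mnemonic(mov)"
    location: int            # `-> <addr>` override, else the scope address


def parse_static_lines(text: str) -> List[StaticFeature]:
    out: List[StaticFeature] = []
    func: Optional[int] = None
    bb: Optional[int] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = _LINE.match(line)
        if m is None:
            raise ValueError(f"unsupported line: {line!r}")
        prefix, feature = m["prefix"], m["feature"]
        first = int(m["a1"], 16)
        second = int(m["a2"], 16) if m["a2"] else None
        if second is not None and prefix != "insn":
            raise ValueError(f"only insn: lines take two addresses: {line!r}")
        if prefix == "func" and feature is None:
            func, bb = first, None  # bare header: start a new function
            continue
        if feature is None:
            raise ValueError(f"missing feature: {line!r}")
        if prefix == "func":
            address = first  # assumption: does not change the current function
        elif prefix == "bb":
            if func is None:
                raise ValueError("bb: line before any func: header")
            bb = address = first  # bb: lines also set the current basic block
        else:
            if bb is None:
                raise ValueError("insn: line before any bb: line")
            # Two-address form is <func-addr>: <insn-addr>; the last one is the instruction.
            address = second if second is not None else first
        location = int(m["loc"], 16) if m["loc"] else address
        out.append(StaticFeature(prefix, func, address, feature, location))
    return out
```

Rejecting `bb:` and `insn:` lines that appear before their enclosing header is a design choice of this sketch; it turns a misordered fixture into an immediate error instead of a silently misattached feature.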

6b. Dynamic scope lines

Accepted line prefixes:
- `global:`
- `file:`
- `proc:`
- `thread:`
- `call:`

Examples:

```text
proc: sample.exe (ppid=2456, pid=3052)
proc: sample.exe: string(config)
thread: 3064
thread: 3064: string(worker)
call: 11: api(LdrGetProcedureAddress)
call: 11: string(AddVectoredExceptionHandler)
call: 11: string(kernel32.dll) -> process{pid:3052,tid:3064,call:11}
```

Notes:
- `proc: <name> (ppid=<n>, pid=<n>)` is a process header. It sets the current process.
- `thread: <tid>` is a thread header. It sets the current thread.
- `call:` lines attach to the current thread.
- `proc: <name>: <feature>` attaches a process-scope feature to the current process. The name must match the current process header.
- `thread: <tid>: <feature>` attaches a thread-scope feature and also sets the current thread.
- `-> <addr>` overrides the feature location. Without it, the location defaults to the current scope address.
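
The dynamic notes above follow the same header-then-feature pattern as the static ones. The sketch below is illustrative and is not the `DynamicFeatureParser` from `tests/match_fixtures.py`: `DynamicFeature` and `parse_dynamic_lines` are hypothetical names, `global:` and `file:` lines are omitted, the `->` override is kept as a raw string, and resetting the current thread on a new process header is an assumption.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

_PROC_HEADER = re.compile(r"^proc:\s*(?P<name>.+?)\s*\(ppid=(?P<ppid>\d+),\s*pid=(?P<pid>\d+)\)$")
_THREAD = re.compile(r"^thread:\s*(?P<tid>\d+)(?::\s*(?P<feature>.+))?$")
_CALL = re.compile(r"^call:\s*(?P<id>\d+):\s*(?P<feature>.+)$")


class DynamicFeature(NamedTuple):
    scope: str               # "process", "thread", or "call"
    pid: int
    tid: Optional[int]
    call: Optional[int]
    feature: str             # unparsed feature text
    override: Optional[str]  # raw text after `->`, if any


def parse_dynamic_lines(text: str) -> List[DynamicFeature]:
    out: List[DynamicFeature] = []
    proc: Optional[Tuple[str, int, int]] = None  # (name, ppid, pid)
    tid: Optional[int] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = _PROC_HEADER.match(line)
        if header:
            proc = (header["name"], int(header["ppid"]), int(header["pid"]))
            tid = None  # assumption: a new process header clears the current thread
            continue
        # Split off an optional trailing `-> <addr>` location override.
        body, sep, tail = line.rpartition(" -> ")
        override: Optional[str] = tail if sep else None
        if not sep:
            body = line
        if proc is None:
            raise ValueError(f"line before any proc: header: {line!r}")
        if body.startswith("proc:"):
            name, _, feature = body[len("proc:"):].strip().partition(": ")
            if name != proc[0] or not feature:
                raise ValueError(f"proc: feature does not match the current process: {line!r}")
            out.append(DynamicFeature("process", proc[2], None, None, feature, override))
            continue
        thread = _THREAD.match(body)
        if thread:
            tid = int(thread["tid"])  # both thread forms set the current thread
            if thread["feature"]:
                out.append(DynamicFeature("thread", proc[2], tid, None, thread["feature"], override))
            continue
        call = _CALL.match(body)
        if call and tid is not None:
            out.append(DynamicFeature("call", proc[2], tid, int(call["id"]), call["feature"], override))
            continue
        raise ValueError(f"unsupported line: {line!r}")
    return out
```

Keeping the override as text defers address parsing to a separate step, so the line grammar and the address grammar can be tested independently.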

6c. Supported feature atoms

Currently the parser supports these atoms:
- `basic block`
- `api(...)`
- `arch(...)`
- `bytes(...)`
- `characteristic(...)`
- `class(...)`
- `export(...)`
- `format(...)`
- `function-name(...)`
- `function name(...)`
- `import(...)`
- `match(...)`
- `mnemonic(...)`
- `namespace(...)`
- `number(...)`
- `offset(...)`
- `os(...)`
- `section(...)`
- `string(...)`
- `substring(...)`
- `operand[n].number(...)`
- `operand[n].offset(...)`
- `property(...)`
- `property/read(...)`
- `property/write(...)`

Examples:

```text
mnemonic(mov)
number(0x10)
string(hello world)
bytes(41 42 43)
operand[0].number(0x10)
property/read(System.IO.FileInfo::Length)
```
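
All of the atoms listed above share a `name(value)` shape, apart from the bare `basic block` and the `operand[n].` prefix. The sketch below splits an atom into those parts; it is an assumption-laden illustration and not the `_parse_feature()` function in `tests/match_fixtures.py`. `Atom`, `parse_atom`, and `decode_value` are hypothetical names, and no capa feature objects are constructed.

```python
import re
from typing import NamedTuple, Optional, Union

_ATOM = re.compile(
    r"^(?:operand\[(?P<index>\d+)\]\.)?"  # optional operand[n]. prefix
    r"(?P<name>[a-z][a-z /-]*)"           # e.g. "function name", "property/read"
    r"\((?P<value>.*)\)$"                 # everything between the outer parentheses
)


class Atom(NamedTuple):
    name: str
    value: Optional[str]
    operand: Optional[int]


def parse_atom(text: str) -> Atom:
    text = text.strip()
    if text == "basic block":
        return Atom("basic block", None, None)  # the only atom without a value
    m = _ATOM.match(text)
    if m is None:
        raise ValueError(f"unsupported feature atom: {text!r}")
    index = int(m["index"]) if m["index"] is not None else None
    return Atom(m["name"], m["value"], index)


def decode_value(atom: Atom) -> Union[int, bytes, str, None]:
    """Convert the raw value text for the atom kinds with an obvious Python type."""
    if atom.value is None:
        return None
    if atom.name in ("number", "offset"):
        return int(atom.value, 0)         # base 0 accepts both "16" and "0x10"
    if atom.name == "bytes":
        return bytes.fromhex(atom.value)  # "41 42 43" becomes b"ABC"
    return atom.value
```

The value group is greedy, so nested parentheses inside a string value stay part of the value rather than ending the atom early.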

6d. Supported address syntax

The parser accepts both rendered string forms and tagged YAML arrays.

String forms include:
- `0x401000`
- `base address+0x100`
- `file+0x20`
- `token(0x1234)`
- `token(0x1234)+0x10`
- `global`
- `process{pid:3052}`
- `process{pid:3052,tid:3064}`
- `process{pid:3052,tid:3064,call:11}`
- the same process/thread/call forms with `ppid:` included

Tagged YAML arrays include:
- `[absolute, 0x401000]`
- `[relative, 0x100]`
- `[file, 0x20]`
- `[token, 0x1234]`
- `[token offset, 0x1234, 0x10]`
- `[process, 2456, 3052]`
- `[thread, 2456, 3052, 3064]`
- `[call, 2456, 3052, 3064, 11]`
- `[no address]`
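
One plausible reading of the two lists above is that each string form is shorthand for one tagged array. The sketch below converts strings to tuples under that reading. The pairing itself is an assumption (in particular `global` to `no address`), as is defaulting a missing `ppid:` to `0`; `parse_address` is a hypothetical name and not the parser's real API.

```python
import re
from typing import Tuple, Union

Tagged = Tuple[Union[str, int], ...]

_HEX = r"0x[0-9a-fA-F]+"
_PATTERNS = [
    (re.compile(rf"^({_HEX})$"), lambda m: ("absolute", int(m[1], 16))),
    (re.compile(rf"^base address\+({_HEX})$"), lambda m: ("relative", int(m[1], 16))),
    (re.compile(rf"^file\+({_HEX})$"), lambda m: ("file", int(m[1], 16))),
    (re.compile(rf"^token\(({_HEX})\)$"), lambda m: ("token", int(m[1], 16))),
    (
        re.compile(rf"^token\(({_HEX})\)\+({_HEX})$"),
        lambda m: ("token offset", int(m[1], 16), int(m[2], 16)),
    ),
    (re.compile(r"^global$"), lambda m: ("no address",)),  # assumed pairing
]
_PROCESS = re.compile(r"^process\{(?P<fields>[a-z0-9:,]+)\}$")


def parse_address(text: str) -> Tagged:
    text = text.strip()
    for pattern, build in _PATTERNS:
        m = pattern.match(text)
        if m:
            return build(m)
    m = _PROCESS.match(text)
    if m is None:
        raise ValueError(f"unsupported address: {text!r}")
    # "pid:3052,tid:3064" becomes {"pid": 3052, "tid": 3064}
    ids = {key: int(value) for key, value in (item.split(":", 1) for item in m["fields"].split(","))}
    ppid = ids.get("ppid", 0)  # assumption: a missing ppid is treated as 0
    if "call" in ids:
        return ("call", ppid, ids["pid"], ids["tid"], ids["call"])
    if "tid" in ids:
        return ("thread", ppid, ids["pid"], ids["tid"])
    return ("process", ppid, ids["pid"])
```

Normalizing both notations to one tuple shape would let a fixture mix string and array addresses while the comparison code handles a single representation.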
232+
233+
7. Adding a new test case
234+
235+
7a. Pick the right fixture file under `tests/matcher-fixtures/`, or add a new file if the new cases form a clear group.
236+
237+
7b. Append a new test entry to the top-level YAML list. Keep related tests together.
238+
239+
7c. Add a short `description` that states the matcher behavior being asserted.
240+
241+
7d. Keep the rule fragment minimal. Include only the features needed for the behavior under test.
242+
243+
7e. Write the synthetic feature listing in the DSL. Prefer the same wording and feature rendering that `show-features.py` emits.
244+
245+
7f. Add `expect.matches` with the exact authored rule names and locations.
246+
247+
7g. Run:
248+
249+
```sh
250+
pytest -q tests/test_match_fixtures.py -k <new-test-name>
251+
```
252+
253+
8. When to add parser support
254+
255+
8a. If a new test only needs existing atoms and line prefixes, do not change Python code. Just add YAML.
256+
257+
8b. If a new test needs a feature atom that the parser does not understand, update `_parse_feature()` in `tests/match_fixtures.py`.
258+
259+
8c. If a new test needs a new scope line form, update `StaticFeatureParser` or `DynamicFeatureParser` in `tests/match_fixtures.py`.
260+
261+
8d. If you extend the DSL, also update this document and add at least one fixture that exercises the new syntax.