| 1 | +1. Purpose |
| 2 | + |
| 3 | +These fixtures provide small, data-driven matcher tests. Each test combines: |
| 4 | +- a rule fragment, |
| 5 | +- a synthetic feature listing, |
| 6 | +- and the exact matches that capa should report. |
| 7 | + |
| 8 | +They are intended to exercise matcher behavior, not end-to-end binary analysis. |
| 9 | + |
| 10 | +2. Where the tests live and how they run |
| 11 | + |
| 12 | +2a. Fixture files live under `tests/matcher-fixtures/`. |
| 13 | + |
| 14 | +2b. The pytest entrypoint is `tests/test_match_fixtures.py`. |
| 15 | + |
| 16 | +2c. The loader and DSL parser live in `tests/match_fixtures.py`. |
| 17 | + |
| 18 | +2d. Files are loaded in lexicographic path order. Tests inside a file are loaded in YAML order. |
| 19 | + |
| 20 | +2e. Run the suite with: |
| 21 | + |
| 22 | +```sh |
| 23 | +pytest -q tests/test_match_fixtures.py |
| 24 | +``` |
| 25 | + |
| 26 | +2f. Run a subset with: |
| 27 | + |
| 28 | +```sh |
| 29 | +pytest -q tests/test_match_fixtures.py -k <term> |
| 30 | +``` |
| 31 | + |
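
The load order in 2d can be sketched as follows. This is a minimal illustration, not the loader in `tests/match_fixtures.py`: the function name `iter_fixture_files` and the `*.yml` glob pattern are assumptions made for the example.

```python
from pathlib import Path


def iter_fixture_files(root: Path, pattern: str = "*.yml") -> list[Path]:
    """Return fixture files under root in lexicographic path order.

    Hypothetical helper: the real loader may use a different name or pattern.
    """
    # Sorting on the POSIX string form makes the order explicit and
    # independent of how Path objects compare on a given platform.
    return sorted(root.rglob(pattern), key=lambda p: p.as_posix())
```

Because the order is derived from the path string, a file named `a.yml` loads before anything under `a/`, which in turn loads before `b.yml`.
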
| 32 | +3. Canonical file format |
| 33 | + |
| 34 | +Use a top-level YAML list. Each list element is one test case. |
| 35 | + |
| 36 | +Example: |
| 37 | + |
| 38 | +```yaml |
| 39 | +- name: scope-boundary |
| 40 | +  description: function scope aggregates across basic blocks |
| 41 | +  flavor: static |
| 42 | +  base address: 0x401000 |
| 43 | +  rules: |
| 44 | +    - meta: |
| 45 | +        name: function-cross-basic-block |
| 46 | +        scopes: |
| 47 | +          static: function |
| 48 | +          dynamic: unsupported |
| 49 | +      features: |
| 50 | +        - and: |
| 51 | +            - mnemonic: mov |
| 52 | +            - mnemonic: add |
| 53 | +  features: | |
| 54 | +    func: 0x401000 |
| 55 | +    bb: 0x401000: basic block |
| 56 | +    insn: 0x401000: mnemonic(mov) |
| 57 | +    bb: 0x401010: basic block |
| 58 | +    insn: 0x401010: mnemonic(add) |
| 59 | +  expect: |
| 60 | +    matches: |
| 61 | +      function-cross-basic-block: |
| 62 | +        - 0x401000 |
| 63 | +``` |
| 64 | + |
| 65 | +4. Per-test fields |
| 66 | + |
| 67 | +4a. `name` |
| 68 | +A stable human-readable identifier. Pytest ids include this value. |
| 69 | + |
| 70 | +4b. `description` |
| 71 | +A short explanation of the behavior under test. |
| 72 | + |
| 73 | +4c. `flavor` |
| 74 | +Either `static` or `dynamic`. |
| 75 | + |
| 76 | +4d. `base address` |
| 77 | +Used only for static tests. Defaults to `0` if omitted. |
| 78 | + |
| 79 | +4e. `rules` |
| 80 | +A list of rule fragments in normal capa rule syntax. These are wrapped and passed through `capa.rules.Rule.from_dict()`. |
| 81 | + |
| 82 | +4f. `features` |
| 83 | +A block string or list of strings containing the show-features-like DSL described below. |
| 84 | + |
| 85 | +4g. `expect.matches` |
| 86 | +Maps authored rule names to the exact match locations that should be returned. |
| 87 | + |
| 88 | +4h. `options.span size` |
| 89 | +Optional. If present, patches `capa.capabilities.dynamic.SPAN_SIZE` for that one test. |
| 90 | + |
| 91 | +5. Match semantics |
| 92 | + |
| 93 | +5a. Expectations are exact. |
| 94 | +The test asserts the exact authored rule names that matched and the exact list of locations for each rule. |
| 95 | + |
| 96 | +5b. Generated subscope helper rules are ignored. |
| 97 | +Only authored rules are compared in `expect.matches`. |
| 98 | + |
| 99 | +5c. Match order matters. |
| 100 | +Locations for each rule are compared as an ordered list. This is especially relevant for dynamic span-of-calls behavior. |
| 101 | + |
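
The three rules in section 5 amount to one filtered equality check. The sketch below is illustrative only: `check_matches` and its arguments are hypothetical names, and the way the real test identifies authored rules may differ.

```python
def check_matches(actual, expected, authored):
    """Compare matcher output against expect.matches, exactly and in order.

    actual:   rule name -> list of match locations from the matcher
    expected: rule name -> list of locations from expect.matches
    authored: set of rule names written in the fixture's `rules` field
    """
    # Drop generated subscope helper rules; only authored rules are compared.
    filtered = {name: list(locs) for name, locs in actual.items() if name in authored}
    # Dict equality checks the set of rule names; list equality checks
    # both the locations and their order.
    return filtered == {name: list(locs) for name, locs in expected.items()}
```

With this shape, a missing location, an extra rule, or two locations in swapped order all fail the same comparison.
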
| 102 | +6. Feature DSL |
| 103 | + |
| 104 | +The DSL is intentionally close to `scripts/show-features.py`. Each line describes one feature or one scope header. |
| 105 | + |
| 106 | +6a. Static scope lines |
| 107 | + |
| 108 | +Accepted line prefixes: |
| 109 | +- `global:` |
| 110 | +- `file:` |
| 111 | +- `func:` |
| 112 | +- `bb:` |
| 113 | +- `insn:` |
| 114 | + |
| 115 | +Examples: |
| 116 | + |
| 117 | +```text |
| 118 | +global: global: os(windows) |
| 119 | +file: 0x402345: characteristic(embedded pe) |
| 120 | +func: 0x401000 |
| 121 | +func: 0x401000: string(hello world) |
| 122 | +bb: 0x401000: basic block |
| 123 | +bb: 0x401000: characteristic(tight loop) |
| 124 | +insn: 0x401000: mnemonic(mov) |
| 125 | +insn: 0x401000: offset(0x402000) -> 0x402000 |
| 126 | +insn: 0x401000: 0x401002: number(0x10) |
| 127 | +``` |
| 128 | + |
| 129 | +Notes: |
| 130 | +- `func: <addr>` is a function header. It sets the current function. |
| 131 | +- `bb:` lines attach to the current function and also set the current basic block. |
| 132 | +- `insn:` lines attach to the current basic block. |
| 133 | +- `insn:` accepts either `insn: <insn-addr>: <feature>` or `insn: <func-addr>: <insn-addr>: <feature>`. |
| 134 | +- `-> <addr>` overrides the feature location. Without it, the location defaults to the current scope address. |
| 135 | +- `file:` lines require an explicit address and do not support `->`. |
| 136 | + |
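
To make the static notes above concrete, here is a rough sketch of how such lines could be split and attached to the current function and basic block. It is not the `StaticFeatureParser` implementation: the function names, the tuple layout, and the handling of `->` are assumptions, and addresses are kept as plain strings.

```python
import re

SCOPES = ("global", "file", "func", "bb", "insn")
# A leading address field: a hex address or the literal `global`,
# followed by a colon or the end of the line.
_ADDR_FIELD = re.compile(r"(0x[0-9a-fA-F]+|global)(?::\s*|$)")


def parse_static_line(line):
    """Split one line into (scope, addresses, feature, location override)."""
    scope, _, rest = line.strip().partition(":")
    if scope not in SCOPES:
        raise ValueError(f"unknown scope prefix: {scope!r}")
    rest = rest.strip()
    override = None
    # Simplification: the last " -> " is always treated as the override,
    # so a feature value that itself contains " -> " is not handled.
    if " -> " in rest:
        rest, _, override = rest.rpartition(" -> ")
    addrs = []
    while True:
        m = _ADDR_FIELD.match(rest)
        if not m:
            break
        addrs.append(m.group(1))
        rest = rest[m.end():]
    if not addrs:
        raise ValueError(f"missing address in line: {line!r}")
    return scope, addrs, (rest or None), override


def attach_static_features(lines):
    """Yield (scope, function, basic block, feature, location) tuples."""
    current_func = current_bb = None
    for line in lines:
        scope, addrs, feature, override = parse_static_line(line)
        if scope == "func":
            current_func, current_bb = addrs[0], None
        elif scope == "bb":
            current_bb = addrs[0]
        if feature is not None:
            # Without an override, the location is the line's own scope address.
            yield scope, current_func, current_bb, feature, override or addrs[-1]
```

A bare `func: <addr>` header produces no feature of its own; it only changes the function that later `bb:` and `insn:` lines attach to.
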
| 137 | +6b. Dynamic scope lines |
| 138 | + |
| 139 | +Accepted line prefixes: |
| 140 | +- `global:` |
| 141 | +- `file:` |
| 142 | +- `proc:` |
| 143 | +- `thread:` |
| 144 | +- `call:` |
| 145 | + |
| 146 | +Examples: |
| 147 | + |
| 148 | +```text |
| 149 | +proc: sample.exe (ppid=2456, pid=3052) |
| 150 | +proc: sample.exe: string(config) |
| 151 | +thread: 3064 |
| 152 | +thread: 3064: string(worker) |
| 153 | +call: 11: api(LdrGetProcedureAddress) |
| 154 | +call: 11: string(AddVectoredExceptionHandler) |
| 155 | +call: 11: string(kernel32.dll) -> process{pid:3052,tid:3064,call:11} |
| 156 | +``` |
| 157 | + |
| 158 | +Notes: |
| 159 | +- `proc: <name> (ppid=<n>, pid=<n>)` is a process header. It sets the current process. |
| 160 | +- `thread: <tid>` is a thread header. It sets the current thread. |
| 161 | +- `call:` lines attach to the current thread. |
| 162 | +- `proc: <name>: <feature>` attaches a process-scope feature to the current process. The name must match the current process header. |
| 163 | +- `thread: <tid>: <feature>` attaches a thread-scope feature and also sets the current thread. |
| 164 | +- `-> <addr>` overrides the feature location. Without it, the location defaults to the current scope address. |
| 165 | + |
| 166 | +6c. Supported feature atoms |
| 167 | + |
| 168 | +Currently the parser supports these atoms: |
| 169 | +- `basic block` |
| 170 | +- `api(...)` |
| 171 | +- `arch(...)` |
| 172 | +- `bytes(...)` |
| 173 | +- `characteristic(...)` |
| 174 | +- `class(...)` |
| 175 | +- `export(...)` |
| 176 | +- `format(...)` |
| 177 | +- `function-name(...)` |
| 178 | +- `function name(...)` |
| 179 | +- `import(...)` |
| 180 | +- `match(...)` |
| 181 | +- `mnemonic(...)` |
| 182 | +- `namespace(...)` |
| 183 | +- `number(...)` |
| 184 | +- `offset(...)` |
| 185 | +- `os(...)` |
| 186 | +- `section(...)` |
| 187 | +- `string(...)` |
| 188 | +- `substring(...)` |
| 189 | +- `operand[n].number(...)` |
| 190 | +- `operand[n].offset(...)` |
| 191 | +- `property(...)` |
| 192 | +- `property/read(...)` |
| 193 | +- `property/write(...)` |
| 194 | + |
| 195 | +Examples: |
| 196 | + |
| 197 | +```text |
| 198 | +mnemonic(mov) |
| 199 | +number(0x10) |
| 200 | +string(hello world) |
| 201 | +bytes(41 42 43) |
| 202 | +operand[0].number(0x10) |
| 203 | +property/read(System.IO.FileInfo::Length) |
| 204 | +``` |
| 205 | + |
| 206 | +6d. Supported address syntax |
| 207 | + |
| 208 | +The parser accepts both rendered string forms and tagged YAML arrays. |
| 209 | + |
| 210 | +String forms include: |
| 211 | +- `0x401000` |
| 212 | +- `base address+0x100` |
| 213 | +- `file+0x20` |
| 214 | +- `token(0x1234)` |
| 215 | +- `token(0x1234)+0x10` |
| 216 | +- `global` |
| 217 | +- `process{pid:3052}` |
| 218 | +- `process{pid:3052,tid:3064}` |
| 219 | +- `process{pid:3052,tid:3064,call:11}` |
| 220 | +- the same process/thread/call forms with `ppid:` included |
| 221 | + |
| 222 | +Tagged YAML arrays include: |
| 223 | +- `[absolute, 0x401000]` |
| 224 | +- `[relative, 0x100]` |
| 225 | +- `[file, 0x20]` |
| 226 | +- `[token, 0x1234]` |
| 227 | +- `[token offset, 0x1234, 0x10]` |
| 228 | +- `[process, 2456, 3052]` |
| 229 | +- `[thread, 2456, 3052, 3064]` |
| 230 | +- `[call, 2456, 3052, 3064, 11]` |
| 231 | +- `[no address]` |
| 232 | + |
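
One way to read the two lists in 6d is that each string form corresponds to one tagged array. The sketch below encodes that reading as an assumption; in particular, mapping `global` to `no address` and using `None` for an omitted `ppid` are guesses, and `parse_address` is a hypothetical name rather than the parser's actual function.

```python
import re

_TOKEN = re.compile(r"token\((0x[0-9a-fA-F]+)\)(?:\+(0x[0-9a-fA-F]+))?")
_PROCESS = re.compile(r"process\{([^}]*)\}")


def parse_address(text):
    """Turn a rendered address string into a tagged tuple (assumed mapping)."""
    text = text.strip()
    if text == "global":
        return ("no address",)
    if text.startswith("base address+"):
        return ("relative", int(text[len("base address+"):], 16))
    if text.startswith("file+"):
        return ("file", int(text[len("file+"):], 16))
    m = _TOKEN.fullmatch(text)
    if m:
        token = int(m.group(1), 16)
        if m.group(2) is None:
            return ("token", token)
        return ("token offset", token, int(m.group(2), 16))
    m = _PROCESS.fullmatch(text)
    if m:
        # Fields look like `pid:3052,tid:3064`; accept them in any order.
        pairs = dict(part.split(":", 1) for part in m.group(1).split(","))
        ids = {key.strip(): int(value) for key, value in pairs.items()}
        ppid, pid = ids.get("ppid"), ids["pid"]
        if "call" in ids:
            return ("call", ppid, pid, ids["tid"], ids["call"])
        if "tid" in ids:
            return ("thread", ppid, pid, ids["tid"])
        return ("process", ppid, pid)
    # Anything else is treated as a plain hexadecimal address.
    return ("absolute", int(text, 16))
```

Normalizing both syntaxes to one tuple shape would let expected locations be compared without caring which form a fixture author chose.
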
| 233 | +7. Adding a new test case |
| 234 | + |
| 235 | +7a. Pick the right fixture file under `tests/matcher-fixtures/`, or add a new file if the new cases form a clear group. |
| 236 | + |
| 237 | +7b. Append a new test entry to the top-level YAML list. Keep related tests together. |
| 238 | + |
| 239 | +7c. Add a short `description` that states the matcher behavior being asserted. |
| 240 | + |
| 241 | +7d. Keep the rule fragment minimal. Include only the features needed for the behavior under test. |
| 242 | + |
| 243 | +7e. Write the synthetic feature listing in the DSL. Prefer the same wording and feature rendering that `show-features.py` emits. |
| 244 | + |
| 245 | +7f. Add `expect.matches` with the exact authored rule names and locations. |
| 246 | + |
| 247 | +7g. Run: |
| 248 | + |
| 249 | +```sh |
| 250 | +pytest -q tests/test_match_fixtures.py -k <new-test-name> |
| 251 | +``` |
| 252 | + |
| 253 | +8. When to add parser support |
| 254 | + |
| 255 | +8a. If a new test only needs existing atoms and line prefixes, do not change Python code. Just add YAML. |
| 256 | + |
| 257 | +8b. If a new test needs a feature atom that the parser does not understand, update `_parse_feature()` in `tests/match_fixtures.py`. |
| 258 | + |
| 259 | +8c. If a new test needs a new scope line form, update `StaticFeatureParser` or `DynamicFeatureParser` in `tests/match_fixtures.py`. |
| 260 | + |
| 261 | +8d. If you extend the DSL, also update this document and add at least one fixture that exercises the new syntax. |