features: emit number(0) for xor-zeroing idioms like xor eax, eax#2997
features: emit number(0) for xor-zeroing idioms like xor eax, eax#2997devs6186 wants to merge 1 commit intomandiant:masterfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request implements the emission of Number(0) for xor-zeroing idioms across Vivisect and BinExport2 backends for ARM and Intel architectures, and adds relevant test cases. A review comment points out that the CHANGELOG.md incorrectly includes entries for issues #2109 and #2126, which are not addressed in this PR.
CHANGELOG.md
Outdated
| - binexport2: extract API library name from BinExport2 protobuf `library_index` field #2109 | ||
| - rules: pre-filter string rules whose patterns are absent from the binary file, reducing redundant regex evaluation #2126 |
|
@mike-hunhoff @williballenthin I picked this PR first because #2622 was still open and the earlier attempt was closed as stale. I kept this update focused: removed unrelated changelog lines, kept only the xor-zero -> number(0) behavior, and added small backend tests so the change is easy to verify. |
williballenthin
left a comment
There was a problem hiding this comment.
the tests cases are so low quality i am worried about the overall quality of this PR. @devs6186 please be more careful in the future
tests/test_nzxor_zeroing.py
Outdated
There was a problem hiding this comment.
this is complete slop. nowhere do we use monkey patches like this in our test suite.
please spend some time thinking about how to compose some meaningful tests that demonstrate the intended behavior
hey thank you for the feedback, let me have another go and come back to you with some concrete. thank you for pointing out low quality contribution. |
ad32876 to
d334a34
Compare
…andiant#2622 Detect register-to-register XOR instructions where both operands are the same (xor eax, eax, xorpd xmm0, xmm0, eor rd, rn, rn, etc.) and emit Number(0) at the instruction address, since these idioms zero the destination register. Previously the nzxor extractor silently returned early for these instructions, so no feature was recorded at all. Rules that need to detect zero-valued arguments passed before API calls (a common MSVC pattern such as xor r9d, r9d before NtFsControlFile) had no way to match them. The change is applied to all six backends: - viv/insn.py - binexport2/arch/intel/insn.py - binexport2/arch/arm/insn.py (eor rd, rn, rn) - ida/insn.py - ghidra/insn.py - binja/insn.py (handles BN LLIL canonicalization of xor reg,reg to LLIL_SET_REG(reg, 0) rather than LLIL_XOR) In all cases the zeroing idiom does not produce Characteristic("nzxor"). Test coverage via FEATURE_PRESENCE_TESTS at instruction scope: ("mimikatz", "function=0x40105D,bb=0x40105D,insn=0x401066", Number(0x0), True) -- xor ebx, ebx emits Number(0) same scope, Characteristic("nzxor"), False -- must not emit nzxor Closes mandiant#2622
d334a34 to
d29ddac
Compare
|
hi @williballenthin All six backends now handled. The original PR only touched three. IDA and Ghidra both had helpers (is_operand_equal, is_zxor) that The monkeypatch test file is gone. test_nzxor_zeroing.py used types.SimpleNamespace and monkeypatch.setattr to fake instruction objects. Fixture entries use microsocks instead of mimikatz. Found xor ebp, ebp at 0x2002564 in microsocks.elf_ (31KB, already in test data). Verified against the CHANGELOG cleaned up. Removed the stray #2109 and #2126 entries that had leaked in from other branches. Single entry now, Let me know if anything still looks off. |
Summary
Fixes #2622.
XOR instructions where both operands are the same register (e.g.
xor eax, eax,xorpd xmm0, xmm0,pxor mm0, mm0, ARMeor r0, r0, r0) zero the destination register. This is an extremely common MSVC/compiler pattern before API calls that pass zero-valued arguments.Previously the nzxor extractors silently returned early for these instructions, so no feature was emitted at all. Rules relying on
number: 0to detect zeroed arguments passed to APIs had no way to match when the zeroing came from a xor-idiom.Changes:
capa/features/extractors/viv/insn.py—extract_insn_nzxor_characteristic_features: emitNumber(0)wheninsn.opers[0] == insn.opers[1]instead of returning silently.capa/features/extractors/binexport2/arch/intel/insn.py— same pattern: emitNumber(0)whenoperands[0] == operands[1].capa/features/extractors/binexport2/arch/arm/insn.py— ARMeor rd, rn, rnzero idiom: emitNumber(0)whenoperands[1] == operands[2].In all three cases the instruction still does not emit
nzxor(it is a zeroing operation, not a non-zeroing XOR), preserving existing behavior.Test coverage:
Two new entries in
tests/fixtures.pycover the mimikatz sample at instruction0x401066(xor ebx, ebx):Number(0x0)is present at that instruction — TrueCharacteristic("nzxor")is not present at that instruction — FalseThese run against the vivisect backend directly (via
FEATURE_PRESENCE_TESTS) and also viatest_binexport_features_pe_x86against the mimikatz Ghidra BinExport2 file.Test plan
isort,black,ruffall pass on changed filespytest tests/test_rules.py tests/test_match.py tests/test_engine.py tests/test_optimizer.py— 84 passedNumber(0)at xor-zero instruction andnzxor=Falseat same instruction