features: emit number(0) for xor-zeroing idioms like xor eax, eax #2997

Open
devs6186 wants to merge 1 commit into mandiant:master from devs6186:feat/2622-xor-zero-idiom-number

Conversation

Contributor

@devs6186 devs6186 commented Apr 6, 2026

Summary

Fixes #2622.

XOR instructions where both operands are the same register (e.g. xor eax, eax, xorpd xmm0, xmm0, pxor mm0, mm0, ARM eor r0, r0, r0) zero the destination register. This is an extremely common MSVC/compiler pattern before API calls that pass zero-valued arguments.

Previously the nzxor extractors silently returned early for these instructions, so no feature was emitted at all. Rules relying on number: 0 to detect zeroed arguments passed to APIs had no way to match when the zeroing came from a xor-idiom.

Changes:

  • capa/features/extractors/viv/insn.py (extract_insn_nzxor_characteristic_features): emit Number(0) when insn.opers[0] == insn.opers[1] instead of returning silently.
  • capa/features/extractors/binexport2/arch/intel/insn.py — same pattern: emit Number(0) when operands[0] == operands[1].
  • capa/features/extractors/binexport2/arch/arm/insn.py — ARM eor rd, rn, rn zero idiom: emit Number(0) when operands[1] == operands[2].

In all three cases the instruction still does not emit nzxor (it is a zeroing operation, not a non-zeroing XOR), preserving existing behavior.
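
The dispatch described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not capa's code: `Insn`, `Number`, `Characteristic`, `XOR_MNEMONICS`, and `extract_xor_features` are hypothetical stand-ins, operands are modelled as plain strings, and the real extractors perform additional checks that are omitted here.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple, Union


@dataclass(frozen=True)
class Number:
    value: int


@dataclass(frozen=True)
class Characteristic:
    name: str


@dataclass(frozen=True)
class Insn:
    address: int
    mnemonic: str
    operands: Tuple[str, ...]


# Hypothetical mnemonic set; each real backend decides this in its own way.
XOR_MNEMONICS = {"xor", "xorps", "xorpd", "pxor", "eor"}


def extract_xor_features(
    insn: Insn,
) -> Iterator[Tuple[Union[Number, Characteristic], int]]:
    if insn.mnemonic not in XOR_MNEMONICS or len(insn.operands) < 2:
        return
    # Two-operand form (Intel): xor dst, src   -> compare dst with src.
    # Three-operand form (ARM): eor rd, rn, rm -> compare rn with rm.
    a, b = insn.operands[-2:]
    if a == b:
        # x ^ x == 0, so the destination is zeroed: report the constant,
        # and do not report nzxor.
        yield Number(0), insn.address
    else:
        yield Characteristic("nzxor"), insn.address


zeroing = list(extract_xor_features(Insn(0x401066, "xor", ("ebx", "ebx"))))
```

Under these assumptions, `zeroing` holds a single `Number(0)` feature at the instruction address, while an XOR of two different registers yields only the `nzxor` characteristic.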

Test coverage:

Two new entries in tests/fixtures.py cover the mimikatz sample at instruction 0x401066 (xor ebx, ebx):

  • Number(0x0) is present at that instruction — True
  • Characteristic("nzxor") is not present at that instruction — False

These run against the vivisect backend directly (via FEATURE_PRESENCE_TESTS) and also via test_binexport_features_pe_x86 against the mimikatz Ghidra BinExport2 file.
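
For readers unfamiliar with the table-driven style, the two entries could look roughly like the sketch below. The tuple shape and the scope string follow the commit message in this PR; `Number`, `Characteristic`, `PRESENCE_TESTS`, and `check_presence` are stand-ins defined here for illustration (the real table lives in tests/fixtures.py), and the extractor output is assumed rather than taken from a real run.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple, Union


@dataclass(frozen=True)
class Number:
    value: int


@dataclass(frozen=True)
class Characteristic:
    name: str


Feature = Union[Number, Characteristic]

# Each entry: (sample, scope, feature, expected presence).
PRESENCE_TESTS: List[Tuple[str, str, Feature, bool]] = [
    ("mimikatz", "function=0x40105D,bb=0x40105D,insn=0x401066", Number(0x0), True),
    ("mimikatz", "function=0x40105D,bb=0x40105D,insn=0x401066", Characteristic("nzxor"), False),
]


def check_presence(extracted: Set[Feature], feature: Feature, expected: bool) -> bool:
    # An entry passes when set membership matches the expected flag.
    return (feature in extracted) == expected


# Assumed extractor output for `xor ebx, ebx` after this change.
extracted_at_insn: Set[Feature] = {Number(0)}
results = [
    check_presence(extracted_at_insn, feature, expected)
    for _sample, _scope, feature, expected in PRESENCE_TESTS
]
```

Pairing a positive entry with a negative one at the same address guards against a regression in either direction: losing `Number(0)` or accidentally emitting `nzxor`.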

Test plan

  • isort, black, ruff all pass on changed files
  • pytest tests/test_rules.py tests/test_match.py tests/test_engine.py tests/test_optimizer.py — 84 passed
  • New fixture entries for Number(0) at xor-zero instruction and nzxor=False at same instruction

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request implements the emission of Number(0) for xor-zeroing idioms across Vivisect and BinExport2 backends for ARM and Intel architectures, and adds relevant test cases. A review comment points out that the CHANGELOG.md incorrectly includes entries for issues #2109 and #2126, which are not addressed in this PR.

CHANGELOG.md Outdated
Comment on lines +8 to +9
- binexport2: extract API library name from BinExport2 protobuf `library_index` field #2109
- rules: pre-filter string rules whose patterns are absent from the binary file, reducing redundant regex evaluation #2126

medium

These changelog entries refer to issues #2109 and #2126, which do not appear to be addressed in the current pull request. Please remove them if they were included by mistake, or clarify if they are intended to be part of this change set.

Contributor Author

devs6186 commented Apr 6, 2026

@mike-hunhoff @williballenthin

I picked this PR first because #2622 was still open and the earlier attempt was closed as stale. I kept this update focused: removed unrelated changelog lines, kept only the xor-zero -> number(0) behavior, and added small backend tests so the change is easy to verify.

Collaborator

@williballenthin williballenthin left a comment

the test cases are so low quality i am worried about the overall quality of this PR. @devs6186 please be more careful in the future

Collaborator

this is complete slop. nowhere do we use monkey patches like this in our test suite.

please spend some time thinking about how to compose some meaningful tests that demonstrate the intended behavior

Contributor Author

devs6186 commented Apr 6, 2026

the test cases are so low quality i am worried about the overall quality of this PR. @devs6186 please be more careful in the future

Hey, thank you for the feedback. Let me have another go and come back to you with something concrete. Thank you for pointing out the low-quality contribution.

@devs6186 devs6186 force-pushed the feat/2622-xor-zero-idiom-number branch from ad32876 to d334a34 on April 7, 2026 12:41
…andiant#2622

Detect register-to-register XOR instructions where both operands are
the same (xor eax, eax, xorpd xmm0, xmm0, eor rd, rn, rn, etc.) and
emit Number(0) at the instruction address, since these idioms zero the
destination register.

Previously the nzxor extractor silently returned early for these
instructions, so no feature was recorded at all. Rules that need to
detect zero-valued arguments passed before API calls (a common MSVC
pattern such as xor r9d, r9d before NtFsControlFile) had no way to
match them.

The change is applied to all six backends:
- viv/insn.py
- binexport2/arch/intel/insn.py
- binexport2/arch/arm/insn.py  (eor rd, rn, rn)
- ida/insn.py
- ghidra/insn.py
- binja/insn.py  (handles BN LLIL canonicalization of xor reg,reg
  to LLIL_SET_REG(reg, 0) rather than LLIL_XOR)

In all cases the zeroing idiom does not produce Characteristic("nzxor").

Test coverage via FEATURE_PRESENCE_TESTS at instruction scope:
  ("mimikatz", "function=0x40105D,bb=0x40105D,insn=0x401066",
   Number(0x0), True)   -- xor ebx, ebx emits Number(0)
  same scope, Characteristic("nzxor"), False  -- must not emit nzxor

Closes mandiant#2622
@devs6186 devs6186 force-pushed the feat/2622-xor-zero-idiom-number branch from d334a34 to d29ddac on April 7, 2026 13:05
Contributor Author

devs6186 commented Apr 7, 2026

hi @williballenthin
Thanks for the detailed review — you were right that this was weak. Let me walk through what changed.

All six backends now handled.

The original PR only touched three. IDA and Ghidra both had helpers (is_operand_equal, is_zxor) that
short-circuited before any feature was emitted — I just needed to yield Number(0) before those early
returns. Binary Ninja was trickier: BN's LLIL lifter canonicalizes xor reg, reg as LLIL_SET_REG(reg,
LLIL_CONST(0)) instead of preserving it as LLIL_XOR, so the existing walker never fires for zeroing
idioms. Fixed by checking the mnemonic and the lifted shape before the walker runs.
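
A rough sketch of that "mnemonic plus lifted shape" check is below. It does not use the Binary Ninja API: `SetReg`, `Const`, `Reg`, and `is_xor_zeroing` are hypothetical stand-ins that only model the shape described above, and the mnemonic set is an assumption.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Reg:
    name: str


@dataclass(frozen=True)
class SetReg:
    # Stand-in for a lifted "set register" operation: dest = src.
    dest: str
    src: Union[Const, Reg]


# Assumed set of XOR-family mnemonics; adjust per architecture.
XOR_MNEMONICS = {"xor", "pxor", "xorps", "xorpd"}


def is_xor_zeroing(mnemonic: str, lifted: object) -> bool:
    # The disassembly must say XOR; otherwise `mov reg, 0` would lift to
    # the same shape and be indistinguishable in this model.
    if mnemonic.lower() not in XOR_MNEMONICS:
        return False
    # The lifter has already folded `xor reg, reg` into `reg = 0`.
    return isinstance(lifted, SetReg) and lifted.src == Const(0)


folded = is_xor_zeroing("xor", SetReg("eax", Const(0)))
```

Checking both conditions before the XOR walker runs means the zeroing idiom is recognised even though no XOR operation survives lifting.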

The monkeypatch test file is gone.

test_nzxor_zeroing.py used types.SimpleNamespace and monkeypatch.setattr to fake instruction objects.
That's not how capa tests work — it would have been testing the mock, not the extractor. Deleted
entirely.

Fixture entries use microsocks instead of mimikatz.

Found xor ebp, ebp at 0x2002564 in microsocks.elf_ (31KB, already in test data). Verified against the
vivisect extractor directly — Number(0) comes back True, nzxor comes back False. Two new entries in
FEATURE_PRESENCE_TESTS, same pattern as everything else in that table, no separate test file needed.

CHANGELOG cleaned up.

Removed the stray #2109 and #2126 entries that had leaked in from other branches. Single entry now,
correctly scoped to #2622.

Let me know if anything still looks off.

@devs6186 devs6186 requested a review from williballenthin April 7, 2026 18:42

Development

Successfully merging this pull request may close these issues.

emit number(0) (offset(0)??) for instructions like "XOR EAX, EAX"

2 participants