
Update to Zig 0.14.1 including ManagedIter #3

Merged
SoraTenshi merged 24 commits into Zig-Sec:main from IamSanjid:zig-master on Jun 1, 2025

Conversation

@IamSanjid

@IamSanjid IamSanjid commented May 29, 2025

Contributor

Okay so, the things I exposed were really needed for one of my projects that I've been using this for over the past couple of days. Well, tbh they were not a must, but they were really convenient to have.

I wanted to have different functions to analyze different parts, and those functions were only executed after checking specific things.

Say I want a function that analyzes only the Detail.arch.x86 field, because the x86 architecture needs special logic. It makes sense for such a function to accept only Detail.arch instead of the full Insn. We could just do Insn.detail.?.x86 every time, but that felt redundant, so I exposed Insn and Detail. We could get away with exposing just Insn, but without Detail there was really no point in having only Insn. Just my 2 cents.

Why expose capstone-c? You won't believe how many times I needed to access the C defines. The binding doesn't expose 70% of the things I needed to split the "analyzer" into different sections. But I feel like a binding should not expose the C part? I don't know, actually; I have seen some Rust bindings do that, or maybe I didn't... We should talk about this, though: I really needed define/enum values like X86_INS_*, CS_GRP_* and some others I can't remember as of writing.

That's why it's in the draft phase. I really like the way it's set up; I just need to use it more to figure out more things.

@IamSanjid IamSanjid marked this pull request as draft May 29, 2025 23:29

@SoraTenshi SoraTenshi left a comment

Member

Great job!
I wasn't using capstone much when I was creating those bindings, so it's great to get some actual user feedback. If you have any major refactorings etc. in mind, please let me know!

Also, in case there are discussion or style questions, I am fine with discussing those as well.

Comment thread build.zig.zon Outdated
Comment thread src/ManagedHandle.zig Outdated

pub fn deinit(self: *Self) void {
impl.close(&self.native) catch |e| {
std.debug.print("Failed to close handle: {any}\n", .{impl.strerror(e)});

Member

don't use debug print here, instead:

std.io.getStdErr().writer().print("Failed to close handle: {any}\n", .{impl.strerror(e)});

@IamSanjid IamSanjid May 30, 2025

Contributor Author

Hmm, so std.io.getStdErr().writer().print("Failed to close handle: {any}\n", .{impl.strerror(e)}) catch {}; since print returns an error? I don't think a deinit function should return any error.

Member

correct. :D
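
A minimal sketch of the pattern agreed on in this thread, assuming the Zig 0.14-era `std.io.getStdErr()` API quoted above: `deinit` stays `void`, reports a failed close on stderr, and discards a failed write with `catch {}`. `closeNative` and this `ManagedHandle` layout are hypothetical stand-ins, not the binding's real code.

```zig
const std = @import("std");

// Hypothetical stand-in for the native close call. It always fails so that
// the error path below is exercised.
fn closeNative(handle: *usize) error{InvalidHandle}!void {
    _ = handle;
    return error.InvalidHandle;
}

const ManagedHandle = struct {
    native: usize,

    // deinit returns void: a failed close is reported on stderr, and a
    // failed write to stderr is deliberately ignored with `catch {}`.
    pub fn deinit(self: *ManagedHandle) void {
        closeNative(&self.native) catch |e| {
            std.io.getStdErr().writer().print(
                "Failed to close handle: {s}\n",
                .{@errorName(e)},
            ) catch {};
        };
    }
};

test "deinit never returns an error" {
    var handle = ManagedHandle{ .native = 0 };
    handle.deinit(); // reports the failure on stderr and returns normally
}
```

The trade-off is that a caller cannot react to a failed close; the thread accepts this because cleanup functions in Zig conventionally do not fail.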

Comment thread src/insn.zig
Comment thread src/iter.zig Outdated
Comment thread src/iter.zig Outdated
insn: [*]Insn,

pub fn init(handle: Handle, code: []const u8, address: u64) IterManaged {
const insn: [*]Insn = @ptrCast(cs.cs_malloc(handle));

Member

Hmm, this seems a bit fishy. I believe I forgot to implement an abstraction for cs_malloc that returns either an error or the pointer (which must be valid).
There's a malloc function in src/impl.zig, which we should probably fix to handle OOM.

@IamSanjid IamSanjid May 30, 2025

Contributor Author

Actually, cs_malloc doesn't return any error... https://github.com/capstone-engine/capstone/blob/280b749e84adca4177b7525504e55be4d8c74e44/cs.c#L1429 Oh wait, I mean it doesn't return the error directly, but it does set the error on the handle. Hmm, yeah, the managed iter needs to return an error too.

@SoraTenshi SoraTenshi May 30, 2025

Member

i was thinking about wrapping the type for it.

pub fn allocInsn(handle: Handle) ![*]Insn {
    const insn: ?[*]Insn = @ptrCast(cs.cs_malloc(handle));
    return if(insn) |i| i else error.OutOfMemory;
}

@IamSanjid IamSanjid May 30, 2025

Contributor Author

Yeah, I think we should just replace this:

pub fn malloc(handle: Handle) [*]insn.Insn {
    return @ptrCast(cs.cs_malloc(handle));
}

This malloc really doesn't make sense from user space, since all it does is allocate space for a single insn.Insn.

Contributor Author

And by the way, shouldn't allocInsn return a *Insn? The many-item pointer ([*]) variant doesn't make sense here, because it will always return one Insn. I will try to convert all the [*] many-item pointers to single-item pointers where appropriate.

Member

I am not too familiar with capstone, hence I tried to keep it as close to C as possible.
But yeah, if it is always a single instance of an Insn, then I agree with you.
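
A minimal sketch of where this thread ends up: a wrapper that turns a null result from a C-style allocator into `error.OutOfMemory` and returns a single-item pointer. `fakeCsMalloc`, the `fail` parameter and this simplified `Insn` are hypothetical stand-ins for `cs_malloc` and the binding's types, used only so the example is self-contained.

```zig
const std = @import("std");

// Simplified stand-in for the binding's instruction type.
const Insn = extern struct { id: c_uint, address: u64 };

// Backing storage for the fake allocator below.
var storage: Insn = .{ .id = 0, .address = 0 };

// Hypothetical stand-in for cs_malloc: returns null on failure, like a C
// allocator would.
fn fakeCsMalloc(fail: bool) ?*anyopaque {
    if (fail) return null;
    return &storage;
}

// Null becomes error.OutOfMemory; success is a single-item pointer, because
// exactly one Insn is allocated.
pub fn allocInsn(fail: bool) error{OutOfMemory}!*Insn {
    const raw = fakeCsMalloc(fail) orelse return error.OutOfMemory;
    return @ptrCast(@alignCast(raw));
}

test "allocation failure becomes error.OutOfMemory" {
    const one = try allocInsn(false);
    one.address = 0x1000;
    try std.testing.expectEqual(@as(u64, 0x1000), one.address);
    try std.testing.expectError(error.OutOfMemory, allocInsn(true));
}
```

Returning `*Insn` rather than `[*]Insn` puts the "exactly one element" contract into the type, so callers cannot index past it by accident.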

@SoraTenshi

Member

Also, i am fine with exposing all the defines that come from C, generally.
Maybe that should be part of those bindings though?

@SoraTenshi SoraTenshi changed the title from "Zig 0.14.1 Update but with some API changes want some opinion" to "Update to Zig 0.14.1 including ManagedIter" May 30, 2025
@IamSanjid

IamSanjid commented May 30, 2025

Contributor Author

> Also, I am fine with exposing all the defines that come from C, generally. Maybe that should be part of those bindings though?

Hmm, I think we can abuse comptime to expose only the "defines" (the pub const declarations) from the translated-C module. I think a binding shouldn't expose everything it inherits from C.
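
One possible reading of the comptime idea, as a hedged sketch: a lookup function that forwards only allow-listed constant names and rejects everything else at compile time. The `c` namespace below is a stand-in for the translated-C module, its values are placeholders, and `define`, `isExposed` and `exposed_prefixes` are hypothetical names, not part of the binding.

```zig
const std = @import("std");

// Stand-in for the translated-C module. Names mirror Capstone's style; the
// values are placeholders for illustration only.
const c = struct {
    pub const X86_INS_CALL: c_int = 1;
    pub const CS_GRP_JUMP: c_int = 1;
    pub const cs_internal_thing: c_int = 99;
};

// Only constants whose names start with one of these prefixes are exposed.
const exposed_prefixes = [_][]const u8{ "X86_INS_", "CS_GRP_" };

fn isExposed(comptime name: []const u8) bool {
    for (exposed_prefixes) |prefix| {
        if (std.mem.startsWith(u8, name, prefix)) return true;
    }
    return false;
}

// Forwards an allow-listed constant; any other name is a compile error.
pub fn define(comptime name: []const u8) c_int {
    if (comptime !isExposed(name)) {
        @compileError("'" ++ name ++ "' is not an exposed define");
    }
    return @field(c, name);
}

test "only allow-listed defines are reachable" {
    try std.testing.expectEqual(c.CS_GRP_JUMP, define("CS_GRP_JUMP"));
    try std.testing.expectEqual(c.X86_INS_CALL, define("X86_INS_CALL"));
    // define("cs_internal_thing") would fail to compile.
}
```

A string-keyed lookup loses editor completion compared with plain `pub const` re-exports, so this is more a demonstration that the filtering can be done than a recommendation.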

@IamSanjid

Contributor Author

> Great job! I wasn't using capstone much when I was creating those bindings, so it's great to get some actual user feedback. If you have any major refactorings etc. in mind, please let me know!
>
> Also, in case there are discussion or style questions, I am fine with discussing those as well.

Any thoughts on how we might expose these?

const x86 = @import("x86/all.zig");
const arm64 = @import("arm64/all.zig");
const arm = @import("arm/all.zig");
const m68k = @import("m68k/all.zig");
const mips = @import("mips/all.zig");
const ppc = @import("ppc/all.zig");
const sparc = @import("sparc/all.zig");
const sysz = @import("sysz/all.zig");
const xcore = @import("xcore/all.zig");
const tms320c64x = @import("tms320c64x/all.zig");
const m680x = @import("m680x/all.zig");
const evm = @import("evm/all.zig");
const mos65xx = @import("mos65xx/all.zig");
const wasm = @import("wasm/all.zig");
const bpf = @import("bpf/all.zig");
const riscv = @import("riscv/all.zig");
const sh = @import("sh/all.zig");
const tricore = @import("tricore/all.zig");

Some functions might just want a specific arch, like only Arm or x86. Passing the Details and accessing Details.x86 feels too repetitive, and I am currently facing it :").

@SoraTenshi

Member

In Userspace this shouldn't be much of an issue, considering you can just make the definition yourself.
e.g.

const x86 = arch.x86;
const arm = arch.arm;

I mean, an earlier way of dealing with something like this would be to just use usingnamespace, but usingnamespace has been slated for removal (and I think it has even been removed already).

Can you give me an exact example of what you mean?
As in some concrete function where you notice this pattern happening way too often?

Comment thread build.zig.zon Outdated
Comment thread src/impl.zig Outdated
Comment on lines 78 to 89
/// Return an Iter object
/// Does not yet consume any element.
pub fn disasmIter(handle: Handle, code: []const u8, address: u64, ins: [*]insn.Insn) Iter {
return Iter.init(handle, code, address, ins);
return Iter{
.handle = handle,
.code = code,
.original_code = code,
.original_address = address,
.address = address,
.insn = ins,
};
}

Member

Now this whole function seems to be unnecessary.
This might be a style question, but simply constructing structures with the initialization form seems a bit better in my opinion, considering that an init function that doesn't do much other than fill in parameters seems a bit superfluous.

Contributor Author

I mean, the init function had a little purpose here. If you notice, I have added original_code and original_address for resetting capabilities. From user land you would just want to pass the code, not also initialize the original_code field, which would feel like extra work, since original_code is there just for internal logic and has nothing to do with the user initializing an iterator.
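
A minimal sketch of the argument being made here, under the assumption that the iterator keeps a copy of its starting state so it can be rewound. The field names `original_code` and `original_address` come from the diff above; this cut-down `Iter`, and the `advance` and `reset` methods, are hypothetical.

```zig
const std = @import("std");

const Iter = struct {
    code: []const u8,
    address: u64,
    // Bookkeeping the caller should not have to fill in by hand.
    original_code: []const u8,
    original_address: u64,

    // The caller passes code and address once; init derives the rest.
    pub fn init(code: []const u8, address: u64) Iter {
        return .{
            .code = code,
            .address = address,
            .original_code = code,
            .original_address = address,
        };
    }

    // Hypothetical: consume `size` bytes, as decoding one instruction would.
    pub fn advance(self: *Iter, size: usize) void {
        self.code = self.code[size..];
        self.address += size;
    }

    // Hypothetical: rewind to the state captured by init.
    pub fn reset(self: *Iter) void {
        self.code = self.original_code;
        self.address = self.original_address;
    }
};

test "reset restores the values passed to init" {
    var it = Iter.init("\x90\x90\xc3", 0x1000);
    it.advance(2);
    try std.testing.expectEqual(@as(u64, 0x1002), it.address);
    it.reset();
    try std.testing.expectEqual(@as(usize, 3), it.code.len);
    try std.testing.expectEqual(@as(u64, 0x1000), it.address);
}
```

With a plain struct literal the caller would have to pass `code` and `address` twice and keep the copies in sync, which is the extra work the comment objects to.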

@IamSanjid

IamSanjid commented May 30, 2025

Contributor Author

Ah, let's say I am trying to analyze call instructions. call has 2 variants, call rel32 and call [rip + rel32], so I needed one function to get the Details.arch.x86.Operand, which only accesses the RIP register and memory. This could be one function...
Then again, after analyzing instructions I would like to edit the ModR/M byte, so some function has to access x86.encoding.modrm_offset. These kinds of small, repetitive helper functions should only access x86 instead of the full Detail.arch.x86. I mean, it's kind of doable, but it doesn't really feel appropriate, you know.

I was thinking we could just make them pub

const x86 = @import("x86/all.zig");
const arm64 = @import("arm64/all.zig");
const arm = @import("arm/all.zig");
const m68k = @import("m68k/all.zig");
const mips = @import("mips/all.zig");
const ppc = @import("ppc/all.zig");
const sparc = @import("sparc/all.zig");
const sysz = @import("sysz/all.zig");
const xcore = @import("xcore/all.zig");
const tms320c64x = @import("tms320c64x/all.zig");
const m680x = @import("m680x/all.zig");
const evm = @import("evm/all.zig");
const mos65xx = @import("mos65xx/all.zig");
const wasm = @import("wasm/all.zig");
const bpf = @import("bpf/all.zig");
const riscv = @import("riscv/all.zig");
const sh = @import("sh/all.zig");
const tricore = @import("tricore/all.zig");

and then just make them accessible from the main capstone.zig. Kind of repetitive, but it would be more than good enough, and then we also wouldn't need to access define/enum values from the C header module.

IamSanjid and others added 2 commits May 30, 2025 16:17
@SoraTenshi

SoraTenshi commented May 30, 2025

Member

I wonder why they aren't pub in the first place.
Probably an oversight on my part, or maybe it's just that I recently switched to a different mindset, as in: don't restrict libraries for the sake of encapsulation.
I would very much like to have them pub as well.

Perhaps it would also not be a bad idea to have some sort of utility functions directly on the Arch extern union. I am not sure if they can be direct member functions, but maybe we can implement them in arch/arch.zig regardless.
I kept the naming of those consistent, so maybe we can play some tricks with that.

I will probably check out this PR this evening when I am off work and see if I can make it easier to access fields of the union, e.g. Operand.

@SoraTenshi SoraTenshi linked an issue May 30, 2025 that may be closed by this pull request
@IamSanjid

IamSanjid commented May 30, 2025

Contributor Author

> Perhaps it would also not be a bad idea to have some sort of utility functions directly on the Arch extern union. I am not sure if they can be direct member functions, but maybe we can implement them in arch/arch.zig regardless. I kept the naming of those consistent, so maybe we can play some tricks with that.

Btw, are we on the same page? Just making sure; I got kind of confused by my own text since I wasn't thinking it through. The main issue I was having is that I couldn't define a function which accepts arch.x86.Arch directly :) like fn foo(x86: cs.arch.x86.Arch). All I needed was some way to access those struct types. I think the capstone API is designed to have those type definitions available everywhere.
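
A minimal sketch of the request, assuming the per-architecture types are public. Every type below is a heavily simplified, hypothetical stand-in for the binding's real definitions (the `modrm` field in particular is only illustrative); the point is just that a helper can name `x86.Arch` in its signature and the caller unwraps the union once.

```zig
const std = @import("std");

// Hypothetical, simplified stand-ins for the binding's arch modules.
const x86 = struct {
    pub const Arch = extern struct { modrm: u8, op_count: u8 };
};
const arm = struct {
    pub const Arch = extern struct { op_count: u8 };
};

const ArchUnion = extern union { x86: x86.Arch, arm: arm.Arch };
const Detail = extern struct { arch: ArchUnion };
const Insn = struct { detail: ?*const Detail };

// Because x86.Arch is public, the helper takes only the part it needs
// instead of the whole Insn or Detail.
fn modrmOf(arch: x86.Arch) u8 {
    return arch.modrm;
}

test "helper receives only the x86 part" {
    const detail = Detail{ .arch = .{ .x86 = .{ .modrm = 0x15, .op_count = 1 } } };
    const ins = Insn{ .detail = &detail };
    // The caller unwraps the optional and the union once, at the boundary.
    try std.testing.expectEqual(@as(u8, 0x15), modrmOf(ins.detail.?.arch.x86));
}
```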

And!! I really think we don't need setup.zig. Instead, there should be a simple function in impl.zig that accepts those function pointers; the manual malloc and realloc don't feel worth having.

@SoraTenshi

Member

> Btw, are we on the same page? Just making sure; I got kind of confused by my own text since I wasn't thinking it through. The main issue I was having is that I couldn't define a function which accepts arch.x86.Arch directly :) like fn foo(x86: cs.arch.x86.Arch). All I needed was some way to access those struct types. I think the capstone API is designed to have those type definitions available everywhere.

Oh, it sounded like there were some usability issues; that's how I interpreted your text.
If it's just making those public, I am actually in favor of doing so. :)

> And!! I really think we don't need setup.zig. Instead, there should be a simple function in impl.zig that accepts those function pointers; the manual malloc and realloc don't feel worth having.

The point of setup.zig is to have a very Zig-like way of handling allocations.
Although it is implicit, which I cannot change because the library deals with that internally, it still gives you the choice to adopt an allocation strategy.
Those functions also exist because of possible typing issues, and to have a direct communication line between Zig and C when it comes to allocations.
It's also opt-in, so not much is lost by keeping it ;)

@IamSanjid

Contributor Author

> Oh, it sounded like there were some usability issues; that's how I interpreted your text. If it's just making those public, I am actually in favor of doing so. :)

Oh cool, let me do a commit so you can review and see if we agree :).

> The point of setup.zig is to have a very Zig-like way of handling allocations. Although it is implicit, which I cannot change because the library deals with that internally, it still gives you the choice to adopt an allocation strategy. Those functions also exist because of possible typing issues, and to have a direct communication line between Zig and C when it comes to allocations. It's also opt-in, so not much is lost by keeping it ;)

I mean, everything is fine except the hash map :) I don't know, can we follow something like this: https://github.com/bernardassan/czalloc/blob/main/src/root.zig ? I'm just against the hash map and can agree with everything else :)

@SoraTenshi

Member

> Oh cool, let me do a commit so you can review and see if we agree :).

looks good!

> I mean, everything is fine except the hash map :) I don't know, can we follow something like this: bernardassan/czalloc@main/src/root.zig ? I'm just against the hash map and can agree with everything else :)

Ah, you don't like the extra allocations the hash map is doing, I guess?
I'll think of a different way to handle that, maybe extra metadata, but I'll experiment around. Maybe not for this PR, though.

@IamSanjid

Contributor Author

> looks good!

Should we make all of these public too? https://github.com/Zig-Sec/capstone-bindings-zig/blob/main/src/arch/m680x/all.zig#L1
You can check arch/<>/all.zig; many of the things there are not exposed. Should we expose those?

@SoraTenshi

Member

> Should we make all of these public too? main/src/arch/m680x/all.zig#L1 You can check arch/<>/all.zig; many of the things there are not exposed. Should we expose those?

Yeah, I think it would make sense to expose everything that may be used.
There's no point in hiding those things.

Comment thread src/impl.zig
cs.cs_free(@ptrCast(ins.ptr), ins.len);
/// Equivalent to cs_free
/// Only accepts `[]insn.Insn` or `*insn.Insn` types.
pub fn free(ins: anytype) void {

Member

To keep it consistent (especially in terms of contracts), I would name it insn.
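
A minimal sketch of how an `anytype` parameter can be restricted to the two documented shapes, `[]Insn` and `*Insn`, using a comptime switch on the argument's type. `release`, the `freed` counter and this one-field `Insn` are hypothetical stand-ins for `cs_free` and the binding's types, so the example runs without Capstone.

```zig
const std = @import("std");

// Simplified stand-in for the binding's instruction type.
const Insn = struct { id: u32 };

// Hypothetical stand-in for cs_free: only counts how many instructions it
// was asked to release.
var freed: usize = 0;
fn release(ptr: [*]Insn, count: usize) void {
    _ = ptr;
    freed += count;
}

// Accepts only `[]Insn` or `*Insn`; any other argument type is rejected at
// compile time with a readable message.
pub fn free(insn: anytype) void {
    switch (@TypeOf(insn)) {
        []Insn => release(insn.ptr, insn.len),
        *Insn => release(@ptrCast(insn), 1),
        else => @compileError("free expects []Insn or *Insn, got " ++
            @typeName(@TypeOf(insn))),
    }
}

test "free accepts a slice or a single pointer" {
    var many = [_]Insn{ .{ .id = 1 }, .{ .id = 2 } };
    var one = Insn{ .id = 3 };
    const slice: []Insn = &many;
    free(slice);
    free(&one);
    try std.testing.expectEqual(@as(usize, 3), freed);
}
```

Note that `&many` on its own has type `*[2]Insn`, which this switch would reject, so the caller coerces it to a slice first.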

Comment thread src/impl.zig Outdated
Comment thread src/impl.zig

@SoraTenshi SoraTenshi left a comment

Member

I think with those last changes, we can merge it.
Also please undraft.

Thank you so much for the work you put in there! :)

Comment thread src/iter.zig Outdated
@IamSanjid IamSanjid marked this pull request as ready for review May 31, 2025 23:32

@SoraTenshi SoraTenshi left a comment

Member

Thank you a lot!
This is looking fantastic!

@SoraTenshi

Member

(I also took this PR as a chance to bring the CI back up; I probably need to adjust all my other repos for this.)

@SoraTenshi SoraTenshi merged commit 9452dcf into Zig-Sec:main Jun 1, 2025
3 checks passed

Development

Successfully merging this pull request may close these issues.

Stabilize for Zig 0.14

2 participants