[feature] integrate zeknox GPU-acceleration library into gnark #1332
dloghin wants to merge 68 commits into
Conversation
@ivokub need your help and review!
On it. Would it be possible to allow adding commits directly to the branch for easier review?
Sure, I've granted you push permission at https://github.com/okx/gnark/invitations. Let me delete those examples to keep the PR clean.
I'm not able to create a proof for now. In the debug logs, the last action I see is:
14:05:05 DBG Bs.MultiExp done MSMG2 5 took=0.86421 acceleration=zeknox backend=groth16 curve=bn254 nbConstraints=6
I guess there is probably a deadlock somewhere. Have you been able to run the end-to-end prover?
Hi Ivo, may I check: if you use the precompiled zeknox libraries, does your GPU have compute capability 8.6 or 8.9? (Only these two are supported by our precompiled libraries.) On our systems, the end-to-end example (go run -tags=zeknox examples/zeknox/main.go) is working.
I'm using an AWS g4dn.xlarge instance, which according to the documentation has a T4 GPU, and that seems to be compute capability 7.5. Should it work if I compile the libraries myself? I started compiling them, but it took quite a bit of time and I didn't let it finish. When I benchmarked previously, g4dn offered quite a good balance between performance and $-per-proof cost.
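The compatibility question above can be sketched as a small check. This is only an illustration: the type and function names are hypothetical and not part of zeknox or gnark, and the list of targets simply encodes the two compute capabilities (8.6 and 8.9) that the thread says the precompiled libraries support.

```go
package main

import "fmt"

// ComputeCapability is a CUDA compute capability such as 8.6 or 7.5.
// Hypothetical helper type, not part of zeknox or gnark.
type ComputeCapability struct {
	Major, Minor int
}

// precompiledTargets encodes the capabilities that, per the discussion,
// the precompiled zeknox libraries are built for.
var precompiledTargets = []ComputeCapability{{8, 6}, {8, 9}}

// NeedsSourceBuild reports whether a GPU with capability cc is outside
// the precompiled targets, in which case the libraries would have to be
// compiled locally.
func NeedsSourceBuild(cc ComputeCapability) bool {
	for _, t := range precompiledTargets {
		if t == cc {
			return false
		}
	}
	return true
}

func main() {
	t4 := ComputeCapability{7, 5} // NVIDIA T4, as on AWS g4dn instances
	fmt.Printf("%d.%d needs source build: %v\n", t4.Major, t4.Minor, NeedsSourceBuild(t4))
}
```

For the T4 case discussed here (compute capability 7.5), the check reports that a local build is needed.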
Yeah, compiling it yourself should work. Compiling BN254 MSM G2 takes ~5 mins on our device, so expect a long compile time.
Indeed, I got it working, and the speedup is similar to the one claimed in the PR (1.6x). I also had to build libblst. But now there seems to be an issue with the proof: I get an invalid proof. I could try looking into it, but it would probably take a bit of time to compare the computed values against the CPU execution. Would it be possible to try it with another GPU and see if you hit the same problem?
This is an edge case. We found this bug and tried many methods to fix it, but it still happens...
Hi @ivokub, my latest commit temporarily fixes the issue with the invalid proof. We observe that this issue appears in multi-GPU environments with relatively low frequency, but we have not found the cause. If the proof is invalid, we recompute only the invalid points on the CPU. We still observe a 25-50% speedup even when this issue appears. Please review. Thank you.
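The fallback described above (keep the GPU results that check out, recompute only the bad ones on the CPU) can be sketched generically. This is not the code from the PR: the names are hypothetical, integers stand in for curve points, and the `valid` callback is an assumption, since the thread does not say how invalid points are detected.

```go
package main

import "fmt"

// computeFn stands in for one multi-exponentiation. Real code would
// return a curve point; an int keeps the sketch self-contained.
type computeFn func(input int) int

// RecomputeInvalid runs fast on every input, checks each result with
// valid, and reruns only the failing entries with slow. It returns the
// final results and the indices that had to be recomputed.
func RecomputeInvalid(inputs []int, fast, slow computeFn, valid func(in, out int) bool) ([]int, []int) {
	results := make([]int, len(inputs))
	var redone []int
	for i, in := range inputs {
		results[i] = fast(in)
		if !valid(in, results[i]) {
			results[i] = slow(in) // fall back to the trusted path
			redone = append(redone, i)
		}
	}
	return results, redone
}

func main() {
	// Trusted "CPU" path.
	slow := func(x int) int { return x * x }
	// Simulated "GPU" path that returns a wrong value for x == 3.
	fast := func(x int) int {
		if x == 3 {
			return -1
		}
		return x * x
	}
	valid := func(in, out int) bool { return out == in*in }

	results, redone := RecomputeInvalid([]int{1, 2, 3, 4}, fast, slow, valid)
	fmt.Println(results, redone) // prints "[1 4 9 16] [2]"
}
```

Only the entry at index 2 is recomputed, so the cost of the fallback grows with the number of invalid results rather than with the size of the whole batch.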
Thanks for the update. It will take a bit more time to review. We actually intend to make support for different proving backends more modular, so I would like to get that done first.
Description
This PR aims to integrate zeknox GPU-acceleration library into gnark. Specifically, this PR targets the GPU (NVIDIA CUDA) acceleration of groth16 backend over BN254. In addition, this PR adds a new example consisting of proving/verifying a batch of secp256r1 (P256) signatures. Our benchmarking shows 1.54-1.57X speedup of the CPU+GPU execution (with zeknox) compared to the default CPU-only execution.
In summary, we made the following additions:
- backend/groth16/bn254/zeknox folder.
- backend/groth16/bn254/prove.go printed in debug mode.
- examples/p256.
- README.md on how to run gnark with zeknox.
Type of change
How has this been tested?
We wrote new tests under backend/groth16/bn254/zeknox and examples/p256. In addition, we also ran the tests under backend/groth16/bn254.
How has this been benchmarked?
We ran the P256 example to prove/verify a batch of 10 secp256r1 keys. The steps to run:
cd examples
go build -tags zeknox
./examples
Results
The times below represent the proving time (in milliseconds) for 10 secp256r1 keys.
Checklist:
golangci-lint does not output errors locally