add an sha1_mb assembly implementation with vector extension for riscv64 #177

Open
HeliC829 wants to merge 1 commit into intel:master from HeliC829:rvv-sha1_mb

@HeliC829 HeliC829 commented Apr 2, 2026

Contributor

Original C implementation:

multibinary_sha1_warm with 1 lanes: runtime =     219798 usecs, bandwidth 39 MB in 0.2198 sec = 186.35 MB/s
multibinary_sha1_warm with 10 lanes: runtime =    2199834 usecs, bandwidth 390 MB in 2.1998 sec = 186.20 MB/s
multibinary_sha1_warm with 11 lanes: runtime =    2422516 usecs, bandwidth 429 MB in 2.4225 sec = 185.99 MB/s
multibinary_sha1_warm with 12 lanes: runtime =    2643057 usecs, bandwidth 468 MB in 2.6431 sec = 185.97 MB/s
multibinary_sha1_warm with 13 lanes: runtime =    2862751 usecs, bandwidth 507 MB in 2.8628 sec = 186.00 MB/s
multibinary_sha1_warm with 14 lanes: runtime =    3083483 usecs, bandwidth 546 MB in 3.0835 sec = 185.97 MB/s
multibinary_sha1_warm with 15 lanes: runtime =    3304351 usecs, bandwidth 585 MB in 3.3044 sec = 185.94 MB/s
multibinary_sha1_warm with 16 lanes: runtime =    3524550 usecs, bandwidth 625 MB in 3.5246 sec = 185.94 MB/s
multibinary_sha1_warm with 2 lanes: runtime =     439534 usecs, bandwidth 78 MB in 0.4395 sec = 186.38 MB/s
multibinary_sha1_warm with 3 lanes: runtime =     659119 usecs, bandwidth 117 MB in 0.6591 sec = 186.43 MB/s
multibinary_sha1_warm with 4 lanes: runtime =     879014 usecs, bandwidth 156 MB in 0.8790 sec = 186.39 MB/s
multibinary_sha1_warm with 5 lanes: runtime =    1099112 usecs, bandwidth 195 MB in 1.0991 sec = 186.33 MB/s
multibinary_sha1_warm with 6 lanes: runtime =    1320764 usecs, bandwidth 234 MB in 1.3208 sec = 186.07 MB/s
multibinary_sha1_warm with 7 lanes: runtime =    1540503 usecs, bandwidth 273 MB in 1.5405 sec = 186.12 MB/s
multibinary_sha1_warm with 8 lanes: runtime =    1759994 usecs, bandwidth 312 MB in 1.7600 sec = 186.18 MB/s
multibinary_sha1_warm with 9 lanes: runtime =    1979825 usecs, bandwidth 351 MB in 1.9798 sec = 186.20 MB/s
multibinary_sha1_warm: runtime =    7062683 usecs, bandwidth 1250 MB in 7.0627 sec = 185.58 MB/s

Assembly implementation with vector extension:

multibinary_sha1_warm with 1 lanes: runtime =     187927 usecs, bandwidth 39 MB in 0.1879 sec = 217.96 MB/s
multibinary_sha1_warm with 10 lanes: runtime =    1194746 usecs, bandwidth 390 MB in 1.1947 sec = 342.83 MB/s
multibinary_sha1_warm with 11 lanes: runtime =    1231061 usecs, bandwidth 429 MB in 1.2311 sec = 365.99 MB/s
multibinary_sha1_warm with 12 lanes: runtime =    1232110 usecs, bandwidth 468 MB in 1.2321 sec = 398.93 MB/s
multibinary_sha1_warm with 13 lanes: runtime =    1420784 usecs, bandwidth 507 MB in 1.4208 sec = 374.78 MB/s
multibinary_sha1_warm with 14 lanes: runtime =    1609437 usecs, bandwidth 546 MB in 1.6094 sec = 356.30 MB/s
multibinary_sha1_warm with 15 lanes: runtime =    1645425 usecs, bandwidth 585 MB in 1.6454 sec = 373.40 MB/s
multibinary_sha1_warm with 16 lanes: runtime =    1646009 usecs, bandwidth 625 MB in 1.6460 sec = 398.15 MB/s
multibinary_sha1_warm with 2 lanes: runtime =     375322 usecs, bandwidth 78 MB in 0.3753 sec = 218.27 MB/s
multibinary_sha1_warm with 3 lanes: runtime =     407630 usecs, bandwidth 117 MB in 0.4076 sec = 301.45 MB/s
multibinary_sha1_warm with 4 lanes: runtime =     409043 usecs, bandwidth 156 MB in 0.4090 sec = 400.54 MB/s
multibinary_sha1_warm with 5 lanes: runtime =     596622 usecs, bandwidth 195 MB in 0.5966 sec = 343.27 MB/s
multibinary_sha1_warm with 6 lanes: runtime =     783394 usecs, bandwidth 234 MB in 0.7834 sec = 313.71 MB/s
multibinary_sha1_warm with 7 lanes: runtime =     815846 usecs, bandwidth 273 MB in 0.8158 sec = 351.44 MB/s
multibinary_sha1_warm with 8 lanes: runtime =     817024 usecs, bandwidth 312 MB in 0.8170 sec = 401.07 MB/s
multibinary_sha1_warm with 9 lanes: runtime =    1005674 usecs, bandwidth 351 MB in 1.0057 sec = 366.56 MB/s
multibinary_sha1_warm: runtime =    3321683 usecs, bandwidth 1250 MB in 3.3217 sec = 394.60 MB/s
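
The two runs above are easier to compare as per-lane speedup ratios. The following is a minimal sketch of how those ratios could be computed from the benchmark output. The helper names `parse_bandwidth` and `speedup` are hypothetical and not part of this PR, and the sample logs contain only three lines copied from each run (1 lane, 4 lanes, and the summary line).

```python
import re

# Matches both the per-lane lines and the final summary line of the log.
LINE_RE = re.compile(
    r"multibinary_sha1_warm(?: with (\d+) lanes)?: .*? = ([\d.]+) MB/s"
)

def parse_bandwidth(log):
    """Return {lane_count: MB/s}; the summary line is stored under key None."""
    result = {}
    for line in log.splitlines():
        m = LINE_RE.search(line)
        if m:
            lanes = int(m.group(1)) if m.group(1) else None
            result[lanes] = float(m.group(2))
    return result

def speedup(baseline, candidate):
    """Ratio candidate/baseline for every key present in both runs."""
    return {k: candidate[k] / baseline[k] for k in baseline if k in candidate}

# Sample lines copied from the two runs above.
C_LOG = """\
multibinary_sha1_warm with 1 lanes: runtime =     219798 usecs, bandwidth 39 MB in 0.2198 sec = 186.35 MB/s
multibinary_sha1_warm with 4 lanes: runtime =     879014 usecs, bandwidth 156 MB in 0.8790 sec = 186.39 MB/s
multibinary_sha1_warm: runtime =    7062683 usecs, bandwidth 1250 MB in 7.0627 sec = 185.58 MB/s
"""

RVV_LOG = """\
multibinary_sha1_warm with 1 lanes: runtime =     187927 usecs, bandwidth 39 MB in 0.1879 sec = 217.96 MB/s
multibinary_sha1_warm with 4 lanes: runtime =     409043 usecs, bandwidth 156 MB in 0.4090 sec = 400.54 MB/s
multibinary_sha1_warm: runtime =    3321683 usecs, bandwidth 1250 MB in 3.3217 sec = 394.60 MB/s
"""

ratios = speedup(parse_bandwidth(C_LOG), parse_bandwidth(RVV_LOG))
for lanes, ratio in ratios.items():
    label = "overall" if lanes is None else f"{lanes} lanes"
    print(f"{label}: {ratio:.2f}x")
```

On these sample lines the ratios come out to roughly 1.17x for 1 lane, 2.15x for 4 lanes, and 2.13x overall. Pasting the full logs into `C_LOG` and `RVV_LOG` would give the ratio for every lane count.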

@HeliC829 HeliC829 marked this pull request as ready for review April 30, 2026 07:17
… for riscv64

Signed-off-by: Julian Zhu <julian.oerv@isrc.iscas.ac.cn>