Improve AMS performance & benchmarking: semaphore fast-path, checksum, bench stats #72
Open
zebastian wants to merge 1 commit into nasa-jpl:integration
…, bench stats
- Inline semaphore hot path (_semTbl, _semGetSem, _semSync) and split
_semSync into inline fast-path + noinline slow-path to avoid function
call overhead on the common (seq-match) case
- Optimize computeAmsChecksum to process 2 bytes per iteration instead
of one byte at a time with a branch
- Defer keyBuffer initialization in recoverMsgContent to the branch
that actually uses it
- Add cell census log event in amsd to enable event-driven startup
instead of a fixed sleep (speeds up the benchmark)
- Enhance amsbenchr with per-message inter-arrival timing, percentile
stats (p50/p95/p99), jitter, out-of-order detection, and min/max/avg
message size reporting
- Add variable message size support to amsbenchs (min/max range)
- Extract wait_for_log_event helper (bench_helpers) and refactor
ionstart scripts to wait for census log event instead of fixed 9s sleep
- Update dotest to use MESSAGE_SIZE_MIN/MAX variables
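
For reviewers unfamiliar with the fast-path/slow-path split mentioned in the first item, here is a minimal sketch of the general pattern. It is not the actual _semSync code: the struct, its fields, the function names, and the placeholder "re-lookup" are all hypothetical, and the attributes assume GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cached semaphore entry; NOT the real ION layout. */
typedef struct {
    uint32_t seq;        /* sequence number the cached handle is valid for */
    int      handle;     /* cached handle */
    unsigned slow_calls; /* counts slow-path entries (illustration only) */
} sem_cache_t;

/* Slow path: kept out of line so the inlined fast path stays tiny. */
__attribute__((noinline))
static int sem_sync_slow(sem_cache_t *c, uint32_t current_seq)
{
    c->slow_calls++;
    /* Placeholder for the real (expensive) re-lookup. */
    c->handle = (int)(current_seq & 0x7fffffffu);
    c->seq = current_seq;
    return c->handle;
}

/* Fast path: a single compare when the sequence numbers match. */
static inline int sem_sync(sem_cache_t *c, uint32_t current_seq)
{
    if (__builtin_expect(c->seq == current_seq, 1)) {
        return c->handle;            /* common case: no function call */
    }
    return sem_sync_slow(c, current_seq);
}
```

The idea is that callers pay only for the compare and the load in the common case, while the rarely executed refresh logic does not bloat every call site.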
Note: some changes were developed with the help of AI (Claude Code),
especially the amsbenchr.* changes. I double-checked execution time
(instruction counts and per-function cost) in kcachegrind.
Estimated improvement in achievable throughput (or reduction in CPU time)
for a continuous flow of messages: around 20-30%.
Benchmark results:
Received 1000 messages, a total of 1000000 bytes, in 0.310885 seconds.
3216.624 messages per second.
24.541 Mbps.
Message size: min=1000 max=1000 avg=1000 bytes.
Out-of-order: 0 of 1000 messages (0.00%).
Inter-arrival (us): min=80 max=13352 avg=311 p50=282 p95=356 p99=392
Mean jitter (us): 55.0
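
To make the inter-arrival numbers above easier to interpret, here is a rough sketch of how such statistics could be computed from a list of inter-arrival gaps in microseconds. This is not the amsbenchr code: the function names are hypothetical, the percentile uses the nearest-rank method, and jitter is assumed to be the mean absolute difference between consecutive gaps, which may differ from the definition the tool actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for uint64_t, written to avoid overflow on subtraction. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sort gaps ascending; call only after any order-dependent statistics. */
static void sort_gaps(uint64_t *gaps, size_t n)
{
    qsort(gaps, n, sizeof gaps[0], cmp_u64);
}

/* Nearest-rank percentile of an ascending array; pct must be in 1..100. */
static uint64_t percentile_sorted(const uint64_t *sorted, size_t n, unsigned pct)
{
    assert(n > 0 && pct > 0 && pct <= 100);
    size_t rank = (n * pct + 99) / 100;   /* ceil(n * pct / 100), 1-based */
    return sorted[rank - 1];
}

/* ASSUMED jitter definition: mean |gap[i] - gap[i-1]| in arrival order. */
static double mean_jitter(const uint64_t *gaps, size_t n)
{
    if (n < 2) {
        return 0.0;
    }
    double sum = 0.0;
    for (size_t i = 1; i < n; i++) {
        uint64_t d = gaps[i] > gaps[i - 1] ? gaps[i] - gaps[i - 1]
                                           : gaps[i - 1] - gaps[i];
        sum += (double)d;
    }
    return sum / (double)(n - 1);
}
```

Jitter has to be computed while the gaps are still in arrival order; sorting for the percentiles comes afterwards (or on a copy).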