Improve AMS performance & benchmarking: semaphore fast-path, checksum, bench stats #72

Open
zebastian wants to merge 1 commit into nasa-jpl:integration from zebastian:ams-optimizations

@zebastian

  • Inline semaphore hot path (_semTbl, _semGetSem, _semSync) and split _semSync into inline fast-path + noinline slow-path to avoid function call overhead on the common (seq-match) case
  • Optimize computeAmsChecksum to process 2 bytes per iteration instead of one byte at a time with a branch
  • Defer keyBuffer initialization in recoverMsgContent to the branch that actually uses it
  • Add cell census log event in amsd to enable event-driven startup instead of a fixed sleep (speeds up the benchmark)
  • Enhance amsbenchr with per-message inter-arrival timing, percentile stats (p50/p95/p99), jitter, out-of-order detection, and min/max/avg message size reporting
  • Add variable message size support to amsbenchs (min/max range)
  • Extract wait_for_log_event helper (bench_helpers) and refactor ionstart scripts to wait for census log event instead of fixed 9s sleep
  • Update dotest to use MESSAGE_SIZE_MIN/MAX variables
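
To illustrate the fast-path/slow-path split described for _semSync, here is a minimal sketch. It is not the ION code: `DemoSem`, `demo_sync`, and `demo_sync_slow` are hypothetical names, the struct layout is invented for the example, and the attributes assume GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a semaphore table entry; not ION's layout. */
typedef struct {
    uint32_t cached_seq;  /* sequence number seen at the last sync */
    int      slow_calls;  /* counts slow-path entries, for illustration */
} DemoSem;

/* Slow path: kept out of line so the inlined fast path stays small. */
__attribute__((noinline))
static void demo_sync_slow(DemoSem *sem, uint32_t current_seq)
{
    /* Only reached on a sequence mismatch. */
    assert(sem->cached_seq != current_seq);
    /* A real implementation would refresh or rebuild state here. */
    sem->cached_seq = current_seq;
    sem->slow_calls++;
}

/* Fast path: a single compare when the sequence numbers match. */
static inline void demo_sync(DemoSem *sem, uint32_t current_seq)
{
    if (__builtin_expect(sem->cached_seq == current_seq, 1))
        return;  /* common case: no function call at all */
    demo_sync_slow(sem, current_seq);
}
```

The point of the split is that callers inline only the compare and branch, while the rarely executed refresh logic stays in one out-of-line copy.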
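
The checksum change can be sketched in the same spirit. The exact algorithm in computeAmsChecksum is not shown in this description, so the example below assumes a plain 16-bit big-endian word sum; both function names are hypothetical. It contrasts a byte-at-a-time loop that branches on position with a loop that consumes two bytes per iteration and handles an odd trailing byte once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Baseline: one byte per iteration, branching on byte position. */
static uint16_t demo_checksum_bytewise(const unsigned char *p, size_t len)
{
    uint32_t sum = 0;
    assert(p != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if ((i & 1) == 0)
            sum += (uint32_t)p[i] << 8;  /* even offset: high byte */
        else
            sum += p[i];                 /* odd offset: low byte */
    }
    return (uint16_t)(sum & 0xffff);
}

/* Pairwise: two bytes per iteration, odd tail handled after the loop. */
static uint16_t demo_checksum_pairwise(const unsigned char *p, size_t len)
{
    uint32_t sum = 0;
    size_t pairs = len / 2;
    assert(p != NULL || len == 0);
    while (pairs--) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
    }
    if (len & 1)
        sum += (uint32_t)p[0] << 8;      /* trailing byte is a high byte */
    return (uint16_t)(sum & 0xffff);
}
```

Under this assumed definition the two functions return identical results for any input, so the pairwise form only changes the loop structure, not the checksum value.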

Note: some changes were developed with the help of AI (Claude Code), especially the amsbenchr.* changes. I double-checked execution time (instruction counts and per-function cost) in kcachegrind. Estimated improvement in achievable throughput (or reduction in CPU time) for a continuous flow of messages is around 20-30%.

Benchmark results:
Received 1000 messages, a total of 1000000 bytes,in 0.310885 seconds.
3216.624 messages per second.
24.541 Mbps.
Message size: min=1000 max=1000 avg=1000 bytes.
Out-of-order: 0 of 1000 messages (0.00%).
Inter-arrival (us): min=80 max=13352 avg=311 p50=282 p95=356 p99=392
Mean jitter (us): 55.0
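
For readers who want to sanity-check these figures, the sketch below shows one way such statistics can be derived. It is not the amsbenchr code: all `demo_*` names are hypothetical, percentiles use the nearest-rank method, and jitter is taken as the mean absolute difference between consecutive inter-arrival gaps, both of which are assumptions. The 2^20 divisor for Mbps is inferred from the reported numbers (8,000,000 bits / 0.310885 s / 1048576 is about 24.541), not confirmed from the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Messages per second over the measured interval. */
static double demo_msgs_per_sec(size_t count, double seconds)
{
    assert(seconds > 0.0);
    return (double)count / seconds;
}

/* Bits per second divided by 2^20 (divisor inferred, see lead-in). */
static double demo_mbps(double total_bytes, double seconds)
{
    assert(seconds > 0.0);
    return total_bytes * 8.0 / seconds / 1048576.0;
}

/* Mean absolute change between consecutive gaps, in arrival order. */
static double demo_mean_jitter(const long *gaps, size_t n)
{
    double total = 0.0;
    if (n < 2)
        return 0.0;
    for (size_t i = 1; i < n; i++) {
        long diff = gaps[i] - gaps[i - 1];
        total += (double)(diff < 0 ? -diff : diff);
    }
    return total / (double)(n - 1);
}

static int demo_cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Sort gaps ascending; call only after jitter has been computed. */
static void demo_sort_gaps(long *gaps, size_t n)
{
    qsort(gaps, n, sizeof gaps[0], demo_cmp_long);
}

/* Nearest-rank percentile on an ascending array; pct in 1..100. */
static long demo_percentile(const long *sorted, size_t n, unsigned pct)
{
    assert(n > 0 && pct >= 1 && pct <= 100);
    size_t rank = (pct * n + 99) / 100;  /* ceil(pct * n / 100) */
    return sorted[rank - 1];
}
```

With the values reported above (1000 messages, 1000000 bytes, 0.310885 seconds), `demo_msgs_per_sec` gives about 3216.62 and `demo_mbps` about 24.54, matching the benchmark output to the printed precision.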
