This report documents the baseline and load tests run against AIDE. It compares the results across an AWS cloud environment (SIT) and a more performant on-premise Pre-Prod environment, lists the conclusions drawn and identifies any necessary follow-up actions.
| Node | Specification |
|---|---|
| SIT-Head1 | 4 vCPUs, 16 GB RAM, 0 GPUs |
| SIT-Head2 | 4 vCPUs, 16 GB RAM, 0 GPUs |
| SIT-DGX | 8 vCPUs, 32 GB RAM, 1 GPU |
| Node | Specification |
|---|---|
| PreProd-Head1 | 48 vCPUs, 252 GB RAM, 1 GPU |
| PreProd-Head2 | 48 vCPUs, 252 GB RAM, 0 GPUs |
| PreProd-Head3 | 48 vCPUs, 252 GB RAM, 1 GPU |
| Modality | Details |
|---|---|
| RF | 1 slice, 1 MB |
| US | 7 slices, 17 MB |
| MR | 5 slices, 1 MB |
| CT | 324 slices, 167 MB |
The following dummy applications were published to stress the GPU and CPU. They were written using the stress and gpu-burn tools.
| Application Name | Specification | Modality |
|---|---|---|
| Small | CPU: 2, GPU: access to all, RAM: 1 GB, execution time: 10 seconds | RF |
| Medium | CPU: 8, GPU: access to all, RAM: 10 GB, execution time: 30 seconds | US and MR |
| Large | CPU: 12, GPU: access to all, RAM: 16 GB, execution time: 60 seconds | CT |
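The report does not include the source of the dummy applications, so the following is only a sketch of how the three profiles could be turned into `stress` and `gpu_burn` command lines. It assumes the CPU figure maps to `stress --cpu`, the RAM figure to a single `--vm` worker sized with `--vm-bytes`, and the execution time to `--timeout` and to the `gpu_burn` duration argument. The names `APPS` and `build_commands` are hypothetical, and the commands are only built, not executed.

```python
import shlex

# Profiles copied from the application table; every app has access to all GPUs.
APPS = {
    "Small": {"cpus": 2, "ram_gb": 1, "seconds": 10},
    "Medium": {"cpus": 8, "ram_gb": 10, "seconds": 30},
    "Large": {"cpus": 12, "ram_gb": 16, "seconds": 60},
}


def build_commands(app_name):
    """Return (stress_cmd, gpu_cmd) argument lists for one app profile.

    Assumption: the mapping of table figures to flags is illustrative only.
    """
    profile = APPS[app_name]
    stress_cmd = [
        "stress",
        "--cpu", str(profile["cpus"]),          # number of CPU workers
        "--vm", "1",                            # one memory worker
        "--vm-bytes", f"{profile['ram_gb']}G",  # memory allocated by that worker
        "--timeout", f"{profile['seconds']}s",  # stop after the execution time
    ]
    # gpu_burn takes the burn duration in seconds as a positional argument.
    gpu_cmd = ["gpu_burn", str(profile["seconds"])]
    return stress_cmd, gpu_cmd


if __name__ == "__main__":
    for name in APPS:
        for cmd in build_commands(name):
            print(name, shlex.join(cmd))
```

In a real application the two commands would probably run in parallel inside the container so that CPU, memory and GPU are loaded for the same period.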
Single transactions to establish a performance reference point that can be used as a basis for performance comparison.
Realistic expected usage levels, based on GSTT imaging throughput data for an average 1-hour period, to determine response time, resource usage and reliability.
Realistic expected usage levels, based on GSTT imaging throughput data for a peak 1-hour period, to determine response time, resource usage and reliability.
Uplift of peak load by 25%
| Modality | Transactions | Model executions |
|---|---|---|
| X-ray | 60 | 60 |
| Ultrasound | 28 | 2.8 |
| CT | 10 | 7 |
| MRI | 13 | 9.1 |
| Modality | Transactions | Model executions |
|---|---|---|
| X-ray | 120 | 120 |
| Ultrasound | 50 | 5 |
| CT | 30 | 21 |
| MRI | 25 | 17.5 |
| Modality | Transactions | Model executions |
|---|---|---|
| X-ray | 180 | 180 |
| Ultrasound | 75 | 7.5 |
| CT | 45 | 31.5 |
| MRI | 37.5 | 26.25 |
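The relationship between the three volumetric tables can be checked with a little arithmetic. The sketch below assumes the tables correspond, in order, to the average hour, the peak hour and the uplifted load. The dictionary keys and function names are illustrative.

```python
# (transactions, model executions) per modality, copied from the three tables.
VOLUMES = {
    "average": {"X-ray": (60, 60), "Ultrasound": (28, 2.8), "CT": (10, 7), "MRI": (13, 9.1)},
    "peak": {"X-ray": (120, 120), "Ultrasound": (50, 5), "CT": (30, 21), "MRI": (25, 17.5)},
    "uplift": {"X-ray": (180, 180), "Ultrasound": (75, 7.5), "CT": (45, 31.5), "MRI": (37.5, 26.25)},
}


def execution_ratios(table):
    """Fraction of transactions that trigger a model execution, per modality."""
    return {m: round(execs / tx, 4) for m, (tx, execs) in VOLUMES[table].items()}


def transaction_uplift(base, scaled):
    """Fractional increase in transactions from one table to another."""
    return {
        m: round(VOLUMES[scaled][m][0] / VOLUMES[base][m][0] - 1, 4)
        for m in VOLUMES[base]
    }


if __name__ == "__main__":
    for table in VOLUMES:
        print(table, execution_ratios(table))
    print("peak -> uplift", transaction_uplift("peak", "uplift"))
```

Every table uses the same execution ratios: 100% for X-ray, 10% for Ultrasound and 70% for CT and MRI. Under the stated assumption, the transaction counts in the third table are 1.5 times those in the second, which is a 50% increase rather than the 25% uplift described, so one of the two is worth confirming.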
| KPI | Details | Query Params |
|---|---|---|
| DICOM Payload Processed | How long it took between an association being made to Informatics Gateway, the instances being saved to MinIO and a WorkflowRequestEvent being generated | ServiceName: Monai.Deploy.InformaticsGateway AND "Payload took" |
| Task Dispatched | How long it took for the WorkflowRequestEvent to be consumed by the WorkflowManager, a workflow to be triggered and a TaskDispatchEvent to be generated | ServiceName: Monai.Deploy.WorkflowManager AND messageDescription: WorkflowRequestEvent AND durationMilliseconds > 0 |
| Task Created | How long it took for the TaskDispatchEvent to be consumed by the TaskManager and create a Task | ServiceName: Monai.Deploy.WorkflowManager.TaskManager AND messageType: TaskDispatchEvent AND durationMilliseconds > 0 |
| Task Update | How long it took for the TaskManager to publish a TaskUpdateEvent, the WorkflowManager to consume the event and update the WorkflowInstance | ServiceName: Monai.Deploy.WorkflowManager AND messageDescription: TaskUpdateEvent AND durationMilliseconds > 0 |
| Argo | How long it took for Argo to run the application requested. This includes time from the pod being scheduled and then a TaskCallbackEvent being published | Taken from Argo |
| End To End | Indicative time for the end-to-end processing of a workflow, from DICOM association to workflow completion. | Task Update timestamp - (DICOM Payload Processed timestamp - processed time) |
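The End To End formula can be read as follows: subtracting the processed time from the DICOM Payload Processed log timestamp recovers the moment the association started, and the Task Update timestamp minus that start gives the indicative end-to-end duration. A minimal sketch, using a hypothetical function name and made-up timestamps:

```python
from datetime import datetime, timedelta


def end_to_end(payload_logged_at, payload_took, task_update_at):
    """Indicative end-to-end time for one workflow.

    payload_logged_at: timestamp of the "Payload took" log entry
    payload_took:      duration reported in that entry
    task_update_at:    timestamp of the final Task Update entry
    """
    association_start = payload_logged_at - payload_took
    return task_update_at - association_start


if __name__ == "__main__":
    # Illustrative values only, not taken from the test logs.
    logged = datetime(2023, 1, 1, 10, 0, 34, 400000)
    took = timedelta(seconds=34, milliseconds=400)
    updated = datetime(2023, 1, 1, 10, 2, 30)
    print(end_to_end(logged, took, updated))
```

Because the value is derived from log timestamps written by different services, it is only as accurate as the clock synchronisation between them, which is presumably why the report labels it indicative.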
Baseline tests were executed on SIT to characterise the cloud environment and to provide a reference against which the Pre-Prod tests can be compared, showing the performance improvement gained from the higher specification.
The same study was sent 5 times, with a 90-second gap between sends, to obtain average metrics for a known study, environment and MAP (liver-seg) setup.
| Modality | DICOM Payload Processed Average | DICOM Payload Processed Max | Task Dispatched Average | Task Dispatched Max | Task Created Average | Task Created Max | Task Update Average | Task Update Max | Argo Average (min) | Argo Max (min) | End to End (indicative) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CT | 01:11 | 01:24 | 14.5 | 20.4 | 2.3 | 2.9 | 0.8 | 1.7 | 01:57 | 02:04 | 03:21 |
| MR | 13.6 | 34.5 | 6.7 | 13.6 | 4.9 | 10 | 1 | 1.5 | 01:24 | 01:32 | 01:23 |
| US | 6.2 | 6.8 | 2.6 | 3.3 | 2.8 | 3.9 | 0.7 | 1.5 | 01:14 | 01:15 | 01:25 |
| RF | 5.8 | 9.5 | 11.3 | 23.6 | 30 | 107.7 | 1.1 | 2.9 | 01:06 | 01:37 | 00:58 |
Baseline, load and stress tests were executed on the on-premise environment to understand the performance of MONAI Deploy and AIDE on target production hardware and to validate it against the expected throughput and metrics.
The same study was sent 5 times, with a 90-second gap between sends, to obtain average metrics for a known study, environment and MAP (liver-seg) setup.
| Modality | DICOM Payload Processed Average (sec) | DICOM Payload Processed Max (sec) | Task Dispatched Average (sec) | Task Dispatched Max (sec) | Task Created Average (sec) | Task Created Max (sec) | Task Update Average (sec) | Task Update Max (sec) | Argo Average (min) | Argo Max (min) | End to End (indicative) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CT ("{{ context.dicom.series.all('0008','0060') }} == 'CT'") | 34.4 | 36.2 | 11.6 | 12.5 | 1.1 | 1.2 | 0.4 | 0.9 | 01:54 | 02:07 | 02:30 |
| CT ("{{ context.dicom.series.any('0008','0060') }} == 'CT'") | 34.2 | 37.8 | 12.3 | 13.7 | 1.2 | 1.6 | 0.7 | 1 | 01:53 | 02:03 | N/A |
| MR | 1.1 | 1.4 | 0.7 | 1.1 | 1.2 | 1.5 | 0.6 | 0.8 | 01:06 | 01:10 | 01:18 |
| US | 1.7 | 2.3 | 1.1 | 1.3 | 0.9 | 1.6 | 0.6 | 1 | 01:10 | 01:17 | 01:07 |
| RF | 0.7 | 1.2 | 0.7 | 1.1 | 0.9 | 1.3 | 0.8 | 1 | 00:55 | 00:58 | 00:42 |
| CT (executing Small app & no conditional logic) | 34.9 | 37.9 | 10.6 | 10.8 | 2.08 | 6.3 | 0.9 | 1.9 | 01:06 | 01:13 | 01:47 |
| RF (no conditional logic) | 0.7 | 0.9 | 0.8 | 1.4 | 1.3 | 1.8 | 0.7 | 1.3 | 00:55 | 01:00 | 00:52 |
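To make the SIT versus Pre-Prod comparison concrete, the sketch below computes the percentage reduction in the average DICOM Payload Processed time per modality. It assumes that the SIT CT value of 01:11 is in mm:ss, that the other SIT values are seconds (the SIT table does not state units), and that the first Pre-Prod CT row is the comparable one. The function names are illustrative.

```python
def to_seconds(value):
    """Convert an 'mm:ss' string or a plain seconds figure to float seconds."""
    text = str(value)
    if ":" in text:
        minutes, seconds = text.split(":")
        return int(minutes) * 60 + float(seconds)
    return float(text)


# Average DICOM Payload Processed times copied from the two baseline tables.
SIT = {"CT": "01:11", "MR": 13.6, "US": 6.2, "RF": 5.8}
PREPROD = {"CT": 34.4, "MR": 1.1, "US": 1.7, "RF": 0.7}


def reduction_percent(before, after):
    """Percentage by which 'after' is lower than 'before'."""
    b, a = to_seconds(before), to_seconds(after)
    return round((b - a) / b * 100, 1)


if __name__ == "__main__":
    for modality in SIT:
        print(modality, reduction_percent(SIT[modality], PREPROD[modality]))
```

With these figures the reduction is roughly half for CT and between about 70% and 90% for the smaller studies. The same helper can be applied to the other KPI columns.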
The MIG was retested following a change to how it saves data to MinIO, which resulted in a ~50% reduction in time.
| Modality | DICOM Payload Processed Average (sec) | DICOM Payload Processed Max (sec) |
|---|---|---|
| CT | 14 | 16.2 |
This test evaluates the performance of MONAI Deploy under typical or average usage conditions for a large Trust. See Avg 1 Hour for details of the volumetrics.
| DICOM Payload Processed Average (sec) | DICOM Payload Processed Max (sec) | Task Dispatched Average (sec) | Task Dispatched Max (sec) | Task Created Average (sec) | Task Created Max (sec) | Task Updated Average (sec) | Task Updated Max (sec) |
|---|---|---|---|---|---|---|---|
| 5.8 | 8.8 | 0.8 | 1.3 | 1.9 | 26.4 | - | - |
** DICOM Payload Processed was only tested with low-instance studies. A bug is to be raised to ensure that the asynchronous sending of DICOMs does not time out after 60 seconds.
** Task Updated metrics were not useful at high load because the systems are very chatty. Work is needed to better differentiate the useful messages.
| | Transactions | Errors | Details |
|---|---|---|---|
| Payload processed | 101 | 0 | |
| Task Dispatched | 88 | 52 | WorkflowRequestRequeuePayloadProcessError was thrown 52 times for 14 payloads. This was caused by the way the WorkflowManager handles "Unable to locate a matching workflow for the given workflow request". It appears that the WorkflowManager requeues WorkflowRequestEvents that do not match a Workflow AET and then throws an exception because the payload has already been received and saved in Mongo. This does not affect functionality or performance. |
| Task Created | 88 | 0 | |
| Task Updated | - | - | |
| Small-app | 62 | 0 | |
| Medium-app | 14 | 0 | |
| Large-app | 12 | 0 | |
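A quick consistency check on the counts above can be sketched as follows. The variable names are illustrative and the figures are copied from the table.

```python
# Transaction counts from the results table.
COUNTS = {"Payload processed": 101, "Task Dispatched": 88, "Task Created": 88}
APP_RUNS = {"Small-app": 62, "Medium-app": 14, "Large-app": 12}

# Requeue errors reported against Task Dispatched.
REQUEUE_ERRORS = 52
AFFECTED_PAYLOADS = 14

# Every dispatched task should correspond to one application run.
total_app_runs = sum(APP_RUNS.values())

# Average number of requeue errors per affected payload.
errors_per_payload = round(REQUEUE_ERRORS / AFFECTED_PAYLOADS, 1)

if __name__ == "__main__":
    print("app runs:", total_app_runs, "dispatched:", COUNTS["Task Dispatched"])
    print("requeue errors per affected payload:", errors_per_payload)
```

The application runs sum to the number of tasks dispatched and created, and each affected payload was requeued between three and four times on average.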