Introduction

This report documents the baseline and load tests run against AIDE. It compares baseline and load test results from an AWS cloud environment (SIT) with those from a higher-specification on-premise Pre Prod environment, and lists the conclusions and any necessary follow-up actions.

Environment Details

AWS Cloud (SIT) Specification

Node	Specification
SIT-Head1	4 vCPUs, 16GB RAM, 0 GPUs
SIT-Head2	4 vCPUs, 16GB RAM, 0 GPUs
SIT-DGX	8 vCPUs, 32GB RAM, 1 GPU

On-premise Pre Prod Environment

Node	Specification
PreProd-Head1	48 vCPUs, 252GB RAM, 1 GPU
PreProd-Head2	48 vCPUs, 252GB RAM, 0 GPUs
PreProd-Head3	48 vCPUs, 252GB RAM, 1 GPU

Data

Modality	Details
RF	1 slice, 1MB
US	7 slices, 17MB
MR	5 slices, 1MB
CT	324 slices, 167MB

Applications

The following dummy applications were published to stress the GPU and CPU. They were written using the stress and gpu-burn tools.

Application Name	Specification	Modality
Small	CPU: 2, GPU: access to all, RAM: 1GB, execution time: 10 seconds	RF
Medium	CPU: 8, GPU: access to all, RAM: 10GB, execution time: 30 seconds	US and MR
Large	CPU: 12, GPU: access to all, RAM: 16GB, execution time: 60 seconds	CT
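The source of the dummy applications is not included in this report. As a hypothetical sketch, assuming each application simply wraps the `stress` CLI (using its `--cpu`, `--vm`, `--vm-bytes` and `--timeout` options) and the `gpu_burn` binary built from gpu-burn (which takes the run duration in seconds as its argument), the resource profiles above could be turned into command lines as follows. The `DummyApp` type and helper names are illustrative, not taken from the actual applications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DummyApp:
    """Resource profile of one dummy application (values from the table above)."""
    name: str
    cpus: int
    ram_gb: int
    seconds: int


APPS = [
    DummyApp("Small", cpus=2, ram_gb=1, seconds=10),
    DummyApp("Medium", cpus=8, ram_gb=10, seconds=30),
    DummyApp("Large", cpus=12, ram_gb=16, seconds=60),
]


def stress_command(app: DummyApp) -> list[str]:
    # Assumption: one stress CPU worker per requested CPU, plus a single
    # memory (vm) worker that allocates the requested RAM.
    return [
        "stress",
        "--cpu", str(app.cpus),
        "--vm", "1",
        "--vm-bytes", f"{app.ram_gb}G",
        "--timeout", f"{app.seconds}s",
    ]


def gpu_burn_command(app: DummyApp) -> list[str]:
    # gpu_burn takes the burn duration in seconds as a positional argument.
    return ["gpu_burn", str(app.seconds)]
```

Running both commands in parallel for the same duration would load CPU, memory and GPU together for the stated execution time.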

Test Types

Baseline

Single transactions used to establish a performance reference point, which serves as a basis for performance comparison.

Load Average

Realistic expected usage levels, based on GSTT imaging throughput data for an average 1-hour period, used to determine the system's response time, resource usage and reliability.

Load Peak

Realistic expected usage levels, based on GSTT imaging throughput data for a peak 1-hour period, used to determine the system's response time, resource usage and reliability.

Stress

Uplift of the peak load by 50%

Throughput

Avg 1 Hour

Modality	Transactions	Model executions
X-ray	60	60
Ultrasound	28	2.8
CT	10	7
MRI	13	9.1

Peak 1 Hour

Modality	Transactions	Model executions
X-ray	120	120
Ultrasound	50	5
CT	30	21
MRI	25	17.5

Stress 1 Hour

Modality	Transactions	Model executions
X-ray	180	180
Ultrasound	75	7.5
CT	45	31.5
MRI	37.5	26.25
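Every Throughput table above applies the same per-modality ratio of model executions to transactions (X-ray 1.0, Ultrasound 0.1, CT 0.7, MRI 0.7), and each Stress 1 Hour row is 1.5x the corresponding Peak 1 Hour row. A small sketch that reproduces the tables from those two facts; the constant and function names are illustrative, and `Decimal` is used so values such as 26.25 come out exactly.

```python
from decimal import Decimal

# Model executions per transaction, implied by every Throughput table above
# (e.g. Ultrasound: 28 transactions -> 2.8 model executions).
EXECUTIONS_PER_TRANSACTION = {
    "X-ray": Decimal("1.0"),
    "Ultrasound": Decimal("0.1"),
    "CT": Decimal("0.7"),
    "MRI": Decimal("0.7"),
}

PEAK_TRANSACTIONS = {
    "X-ray": Decimal("120"),
    "Ultrasound": Decimal("50"),
    "CT": Decimal("30"),
    "MRI": Decimal("25"),
}

# Factor implied by the Stress 1 Hour table (each row is 1.5x the Peak row).
STRESS_UPLIFT = Decimal("1.5")


def uplift(transactions: dict[str, Decimal], factor: Decimal) -> dict[str, Decimal]:
    """Scale hourly transactions per modality by a load factor."""
    return {m: t * factor for m, t in transactions.items()}


def model_executions(transactions: dict[str, Decimal]) -> dict[str, Decimal]:
    """Expected model executions per modality for a 1-hour window."""
    return {m: t * EXECUTIONS_PER_TRANSACTION[m] for m, t in transactions.items()}
```

For example, summing `model_executions(PEAK_TRANSACTIONS).values()` gives 163.5 expected model executions in the peak hour.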

KPI and Measurements

KPI	Details	Query Params
DICOM Payload Processed	How long it took between an association being made to Informatics Gateway, the instances being saved to MinIO and a WorkflowRequestEvent being generated	ServiceName: Monai.Deploy.InformaticsGateway AND "Payload took"
Task Dispatched	How long it took for the WorkflowRequestEvent to be consumed by the WorkflowManager, a workflow to be triggered and a TaskDispatchEvent to be generated	ServiceName: Monai.Deploy.WorkflowManager AND messageDescription: WorkflowRequestEvent AND durationMilliseconds > 0
Task Created	How long it took for the TaskDispatchEvent to be consumed by the TaskManager and create a Task	ServiceName: Monai.Deploy.WorkflowManager.TaskManager AND messageType: TaskDispatchEvent AND durationMilliseconds > 0
Task Update	How long it took for the TaskManager to publish a TaskUpdateEvent, the WorkflowManager to consume the event and update the WorkflowInstance	ServiceName: Monai.Deploy.WorkflowManager AND messageDescription: TaskUpdateEvent AND durationMilliseconds > 0
Argo	How long it took for Argo to run the requested application, from the pod being scheduled until a TaskCallbackEvent is published	Taken from Argo
End To End	Indicative time for end-to-end processing of a workflow, from DICOM association to workflow completion	Task Update timestamp - (DICOM Payload Processed timestamp - DICOM Payload Processed duration)
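The End To End formula subtracts the "Payload took" duration from the DICOM Payload Processed log timestamp to approximate when the DICOM association started, then measures from there to the Task Update timestamp. A minimal sketch, assuming both timestamps have already been parsed from the log records into `datetime` values; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def indicative_end_to_end(
    task_update_at: datetime,
    payload_processed_at: datetime,
    payload_duration: timedelta,
) -> timedelta:
    """Indicative End To End time as defined in the KPI table."""
    # Approximate start of the DICOM association.
    association_start = payload_processed_at - payload_duration
    return task_update_at - association_start


def as_mm_ss(delta: timedelta) -> str:
    """Format a duration as mm:ss, the format used in the result tables."""
    total = round(delta.total_seconds())
    minutes, seconds = divmod(total, 60)
    return f"{minutes:02d}:{seconds:02d}"
```

Because the association start is reconstructed from a logged duration rather than logged directly, the result is indicative rather than exact.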

Cloud Execution

Details

Baseline tests were executed on SIT to validate the cloud environment and to provide a reference against which the Pre Prod tests could be compared, in order to understand the performance improvement gained from the higher specification.

Results

Baseline

Description

Send the same study through 5 times, with a 90-second gap between sends, to obtain average metrics for a known study, environment and MAP (liver-seg) setup.
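The baseline procedure is a simple repeat-and-wait loop. A sketch under stated assumptions: `send_study` is a hypothetical placeholder for whatever pushes the study to the Informatics Gateway (for example a DICOM C-STORE client), and it is assumed that no wait is needed after the final send.

```python
import time
from typing import Callable


def run_baseline(
    send_study: Callable[[], None],
    repeats: int = 5,
    gap_seconds: float = 90.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send the same study `repeats` times with a fixed gap between sends.

    Returns the number of sends performed.
    """
    for attempt in range(repeats):
        send_study()
        if attempt < repeats - 1:  # no wait after the final send
            sleep(gap_seconds)
    return repeats
```

Injecting `sleep` keeps the loop testable without waiting six minutes; in a real run the default `time.sleep` is used.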

Metrics

	DICOM Payload Processed	DICOM Payload Processed	Task Dispatched	Task Dispatched	Task Created	Task Created	Task Update	Task Update	Argo	Argo	End to End
Modality	Average (sec)	Max (sec)	Average (sec)	Max (sec)	Average (sec)	Max (sec)	Average (sec)	Max (sec)	Average (mm:ss)	Max (mm:ss)	Indicative (mm:ss)
CT	71	84	14.5	20.4	2.3	2.9	0.8	1.7	01:57	02:04	03:21
MR	13.6	34.5	6.7	13.6	4.9	10	1	1.5	01:24	01:32	01:23
US	6.2	6.8	2.6	3.3	2.8	3.9	0.7	1.5	01:14	01:15	01:25
RF	5.8	9.5	11.3	23.6	30	107.7	1.1	2.9	01:06	01:37	00:58
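The result tables report some durations in mm:ss (Argo, End to End) and others in seconds, and each Average/Max pair summarises the repeated sends. A sketch of how per-send durations could be normalised and summarised; the helper names are hypothetical and one duration per send is assumed.

```python
from statistics import mean


def to_seconds(value: str) -> float:
    """Parse a cell in either mm:ss (e.g. "01:11") or seconds (e.g. "13.6")."""
    if ":" in value:
        minutes, seconds = value.split(":")
        return int(minutes) * 60 + float(seconds)
    return float(value)


def summarise(durations_sec: list[float]) -> tuple[float, float]:
    """Average (to one decimal place) and max of one KPI across the sends."""
    return round(mean(durations_sec), 1), max(durations_sec)
```

Converting everything to seconds first makes cloud and on-premise figures directly comparable.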

On-Premise Execution

Details

Baseline, Load and Stress tests were executed on the on-premise environment to understand the performance of MONAI Deploy and AIDE on target production hardware, and to validate it against the expected throughput and KPIs.

Results

Baseline 1

Description

Send the same study through 5 times, with a 90-second gap between sends, to obtain average metrics for a known study, environment and MAP (liver-seg) setup.

Metrics

	DICOM Payload Processed	DICOM Payload Processed	Task Dispatched	Task Dispatched	Task Created	Task Created	Task Update	Task Update	Argo	Argo	End to End
Modality	Average (sec)	Max (sec)	Average (sec)	Max (sec)	Average (sec)	Max (sec)	Average (sec)	Max (sec)	Average (mm:ss)	Max (mm:ss)	Indicative (mm:ss)
CT ("{{ context.dicom.series.all('0008','0060') }} == 'CT'")	34.4	36.2	11.6	12.5	1.1	1.2	0.4	0.9	01:54	02:07	02:30
CT ("{{ context.dicom.series.any('0008','0060') }} == 'CT'")	34.2	37.8	12.3	13.7	1.2	1.6	0.7	1	01:53	02:03	N/A
MR	1.1	1.4	0.7	1.1	1.2	1.5	0.6	0.8	01:06	01:10	01:18
US	1.7	2.3	1.1	1.3	0.9	1.6	0.6	1	01:10	01:17	01:07
RF	0.7	1.2	0.7	1.1	0.9	1.3	0.8	1	00:55	00:58	00:42
CT (executing Small app & no conditional logic)	34.9	37.9	10.6	10.8	2.08	6.3	0.9	1.9	01:06	01:13	01:47
RF (no conditional logic)	0.7	0.9	0.8	1.4	1.3	1.8	0.7	1.3	00:55	01:00	00:52

Baseline 2

Description

Retest of the Informatics Gateway (MIG) following a change to how it saves data to MinIO, which reduced the average CT DICOM Payload Processed time by more than 50% (from 34.4 sec in Baseline 1 to 14 sec).

Metrics

	DICOM Payload Processed	DICOM Payload Processed
Modality	Average (sec)	Max (sec)
CT	14	16.2

Load (Avg)

Description

Evaluates the performance of MONAI Deploy under typical (average) usage conditions for a large Trust. See Avg 1 Hour for details of the volumetrics.
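The report does not state how the hourly volumetrics were injected. Assuming each modality's transactions were spread evenly across the hour, a send schedule could be built as below. This is a hypothetical sketch: `hourly_schedule` is an illustrative name, and counts must be whole numbers, so fractional figures such as the Stress MRI value of 37.5 would need rounding first.

```python
def hourly_schedule(
    transactions_per_hour: dict[str, int], window_seconds: int = 3600
) -> list[tuple[float, str]]:
    """Spread each modality's transactions evenly over the window.

    Returns (offset_seconds, modality) pairs sorted by send time.
    """
    schedule: list[tuple[float, str]] = []
    for modality, count in transactions_per_hour.items():
        if count <= 0:
            continue  # nothing to send for this modality
        interval = window_seconds / count
        schedule.extend((i * interval, modality) for i in range(count))
    return sorted(schedule)
```

With the Avg 1 Hour figures (X-ray 60, Ultrasound 28, CT 10, MRI 13) this yields 111 sends, with X-ray studies sent every 60 seconds.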

Metrics

DICOM Payload Processed	DICOM Payload Processed	Task Dispatched	Task Dispatched	Task Created	Task Created	Task Updated	Task Updated
Average (sec)	Max (sec)	Average (sec)	Max (sec)	Average (sec)	Max (sec)	Average (sec)	Max (sec)
5.8	8.8	0.8	1.3	1.9	26.4	-	-

** DICOM Payload Processed was only tested with studies containing a small number of instances. A bug is to be raised to ensure that the asynchronous sending of DICOM instances does not time out after 60 seconds.

** Task Updated metrics were not useful at high load because the systems are very chatty; work is needed to better differentiate the useful messages.
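The Task Dispatched and Task Update KPIs are selected with queries of the form `ServiceName: … AND messageDescription: … AND durationMilliseconds > 0`. A sketch of the same selection over log records, assuming they are available as flat dictionaries using the field names from the KPI table; the optional `extra` predicate is a hypothetical hook showing where an additional discriminator could narrow a chatty stream.

```python
from typing import Any, Callable, Iterable, Optional

Record = dict[str, Any]


def kpi_durations(
    records: Iterable[Record],
    service_name: str,
    message_description: str,
    extra: Optional[Callable[[Record], bool]] = None,
) -> list[float]:
    """Durations (ms) of records matching a KPI query from the KPI table."""
    return [
        r["durationMilliseconds"]
        for r in records
        if r.get("ServiceName") == service_name
        and r.get("messageDescription") == message_description
        and r.get("durationMilliseconds", 0) > 0
        and (extra is None or extra(r))
    ]
```

Which extra field would reliably separate useful TaskUpdateEvent messages is still open, so `extra` is left generic.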

Failed/Successful Transactions

	Transactions	Errors	Details
Payload processed	101	0
Task Dispatched	88	52	WorkflowRequestRequeuePayloadProcessError was thrown 52 times for 14 payloads. This was caused by the way the WorkflowManager handles the "Unable to locate a matching workflow for the given workflow request" case: the WorkflowManager appears to requeue WorkflowRequestEvents that do not match a workflow AET, and then throws an exception because the payload has already been received and saved in MongoDB. This does not affect functionality or performance.
Task Created	88	0
Task Updated	-	-
Small-app	62	0
Medium-app	14	0
Large-app	12	0
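Figures such as "52 errors for 14 payloads" come from grouping error log entries by payload. A sketch, assuming each error record carries the exception name and a payload identifier; the field names `exception` and `payloadId` are hypothetical.

```python
from collections import Counter
from typing import Any, Iterable

ERROR_NAME = "WorkflowRequestRequeuePayloadProcessError"


def requeue_errors_by_payload(records: Iterable[dict[str, Any]]) -> Counter:
    """Count requeue errors per payload (field names are assumptions)."""
    return Counter(
        r["payloadId"] for r in records if r.get("exception") == ERROR_NAME
    )
```

`sum(counts.values())` gives the total number of errors and `len(counts)` the number of affected payloads (52 and 14 respectively in this run).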
