Currently, when the iWF server restarts, the in-flight state API call fails, and the next attempt only starts after the startToClose timeout plus the backoff retry interval has elapsed.
If the startToClose timeout is very large (e.g. >10 mins), that wait is long. To avoid the unnecessary waiting, Temporal/Cadence has a concept of "activity heartbeat" that tells the Temporal/Cadence server the worker is still alive. If no heartbeat is received within the heartbeat timeout, Temporal/Cadence times out the current attempt and schedules the next one according to the backoff retry policy, without waiting for the startToClose timeout.
Note: this is also a consequence of Temporal/Cadence activity tasks/workers being "polling based". iWF tasks/workers are "push based", so they don't have this issue.
We need to add a side thread (goroutine) in the activity code:
go func() {
    for {
        time.Sleep(10 * time.Minute)
        activity.RecordHeartbeat(ctx) // tell the server the worker is still alive
    }
}()
^^ is simplified code. We also need to stop the goroutine when the activity finishes (using a Go channel and a timer), to avoid goroutine leaks.
Maybe make the 10-minute interval configurable.