A Poll-free, Low-latency Approach to Process State Capture and Recovery in Heterogeneous Computing Systems

P.P. Bungale, S. Sridhar, and V. Krishnamurthy (India)


Heterogeneous computing, Process state capture and recovery, Low-latency capture initiation


A major issue in process state capture in heterogeneous computing systems is that capture cannot simply be initiated the instant a request is received. Capture can be initiated only at points of equivalence, that is, points that have equivalent points in the other instances of the computation on different architectures, so that the process can be restarted at exactly the point at which it was paused. To minimize latency (the wait time between the capture request and its actual initiation), state capture should be initiated at the very next point of equivalence encountered after the request. At the same time, the performance overhead incurred during normal execution must be kept at acceptable levels. This paper proposes a fundamentally new approach to process state capture and recovery that achieves both objectives. In a polling approach, achieving minimal latency would require placing poll-points at all potential points of equivalence, but the performance overhead of polling during normal execution would then reach unacceptable levels. Our solution to the heterogeneous process state capture problem is fundamentally different in that it effectively enables all potential points of equivalence present in a computation without polling, so that minimal latency is ensured.
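The trade-off that the polling approach faces can be illustrated with a minimal sketch. This is not the authors' implementation, and it does not model the proposed poll-free mechanism. It is a toy simulation under stated assumptions: the function name `run_with_polling` and its parameters (`total_points`, `poll_interval`, `request_at`) are hypothetical, points of equivalence are numbered consecutively, and each poll is counted as one unit of overhead.

```python
def run_with_polling(total_points, poll_interval, request_at):
    """Toy model of a polling approach (all names are hypothetical).

    The computation passes through `total_points` potential points of
    equivalence, numbered from 0. A poll-point is placed at every
    `poll_interval`-th point. A capture request arrives just before the
    point numbered `request_at`.

    Returns (latency, polls), where latency is the number of points of
    equivalence passed between the request and the poll that notices it,
    and polls is the number of polls executed up to that moment. Latency
    is None if no poll-point is reached after the request.
    """
    polls = 0
    for point in range(total_points):
        if point % poll_interval == 0:
            # Every poll costs time during normal execution,
            # whether or not a request is pending.
            polls += 1
            if point >= request_at:
                # The request is noticed here; capture can begin.
                return point - request_at, polls
    return None, polls


# Poll-points at every point of equivalence: minimal latency, many polls.
dense = run_with_polling(total_points=1000, poll_interval=1, request_at=503)

# Sparse poll-points: few polls, but capture waits for the next poll-point.
sparse = run_with_polling(total_points=1000, poll_interval=100, request_at=503)
```

In this toy model, dense placement gives `dense == (0, 504)` (zero latency at the cost of 504 polls), while sparse placement gives `sparse == (97, 7)` (only 7 polls, but the request waits through 97 further points of equivalence). A polling scheme can lower one figure only by raising the other.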

