The Impact of Laziness on the Performance of Snapshot Algorithms

Z. Liu and P.A.G. Sivilotti (USA)


distributed algorithms, snapshots, checkpointing


A snapshot algorithm gathers global state in a distributed system. Co-ordinated snapshot algorithms, such as the Chandy-Lamport algorithm, use control messages to en sure consistency of the gathered state. “Lazy snapshots” represent a generalization of Chandy-Lamport in which processes can delay recording local state and can antici pate or delay propagating control messages. In this paper, we present a new taxonomy of the class of lazy snapshot al gorithms. This taxonomy precisely characterizes each vari ant within this class and forms a framework for a system atic investigation of the performance of each algorithmic variant. Through simulation and analysis, we quantify the tradeoff between flexibility and storage cost. We find that eagerly sending markers before recording the local state al ways reduces the amount of storage required to save global state. Furthermore, this approach allows processes modest flexibility in deciding when to record the local state. Con versely, lazily recording the local state provides significant flexibility but can increase the amount of storage needed. In light of this tradeoff, we present a new hybrid algorithm that blends these strengths.

Important Links:

Go Back