An Efficient Place-based Coordinated Checkpointing Algorithm for Mobile Multi-Agent Systems

C.-Y. Yu, C.-M. Lin, K.-L. Liao, and C.-C. Yang (Taiwan)


Checkpointing, recovery, fault tolerance, mobile agents, and consistency.


Fault tolerance is a critical issue for mobile computing systems that must execute reliably. Checkpointing is among the simplest and lowest-cost techniques for achieving fault tolerance, because it preserves a consistent global snapshot from which the system can recover. This study presents a mobile multi-agent system model and proposes a novel hybrid checkpointing protocol that provides an efficient fault-tolerance scheme for this model. The hybrid checkpointing algorithm consists of two phases: a place phase and an agent phase. In the place phase, a coordinated event is triggered to ensure that a consistent global checkpoint is taken. In the agent phase, a communication-induced method is adopted to reduce synchronization overhead and avoid taking unnecessary checkpoints. Numerical results show that the hybrid checkpointing scheme has low checkpointing overhead and is more efficient than the traditional scheme.
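The abstract describes the two phases only at a high level. As a rough illustration, and not the authors' actual algorithm, the Python sketch below pairs a place-level coordinated checkpoint with a classic index-based communication-induced rule: each agent keeps a checkpoint index `sn`, piggybacks it on outgoing messages, and takes a forced checkpoint before delivering any message that carries a higher index. The `Agent`, `Place`, and `Message` classes, the `sn` field, and the forced-checkpoint rule are all hypothetical assumptions made for illustration. The sketch keeps checkpoints in memory and ignores agent migration, stable storage, and recovery.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    payload: str
    sn: int  # hypothetical: checkpoint index piggybacked by the sender


@dataclass
class Agent:
    name: str
    sn: int = 0  # index of this agent's most recent checkpoint
    state: list = field(default_factory=list)
    checkpoints: list = field(default_factory=list)  # (index, kind, saved state)

    def take_checkpoint(self, index: int, kind: str) -> None:
        # Record a copy of the current state under the given index.
        self.sn = index
        self.checkpoints.append((index, kind, list(self.state)))

    def basic_checkpoint(self) -> None:
        # Agent phase: a local checkpoint taken on the agent's own initiative.
        self.take_checkpoint(self.sn + 1, "basic")

    def send(self, payload: str) -> Message:
        return Message(self.name, payload, self.sn)

    def receive(self, msg: Message) -> None:
        # Communication-induced rule (assumed, index-based): if the sender has
        # already moved to a higher checkpoint index, checkpoint BEFORE
        # delivering, so the message cannot become an orphan.
        if msg.sn > self.sn:
            self.take_checkpoint(msg.sn, "forced")
        self.state.append(msg.payload)


class Place:
    """Hypothetical host that coordinates the agents currently residing on it."""

    def __init__(self, agents):
        self.agents = list(agents)

    def coordinated_checkpoint(self) -> int:
        # Place phase: every resident agent checkpoints under one common,
        # strictly larger index; later messages carry this index and force
        # lagging receivers to catch up before delivery.
        index = max(a.sn for a in self.agents) + 1
        for a in self.agents:
            a.take_checkpoint(index, "coordinated")
        return index


if __name__ == "__main__":
    a, b = Agent("a"), Agent("b")
    place = Place([a, b])
    a.basic_checkpoint()                    # a moves to index 1
    b.receive(a.send("hello"))              # b is forced to index 1 first
    print(b.checkpoints[-1][:2])            # (1, 'forced')
    print(place.coordinated_checkpoint())   # 2
```

The forced checkpoint is what lets the agent phase avoid global synchronization: agents checkpoint independently and only pay an extra checkpoint when a message would otherwise cross a checkpoint line.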
