Computational clusters have long been used to accelerate high-performance computing (HPC) applications. Because long-running jobs spread across many nodes are exposed to hardware and software failures, fault tolerance is a central concern, and this book addresses it through checkpointing. It presents a general overview of checkpointing and how it is implemented.