
Devices: PC
No copy protection
Size: 8.82 MB

Product description

This timely text/reference presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC).

The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correction, together with some application-specific techniques such as algorithm-based fault tolerance. Emphasis is placed on analytical performance models. This is then followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models.

Topics and features:

Includes self-contained contributions from an international selection of preeminent experts
Provides a survey of resilience methods and performance models
Examines the various sources for errors and faults in large-scale systems, detailing their characteristics, with a focus on modeling, detection and prediction
Reviews the spectrum of techniques that can be applied to design a fault-tolerant message passing interface
Investigates different approaches to replication, comparing these to the traditional checkpoint-recovery approach
Discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems, proposing a methodology to estimate such energy consumption

This authoritative volume is essential reading for all researchers and graduate students involved in high-performance computing.

Dr. Thomas Herault is a Research Scientist in the Innovative Computing Laboratory (ICL) at the University of Tennessee, Knoxville, TN, USA. Dr. Yves Robert is a Professor in the Laboratory of Parallel Computing at the Ecole Normale Supérieure de Lyon, France, and a Visiting Research Scholar in the ICL.

For legal reasons, this download can only be delivered to customers with a billing address in A, B, BG, CY, CZ, D, DK, EW, E, FIN, F, GR, HR, H, IRL, I, LT, L, LR, M, NL, PL, P, R, S, SLO, SK.

Product details

Publisher: Springer International Publishing
Number of pages: 320
Publication date: 1 July 2015
Language: English
ISBN-13: 9783319209432
Item no.: 43791936

Table of contents

Part I: General Overview.- Fault-Tolerance Techniques for High-Performance Computing.- Part II: Technical Contributions.- Errors and Faults.- Fault-Tolerant MPI.- Using Replication for Resilience on Exascale Systems.- Energy-Aware Checkpointing Strategies.

Fault-Tolerance Techniques for High-Performance Computing (eBook, PDF)
