This book covers the most essential techniques for designing and building dependable distributed systems, from traditional fault tolerance to blockchain technology. Topics include checkpointing and logging, recovery-oriented computing, replication, distributed consensus, Byzantine fault tolerance, and blockchain.
This book intentionally includes traditional fault tolerance techniques so that readers can better appreciate the huge benefits brought by blockchain technology and why it has been touted as a disruptive technology, which some even regard as being on the same level as the Internet. The book also expresses a grave concern about using traditional consensus algorithms in blockchain: because such algorithms have limited scalability, the primary benefits of using blockchain in the first place, such as decentralization and immutability, could easily be lost under cyberattacks.
Dr. Zhao received his PhD in Electrical and Computer Engineering from the University of California, Santa Barbara, in 2002. He is now a Full Professor in the Department of Electrical Engineering and Computer Science at Cleveland State University. He has more than 200 academic publications, and three of his recent research papers in the dependable distributed computing area have won best paper awards. Dr. Zhao also holds two US utility patents and has a patent application on blockchain under review.
Table of Contents
List of Figures
List of Tables
Acknowledgments
Preface
References
1 Introduction
1.1 Basic Concepts and Terminologies for Dependable Computing
1.1.1 System Models
1.1.2 Threat Models
1.1.3 Dependability Attributes and Evaluation Metrics
1.2 Means to Achieve Dependability
1.2.1 Fault Avoidance
1.2.2 Fault Detection and Diagnosis
1.2.3 Fault Removal
1.2.4 Fault Tolerance
1.3 System Security
References
2 Logging and Checkpointing
2.1 System Model
2.1.1 Fault Model
2.1.2 Process State and Global State
2.1.3 Piecewise Deterministic Assumption
2.1.4 Output Commit
2.1.5 Stable Storage
2.2 Checkpoint-Based Protocols
2.2.1 Uncoordinated Checkpointing
2.2.2 Tamir and Sequin Global Checkpointing Protocol
2.2.3 Chandy and Lamport Distributed Snapshot Protocol
2.2.4 Discussion
2.3 Log Based Protocols
2.3.1 Pessimistic Logging
2.3.2 Sender-Based Message Logging
References
3 Recovery-Oriented Computing
3.1 System Model
3.2 Fault Detection and Localization
3.2.1 Component Interactions Modeling and Anomaly Detection
3.2.2 Path Shapes Modeling and Root Cause Analysis
3.2.3 Inference-Based Fault Diagnosis
3.3 Microreboot
3.3.1 Microrebootable System Design Guideline
3.3.2 Automatic Recovery with Microreboot
3.3.3 Implications of the Microrebooting Technique
3.4 Overcoming Operator Errors
3.4.1 The Operator Undo Model
3.4.2 The Operator Undo Framework
References
4 Data and Service Replication
4.1 Service Replication
4.1.1 Replication Styles
4.1.2 Implementation of Service Replication
4.2 Data Replication
4.3 Optimistic Replication
4.3.1 System Models
4.3.2 Establish Ordering among Operations
4.3.3 State Transfer Systems
4.3.4 Operation Transfer System
4.3.5 Update Commitment
4.4 CAP Theorem
4.4.1 2 out 3
4.4.2 Implications of Enabling Partition Tolerance
References
5 Group Communication Systems
5.1 System Model
5.2 Sequencer Based Group Communication System
5.2.1 Normal Operation
5.2.2 Membership Change
5.2.3 Proof of Correctness
5.3 Sender Based Group Communication System
5.3.1 Total Ordering Protocol
5.3.2 Membership Change Protocol
5.3.3 Recovery Protocol
5.3.4 The Flow Control Mechanism
5.4 Vector Clock Based Group Communication System