- Hardcover
The next generation of computer system designers will be less concerned about details of processors and memories, and more concerned about the elements of a system tailored to particular applications. These designers will have a fundamental knowledge of processors and other elements in the system, but the success of their design will depend on the skills in making system-level tradeoffs that optimize the cost, performance and other attributes to meet application requirements. This book provides a new treatment of computer system design, particularly for System-on-Chip (SOC), which addresses the issues mentioned above. It begins with a global introduction, from the high-level view to the lowest common denominator (the chip itself), then moves on to the three main building blocks of an SOC (processor, memory, and interconnect). Next is an overview of what makes SOC unique (its customization ability and the applications that drive it). The final chapter presents future challenges for system design and SOC possibilities.
Note: This item can only be delivered to an address in Germany.
Product details
- Publisher: Wiley & Sons
- 1st edition
- Number of pages: 360
- Publication date: 11 October 2011
- Language: English
- Dimensions: 240 mm x 161 mm x 24 mm
- Weight: 707 g
- ISBN-13: 9780470643365
- ISBN-10: 0470643366
- Item no.: 32568301
- Manufacturer information
- Libri GmbH
- Europaallee 1
- 36244 Bad Hersfeld
- 06621 890
Darcy Flynn is known for her heartwarming, sweet contemporary romances. Her refreshing storylines, irritatingly handsome heroes and feisty heroines will delight and entertain you from the first page to the last. Although published in music, art, and the Christian non-fiction market under another identity, it was the empty nest that turned Darcy to writing romantic fiction, proving it's never too late to follow your dreams. Darcy, a former Mrs. Tennessee, lives in Nashville with her husband.
Preface
List of Abbreviations and Acronyms
1 Introduction to the Systems Approach
1.1 System Architecture: An Overview
1.2 Components of the System: Processors, Memories, and Interconnects
1.3 Hardware and Software: Programmability Versus Performance
1.4 Processor Architectures
1.4.1 Processor: A Functional View
1.4.2 Processor: An Architectural View
1.5 Memory and Addressing
1.5.1 SOC Memory Examples
1.5.2 Addressing: The Architecture of Memory
1.5.3 Memory for SOC Operating System
1.6 System-Level Interconnection
1.6.1 Bus-Based Approach
1.6.2 Network-on-Chip Approach
1.7 An Approach for SOC Design
1.7.1 Requirements and Specifications
1.7.2 Design Iteration
1.8 System Architecture and Complexity
1.9 Product Economics and Implications for SOC
1.9.1 Factors Affecting Product Costs
1.9.2 Modeling Product Economics and Technology Complexity: The Lesson for SOC
1.10 Dealing with Design Complexity
1.10.1 Buying IP
1.10.2 Reconfiguration
1.11 Conclusions
1.12 Problem Set
2 Chip Basics: Time, Area, Power, Reliability, and Configurability
2.1 Introduction
2.1.1 Design Trade-Offs
2.1.2 Requirements and Specifications
2.2 Cycle Time
2.2.1 Defining a Cycle
2.2.2 Optimum Pipeline
2.2.3 Performance
2.3 Die Area and Cost
2.3.1 Processor Area
2.3.2 Processor Subunits
2.4 Ideal and Practical Scaling
2.5 Power
2.6 Area-Time-Power Trade-Offs in Processor Design
2.6.1 Workstation Processor
2.6.2 Embedded Processor
2.7 Reliability
2.7.1 Dealing with Physical Faults
2.7.2 Error Detection and Correction
2.7.3 Dealing with Manufacturing Faults
2.7.4 Memory and Function Scrubbing
2.8 Configurability
2.8.1 Why Reconfigurable Design?
2.8.2 Area Estimate of Reconfigurable Devices
2.9 Conclusion
2.10 Problem Set
3 Processors
3.1 Introduction
3.2 Processor Selection for SOC
3.2.1 Overview
3.2.2 Example: Soft Processors
3.2.3 Examples: Processor Core Selection
3.3 Basic Concepts in Processor Architecture
3.3.1 Instruction Set
3.3.2 Some Instruction Set Conventions
3.3.3 Branches
3.3.4 Interrupts and Exceptions
3.4 Basic Concepts in Processor Microarchitecture
3.5 Basic Elements in Instruction Handling
3.5.1 The Instruction Decoder and Interlocks
3.5.2 Bypassing
3.5.3 Execution Unit
3.6 Buffers: Minimizing Pipeline Delays
3.6.1 Mean Request Rate Buffers
3.6.2 Buffers Designed for a Fixed or Maximum Request Rate
3.7 Branches: Reducing the Cost of Branches
3.7.1 Branch Target Capture: Branch Target Buffers (BTBs)
3.7.2 Branch Prediction
3.8 More Robust Processors: Vector, Very Long Instruction Word (VLIW), and Superscalar
3.9 Vector Processors and Vector Instruction Extensions
3.9.1 Vector Functional Units
3.10 VLIW Processors
3.11 Superscalar Processors
3.11.1 Data Dependencies
3.11.2 Detecting Instruction Concurrency
3.11.3 A Simple Implementation
3.11.4 Preserving State with Out-of-Order Execution
3.12 Processor Evolution and Two Examples
3.12.1 Soft and Firm Processor Designs: The Processor as IP
3.12.2 High-Performance, Custom-Designed Processors
3.13 Conclusions
3.14 Problem Set
4 Memory Design: System-on-Chip and Board-Based Systems
4.1 Introduction
4.2 Overview
4.2.1 SOC External Memory: Flash
4.2.2 SOC Internal Memory: Placement
4.2.3 The Size of Memory
4.3 Scratchpads and Cache Memory
4.4 Basic Notions
4.5 Cache Organization
4.6 Cache Data
4.7 Write Policies
4.8 Strategies for Line Replacement at Miss Time
4.8.1 Fetching a Line
4.8.2 Line Replacement
4.8.3 Cache Environment: Effects of System, Transactions, and Multiprogramming
4.9 Other Types of Cache
4.10 Split I- and D-Caches and the Effect of Code Density
4.11 Multilevel Caches
4.11.1 Limits on Cache Array Size
4.11.2 Evaluating Multilevel Caches
4.11.3 Logical Inclusion
4.12 Virtual-to-Real Translation
4.13 SOC (On-Die) Memory Systems
4.14 Board-based (Off-Die) Memory Systems
4.15 Simple DRAM and the Memory Array
4.15.1 SDRAM and DDR SDRAM
4.15.2 Memory Buffers
4.16 Models of Simple Processor-Memory Interaction
4.16.1 Models of Multiple Simple Processors and Memory
4.16.2 The Strecker-Ravi Model
4.16.3 Interleaved Caches
4.17 Conclusions
4.18 Problem Set
5 Interconnect
5.1 Introduction
5.2 Overview: Interconnect Architectures
5.3 Bus: Basic Architecture
5.3.1 Arbitration and Protocols
5.3.2 Bus Bridge
5.3.3 Physical Bus Structure
5.3.4 Bus Varieties
5.4 SOC Standard Buses
5.4.1 AMBA
5.4.2 CoreConnect
5.4.3 Bus Interface Units: Bus Sockets and Bus Wrappers
5.5 Analytic Bus Models
5.5.1 Contention and Shared Bus
5.5.2 Simple Bus Model: Without Resubmission
5.5.3 Bus Model with Request Resubmission
5.5.4 Using the Bus Model: Computing the Offered Occupancy
5.5.5 Effect of Bus Transactions and Contention Time
5.6 Beyond the Bus: NOC with Switch Interconnects
5.6.1 Static Networks
5.6.2 Dynamic Networks
5.7 Some NOC Switch Examples
5.7.1 A 2-D Grid Example of Direct Networks
5.7.2 Asynchronous Crossbar Interconnect for Synchronous SOC (Dynamic Network)
5.7.3 Blocking versus Nonblocking
5.8 Layered Architecture and Network Interface Unit
5.8.1 NOC Layered Architecture
5.8.2 NOC and NIU Example
5.8.3 Bus versus NOC
5.9 Evaluating Interconnect Networks
5.9.1 Static versus Dynamic Networks
5.9.2 Comparing Networks: Example
5.10 Conclusions
5.11 Problem Set
6 Customization and Configurability
6.1 Introduction
6.2 Estimating Effectiveness of Customization
6.3 SOC Customization: An Overview
6.4 Customizing Instruction Processors
6.4.1 Processor Customization Approaches
6.4.2 Architecture Description
6.4.3 Identifying Custom Instructions Automatically
6.5 Reconfigurable Technologies
6.5.1 Reconfigurable Functional Units (FUs)
6.5.2 Reconfigurable Interconnects
6.5.3 Software Configurable Processors
6.6 Mapping Designs Onto Reconfigurable Devices
6.7 Instance-Specific Design
6.8 Customizable Soft Processor: An Example
6.9 Reconfiguration
6.9.1 Reconfiguration Overhead Analysis
6.9.2 Trade-Off Analysis: Reconfigurable Parallelism
6.10 Conclusions
6.11 Problem Set
7 Application Studies
7.1 Introduction
7.2 SOC Design Approach
7.3 Application Study: AES
7.3.1 AES: Algorithm and Requirements
7.3.2 AES: Design and Evaluation
7.4 Application Study: 3-D Graphics Processors
7.4.1 Analysis: Processing
7.4.2 Analysis: Interconnection
7.4.3 Prototyping
7.5 Application Study: Image Compression
7.5.1 JPEG Compression
7.5.2 Example JPEG System for Digital Still Camera
7.6 Application Study: Video Compression
7.6.1 MPEG and H.26X Video Compression: Requirements
7.6.2 H.264 Acceleration: Designs
7.7 Further Application Studies
7.7.1 MP3 Audio Decoding
7.7.2 Software-Defined Radio with 802.16
7.8 Conclusions
7.9 Problem Set
8 What's Next: Challenges Ahead
8.1 Introduction
8.2 Overview
8.3 Technology
8.4 Powering the ASOC
8.5 The Shape of the ASOC
8.6 Computer Module and Memory
8.7 RF or Light Communications
8.7.1 Lasers
8.7.2 RF
8.7.3 Potential for Laser/RF Communications
8.7.4 Networked ASOC
8.8 Sensing
8.8.1 Visual
8.8.2 Audio
8.9 Motion, Flight, and the Fruit Fly
8.10 Motivation
8.11 Overview
8.12 Pre-Deployment
8.13 Post-Deployment
8.13.1 Situation-Specific Optimization
8.13.2 Autonomous Optimization Control
8.14 Roadmap and Challenges
8.15 Summary
Appendix: Tools for Processor Evaluation
References
Index