Fayez Gebali
Algorithms and Parallel Computing
Fayez Gebali
Algorithms and Parallel Computing
- Gebundenes Buch
- Merkliste
- Auf die Merkliste
- Bewerten Bewerten
- Teilen
- Produkt teilen
- Produkterinnerung
- Produkterinnerung
There is a software gap between the hardware potential and the performance that can be attained using today s software parallel program development tools. The tools need manual intervention by the programmer to parallelize the code. Programming a parallel computer requires closely studying the target algorithm or application, more so than in the traditional sequential programming we have all learned. The programmer must be aware of the communication and data dependencies of the algorithm or application. This book provides the techniques to explore the possible ways to program a parallel computer for a given application.…mehr
Andere Kunden interessierten sich auch für
- High Performance Computing143,99 €
- George KesidisAn Introduction to Communication Network Analysis135,99 €
- Thomas C. JepsenDistributed Storage Networks136,99 €
- Michael WelzlNetwork Congestion Control149,99 €
- H. Jonathan ChaoHigh Performance Switches and Routers229,99 €
- Christos DouligerisNetwork Security155,99 €
- Amol B. BakshiArchitecture-Independent Programming for Wireless Sensor Networks141,99 €
-
-
-
There is a software gap between the hardware potential and the performance that can be attained using today s software parallel program development tools. The tools need manual intervention by the programmer to parallelize the code. Programming a parallel computer requires closely studying the target algorithm or application, more so than in the traditional sequential programming we have all learned. The programmer must be aware of the communication and data dependencies of the algorithm or application. This book provides the techniques to explore the possible ways to program a parallel computer for a given application.
Hinweis: Dieser Artikel kann nur an eine deutsche Lieferadresse ausgeliefert werden.
Hinweis: Dieser Artikel kann nur an eine deutsche Lieferadresse ausgeliefert werden.
Produktdetails
- Produktdetails
- Wiley Series on Parallel and Distributed Computing
- Verlag: Wiley & Sons
- 1. Auflage
- Seitenzahl: 368
- Erscheinungstermin: 19. April 2011
- Englisch
- Abmessung: 240mm x 161mm x 24mm
- Gewicht: 688g
- ISBN-13: 9780470902103
- ISBN-10: 0470902108
- Artikelnr.: 32302358
- Wiley Series on Parallel and Distributed Computing
- Verlag: Wiley & Sons
- 1. Auflage
- Seitenzahl: 368
- Erscheinungstermin: 19. April 2011
- Englisch
- Abmessung: 240mm x 161mm x 24mm
- Gewicht: 688g
- ISBN-13: 9780470902103
- ISBN-10: 0470902108
- Artikelnr.: 32302358
Fayez Gebali, PhD, has taught at the University of Victoria since 1984 and has served as the Associate Dean of Engineering for undergraduate programs since 2002. He has contributed to dozens of journals and technical reports and has completed four books. Dr. Gebali's primary research interests include VLSI design, processor array design, algorithms for computer arithmetic, and communication system modeling.
Preface xiii List of Acronyms xix 1 Introduction 1 1.1 Introduction 1 1.2 Toward Automating Parallel Programming 2 1.3 Algorithms 4 1.4 Parallel Computing Design Considerations 12 1.5 Parallel Algorithms and Parallel Architectures 13 1.6 Relating Parallel Algorithm and Parallel Architecture 14 1.7 Implementation of Algorithms: A Two-Sided Problem 14 1.8 Measuring Benefi ts of Parallel Computing 15 1.9 Amdahl's Law for Multiprocessor Systems 19 1.10 Gustafson-Barsis's Law 21 1.11 Applications of Parallel Computing 22 2 Enhancing Uniprocessor Performance 29 2.1 Introduction 29 2.2 Increasing Processor Clock Frequency 30 2.3 Parallelizing ALU Structure 30 2.4 Using Memory Hierarchy 33 2.5 Pipelining 39 2.6 Very Long Instruction Word (VLIW) Processors 44 2.7 Instruction-Level Parallelism (ILP) and Superscalar Processors 45 2.8 Multithreaded Processor 49 3 Parallel Computers 53 3.1 Introduction 53 3.2 Parallel Computing 53 3.3 Shared-Memory Multiprocessors (Uniform Memory Access [UMA]) 54 3.4 Distributed-Memory Multiprocessor (Nonuniform Memory Access [NUMA]) 56 3.5 SIMD Processors 57 3.6 Systolic Processors 57 3.7 Cluster Computing 60 3.8 Grid (Cloud) Computing 60 3.9 Multicore Systems 61 3.10 SM 62 3.11 Communication Between Parallel Processors 64 3.12 Summary of Parallel Architectures 67 4 Shared-Memory Multiprocessors 69 4.1 Introduction 69 4.2 Cache Coherence and Memory Consistency 70 4.3 Synchronization and Mutual Exclusion 76 5 Interconnection Networks 83 5.1 Introduction 83 5.2 Classification of Interconnection Networks by Logical Topologies 84 5.3 Interconnection Network Switch Architecture 91 6 Concurrency Platforms 105 6.1 Introduction 105 6.2 Concurrency Platforms 105 6.3 Cilk++ 106 6.4 OpenMP 112 6.5 Compute Unifi ed Device Architecture (CUDA) 122 7 Ad Hoc Techniques for Parallel Algorithms 131 7.1 Introduction 131 7.2 Defining Algorithm Variables 133 7.3 Independent Loop Scheduling 133 7.4 Dependent Loops 134 7.5 Loop Spreading for Simple Dependent Loops 135 7.6 Loop Unrolling 135 7.7 Problem Partitioning 136 7.8 Divide-and-Conquer (Recursive Partitioning) Strategies 137 7.9 Pipelining 139 8 Nonserial-Parallel Algorithms 143 8.1 Introduction 143 8.2 Comparing DAG and DCG Algorithms 143 8.3 Parallelizing NSPA Algorithms Represented by a DAG 145 8.4 Formal Technique for Analyzing NSPAs 147 8.5 Detecting Cycles in the Algorithm 150 8.6 Extracting Serial and Parallel Algorithm Performance Parameters 151 8.7 Useful Theorems 153 8.8 Performance of Serial and Parallel Algorithms on Parallel Computers 156 9 z-Transform Analysis 159 9.1 Introduction 159 9.2 Definition of z-Transform 159 9.3 The 1-D FIR Digital Filter Algorithm 160 9.4 Software and Hardware Implementations of the z-Transform 161 9.5 Design 1: Using Horner's Rule for Broadcast Input and Pipelined Output 162 9.6 Design 2: Pipelined Input and Broadcast Output 163 9.7 Design 3: Pipelined Input and Output 164 10 Dependence Graph Analysis 167 10.1 Introduction 167 10.2 The 1-D FIR Digital Filter Algorithm 167 10.3 The Dependence Graph of an Algorithm 168 10.4 Deriving the Dependence Graph for an Algorithm 169 10.5 The Scheduling Function for the 1-D FIR Filter 171 10.6 Node Projection Operation 177 10.7 Nonlinear Projection Operation 179 10.8 Software and Hardware Implementations of the DAG Technique 180 11 Computational Geometry Analysis 185 11.1 Introduction 185 11.2 Matrix Multiplication Algorithm 185 11.3 The 3-D Dependence Graph and Computation Domain D 186 11.4 The Facets and Vertices of D 188 11.5 The Dependence Matrices of the Algorithm Variables 188 11.6 Nullspace of Dependence Matrix: The Broadcast Subdomain B 189 11.7 Design Space Exploration: Choice of Broadcasting versus Pipelining Variables 192 11.8 Data Scheduling 195 11.9 Projection Operation Using the Linear Projection Operator 200 11.10 Effect of Projection Operation on Data 205 11.11 The Resulting Multithreaded/Multiprocessor Architecture 206 11.12 Summary of Work Done in this Chapter 207 12 Case Study: One-Dimensional IIR Digital Filters 209 12.1 Introduction 209 12.2 The 1-D IIR Digital Filter Algorithm 209 12.3 The IIR Filter Dependence Graph 209 12.4 z-Domain Analysis of 1-D IIR Digital Filter Algorithm 216 13 Case Study: Two- and Three-Dimensional Digital Filters 219 13.1 Introduction 219 13.2 Line and Frame Wraparound Problems 219 13.3 2-D Recursive Filters 221 13.4 3-D Digital Filters 223 14 Case Study: Multirate Decimators and Interpolators 227 14.1 Introduction 227 14.2 Decimator Structures 227 14.3 Decimator Dependence Graph 228 14.4 Decimator Scheduling 230 14.5 Decimator DAG for s1 = [1 0] 231 14.6 Decimator DAG for s2 = [1
1] 233 14.7 Decimator DAG for s3 = [1 1] 235 14.8 Polyphase Decimator Implementations 235 14.9 Interpolator Structures 236 14.10 Interpolator Dependence Graph 237 14.11 Interpolator Scheduling 238 14.12 Interpolator DAG for s1 = [1 0] 239 14.13 Interpolator DAG for s2 = [1
1] 241 14.14 Interpolator DAG for s3 = [1 1] 243 14.15 Polyphase Interpolator Implementations 243 15 Case Study: Pattern Matching 245 15.1 Introduction 245 15.2 Expressing the Algorithm as a Regular Iterative Algorithm (RIA) 245 15.3 Obtaining the Algorithm Dependence Graph 246 15.4 Data Scheduling 247 15.5 DAG Node Projection 248 15.6 DESIGN 1: Design Space Exploration When s ??n[1 1]t 249 15.7 DESIGN 2: Design Space Exploration When s ??n[1
1]t 252 15.8 DESIGN 3: Design Space Exploration When s = [1 0]t 253 16 Case Study: Motion Estimation for Video Compression 255 16.1 Introduction 255 16.2 FBMAs 256 16.3 Data Buffering Requirements 257 16.4 Formulation of the FBMA 258 16.5 Hierarchical Formulation of Motion Estimation 259 16.6 Hardware Design of the Hierarchy Blocks 261 17 Case Study: Multiplication over GF(2m) 267 17.1 Introduction 267 17.2 The Multiplication Algorithm in GF(2m) 268 17.3 Expressing Field Multiplication as an RIA 270 17.4 Field Multiplication Dependence Graph 270 17.5 Data Scheduling 271 17.6 DAG Node Projection 273 17.7 Design 1: Using d1 = [1 0]t 275 17.8 Design 2: Using d2 = [1 1]t 275 17.9 Design 3: Using d3 = [1
1]t 277 17.10 Applications of Finite Field Multipliers 277 18 Case Study: Polynomial Division over GF(2) 279 18.1 Introduction 279 18.2 The Polynomial Division Algorithm 279 18.3 The LFSR Dependence Graph 281 18.4 Data Scheduling 282 18.5 DAG Node Projection 283 18.6 Design 1: Design Space Exploration When s1 = [1
1] 284 18.7 Design 2: Design Space Exploration When s2 = [1 0] 286 18.8 Design 3: Design Space Exploration When s3 = [1
0.5] 289 18.9 Comparing the Three Designs 291 19 The Fast Fourier Transform 293 19.1 Introduction 293 19.2 Decimation-in-Time FFT 295 19.3 Pipeline Radix-2 Decimation-in-Time FFT Processor 298 19.4 Decimation-in-Frequency FFT 299 19.5 Pipeline Radix-2 Decimation-in-Frequency FFT Processor 303 20 Solving Systems of Linear Equations 305 20.1 Introduction 305 20.2 Special Matrix Structures 305 20.3 Forward Substitution (Direct Technique) 309 20.4 Back Substitution 312 20.5 Matrix Triangularization Algorithm 312 20.6 Successive over Relaxation (SOR) (Iterative Technique) 317 20.7 Problems 321 21 Solving Partial Differential Equations Using Finite Difference Method 323 21.1 Introduction 323 21.2 FDM for 1-D Systems 324 References 331 Index 337
1] 233 14.7 Decimator DAG for s3 = [1 1] 235 14.8 Polyphase Decimator Implementations 235 14.9 Interpolator Structures 236 14.10 Interpolator Dependence Graph 237 14.11 Interpolator Scheduling 238 14.12 Interpolator DAG for s1 = [1 0] 239 14.13 Interpolator DAG for s2 = [1
1] 241 14.14 Interpolator DAG for s3 = [1 1] 243 14.15 Polyphase Interpolator Implementations 243 15 Case Study: Pattern Matching 245 15.1 Introduction 245 15.2 Expressing the Algorithm as a Regular Iterative Algorithm (RIA) 245 15.3 Obtaining the Algorithm Dependence Graph 246 15.4 Data Scheduling 247 15.5 DAG Node Projection 248 15.6 DESIGN 1: Design Space Exploration When s ??n[1 1]t 249 15.7 DESIGN 2: Design Space Exploration When s ??n[1
1]t 252 15.8 DESIGN 3: Design Space Exploration When s = [1 0]t 253 16 Case Study: Motion Estimation for Video Compression 255 16.1 Introduction 255 16.2 FBMAs 256 16.3 Data Buffering Requirements 257 16.4 Formulation of the FBMA 258 16.5 Hierarchical Formulation of Motion Estimation 259 16.6 Hardware Design of the Hierarchy Blocks 261 17 Case Study: Multiplication over GF(2m) 267 17.1 Introduction 267 17.2 The Multiplication Algorithm in GF(2m) 268 17.3 Expressing Field Multiplication as an RIA 270 17.4 Field Multiplication Dependence Graph 270 17.5 Data Scheduling 271 17.6 DAG Node Projection 273 17.7 Design 1: Using d1 = [1 0]t 275 17.8 Design 2: Using d2 = [1 1]t 275 17.9 Design 3: Using d3 = [1
1]t 277 17.10 Applications of Finite Field Multipliers 277 18 Case Study: Polynomial Division over GF(2) 279 18.1 Introduction 279 18.2 The Polynomial Division Algorithm 279 18.3 The LFSR Dependence Graph 281 18.4 Data Scheduling 282 18.5 DAG Node Projection 283 18.6 Design 1: Design Space Exploration When s1 = [1
1] 284 18.7 Design 2: Design Space Exploration When s2 = [1 0] 286 18.8 Design 3: Design Space Exploration When s3 = [1
0.5] 289 18.9 Comparing the Three Designs 291 19 The Fast Fourier Transform 293 19.1 Introduction 293 19.2 Decimation-in-Time FFT 295 19.3 Pipeline Radix-2 Decimation-in-Time FFT Processor 298 19.4 Decimation-in-Frequency FFT 299 19.5 Pipeline Radix-2 Decimation-in-Frequency FFT Processor 303 20 Solving Systems of Linear Equations 305 20.1 Introduction 305 20.2 Special Matrix Structures 305 20.3 Forward Substitution (Direct Technique) 309 20.4 Back Substitution 312 20.5 Matrix Triangularization Algorithm 312 20.6 Successive over Relaxation (SOR) (Iterative Technique) 317 20.7 Problems 321 21 Solving Partial Differential Equations Using Finite Difference Method 323 21.1 Introduction 323 21.2 FDM for 1-D Systems 324 References 331 Index 337
Preface xiii List of Acronyms xix 1 Introduction 1 1.1 Introduction 1 1.2 Toward Automating Parallel Programming 2 1.3 Algorithms 4 1.4 Parallel Computing Design Considerations 12 1.5 Parallel Algorithms and Parallel Architectures 13 1.6 Relating Parallel Algorithm and Parallel Architecture 14 1.7 Implementation of Algorithms: A Two-Sided Problem 14 1.8 Measuring Benefi ts of Parallel Computing 15 1.9 Amdahl's Law for Multiprocessor Systems 19 1.10 Gustafson-Barsis's Law 21 1.11 Applications of Parallel Computing 22 2 Enhancing Uniprocessor Performance 29 2.1 Introduction 29 2.2 Increasing Processor Clock Frequency 30 2.3 Parallelizing ALU Structure 30 2.4 Using Memory Hierarchy 33 2.5 Pipelining 39 2.6 Very Long Instruction Word (VLIW) Processors 44 2.7 Instruction-Level Parallelism (ILP) and Superscalar Processors 45 2.8 Multithreaded Processor 49 3 Parallel Computers 53 3.1 Introduction 53 3.2 Parallel Computing 53 3.3 Shared-Memory Multiprocessors (Uniform Memory Access [UMA]) 54 3.4 Distributed-Memory Multiprocessor (Nonuniform Memory Access [NUMA]) 56 3.5 SIMD Processors 57 3.6 Systolic Processors 57 3.7 Cluster Computing 60 3.8 Grid (Cloud) Computing 60 3.9 Multicore Systems 61 3.10 SM 62 3.11 Communication Between Parallel Processors 64 3.12 Summary of Parallel Architectures 67 4 Shared-Memory Multiprocessors 69 4.1 Introduction 69 4.2 Cache Coherence and Memory Consistency 70 4.3 Synchronization and Mutual Exclusion 76 5 Interconnection Networks 83 5.1 Introduction 83 5.2 Classification of Interconnection Networks by Logical Topologies 84 5.3 Interconnection Network Switch Architecture 91 6 Concurrency Platforms 105 6.1 Introduction 105 6.2 Concurrency Platforms 105 6.3 Cilk++ 106 6.4 OpenMP 112 6.5 Compute Unifi ed Device Architecture (CUDA) 122 7 Ad Hoc Techniques for Parallel Algorithms 131 7.1 Introduction 131 7.2 Defining Algorithm Variables 133 7.3 Independent Loop Scheduling 133 7.4 Dependent Loops 134 7.5 Loop Spreading for Simple Dependent Loops 135 7.6 Loop Unrolling 135 7.7 Problem Partitioning 136 7.8 Divide-and-Conquer (Recursive Partitioning) Strategies 137 7.9 Pipelining 139 8 Nonserial-Parallel Algorithms 143 8.1 Introduction 143 8.2 Comparing DAG and DCG Algorithms 143 8.3 Parallelizing NSPA Algorithms Represented by a DAG 145 8.4 Formal Technique for Analyzing NSPAs 147 8.5 Detecting Cycles in the Algorithm 150 8.6 Extracting Serial and Parallel Algorithm Performance Parameters 151 8.7 Useful Theorems 153 8.8 Performance of Serial and Parallel Algorithms on Parallel Computers 156 9 z-Transform Analysis 159 9.1 Introduction 159 9.2 Definition of z-Transform 159 9.3 The 1-D FIR Digital Filter Algorithm 160 9.4 Software and Hardware Implementations of the z-Transform 161 9.5 Design 1: Using Horner's Rule for Broadcast Input and Pipelined Output 162 9.6 Design 2: Pipelined Input and Broadcast Output 163 9.7 Design 3: Pipelined Input and Output 164 10 Dependence Graph Analysis 167 10.1 Introduction 167 10.2 The 1-D FIR Digital Filter Algorithm 167 10.3 The Dependence Graph of an Algorithm 168 10.4 Deriving the Dependence Graph for an Algorithm 169 10.5 The Scheduling Function for the 1-D FIR Filter 171 10.6 Node Projection Operation 177 10.7 Nonlinear Projection Operation 179 10.8 Software and Hardware Implementations of the DAG Technique 180 11 Computational Geometry Analysis 185 11.1 Introduction 185 11.2 Matrix Multiplication Algorithm 185 11.3 The 3-D Dependence Graph and Computation Domain D 186 11.4 The Facets and Vertices of D 188 11.5 The Dependence Matrices of the Algorithm Variables 188 11.6 Nullspace of Dependence Matrix: The Broadcast Subdomain B 189 11.7 Design Space Exploration: Choice of Broadcasting versus Pipelining Variables 192 11.8 Data Scheduling 195 11.9 Projection Operation Using the Linear Projection Operator 200 11.10 Effect of Projection Operation on Data 205 11.11 The Resulting Multithreaded/Multiprocessor Architecture 206 11.12 Summary of Work Done in this Chapter 207 12 Case Study: One-Dimensional IIR Digital Filters 209 12.1 Introduction 209 12.2 The 1-D IIR Digital Filter Algorithm 209 12.3 The IIR Filter Dependence Graph 209 12.4 z-Domain Analysis of 1-D IIR Digital Filter Algorithm 216 13 Case Study: Two- and Three-Dimensional Digital Filters 219 13.1 Introduction 219 13.2 Line and Frame Wraparound Problems 219 13.3 2-D Recursive Filters 221 13.4 3-D Digital Filters 223 14 Case Study: Multirate Decimators and Interpolators 227 14.1 Introduction 227 14.2 Decimator Structures 227 14.3 Decimator Dependence Graph 228 14.4 Decimator Scheduling 230 14.5 Decimator DAG for s1 = [1 0] 231 14.6 Decimator DAG for s2 = [1
1] 233 14.7 Decimator DAG for s3 = [1 1] 235 14.8 Polyphase Decimator Implementations 235 14.9 Interpolator Structures 236 14.10 Interpolator Dependence Graph 237 14.11 Interpolator Scheduling 238 14.12 Interpolator DAG for s1 = [1 0] 239 14.13 Interpolator DAG for s2 = [1
1] 241 14.14 Interpolator DAG for s3 = [1 1] 243 14.15 Polyphase Interpolator Implementations 243 15 Case Study: Pattern Matching 245 15.1 Introduction 245 15.2 Expressing the Algorithm as a Regular Iterative Algorithm (RIA) 245 15.3 Obtaining the Algorithm Dependence Graph 246 15.4 Data Scheduling 247 15.5 DAG Node Projection 248 15.6 DESIGN 1: Design Space Exploration When s ??n[1 1]t 249 15.7 DESIGN 2: Design Space Exploration When s ??n[1
1]t 252 15.8 DESIGN 3: Design Space Exploration When s = [1 0]t 253 16 Case Study: Motion Estimation for Video Compression 255 16.1 Introduction 255 16.2 FBMAs 256 16.3 Data Buffering Requirements 257 16.4 Formulation of the FBMA 258 16.5 Hierarchical Formulation of Motion Estimation 259 16.6 Hardware Design of the Hierarchy Blocks 261 17 Case Study: Multiplication over GF(2m) 267 17.1 Introduction 267 17.2 The Multiplication Algorithm in GF(2m) 268 17.3 Expressing Field Multiplication as an RIA 270 17.4 Field Multiplication Dependence Graph 270 17.5 Data Scheduling 271 17.6 DAG Node Projection 273 17.7 Design 1: Using d1 = [1 0]t 275 17.8 Design 2: Using d2 = [1 1]t 275 17.9 Design 3: Using d3 = [1
1]t 277 17.10 Applications of Finite Field Multipliers 277 18 Case Study: Polynomial Division over GF(2) 279 18.1 Introduction 279 18.2 The Polynomial Division Algorithm 279 18.3 The LFSR Dependence Graph 281 18.4 Data Scheduling 282 18.5 DAG Node Projection 283 18.6 Design 1: Design Space Exploration When s1 = [1
1] 284 18.7 Design 2: Design Space Exploration When s2 = [1 0] 286 18.8 Design 3: Design Space Exploration When s3 = [1
0.5] 289 18.9 Comparing the Three Designs 291 19 The Fast Fourier Transform 293 19.1 Introduction 293 19.2 Decimation-in-Time FFT 295 19.3 Pipeline Radix-2 Decimation-in-Time FFT Processor 298 19.4 Decimation-in-Frequency FFT 299 19.5 Pipeline Radix-2 Decimation-in-Frequency FFT Processor 303 20 Solving Systems of Linear Equations 305 20.1 Introduction 305 20.2 Special Matrix Structures 305 20.3 Forward Substitution (Direct Technique) 309 20.4 Back Substitution 312 20.5 Matrix Triangularization Algorithm 312 20.6 Successive over Relaxation (SOR) (Iterative Technique) 317 20.7 Problems 321 21 Solving Partial Differential Equations Using Finite Difference Method 323 21.1 Introduction 323 21.2 FDM for 1-D Systems 324 References 331 Index 337
1] 233 14.7 Decimator DAG for s3 = [1 1] 235 14.8 Polyphase Decimator Implementations 235 14.9 Interpolator Structures 236 14.10 Interpolator Dependence Graph 237 14.11 Interpolator Scheduling 238 14.12 Interpolator DAG for s1 = [1 0] 239 14.13 Interpolator DAG for s2 = [1
1] 241 14.14 Interpolator DAG for s3 = [1 1] 243 14.15 Polyphase Interpolator Implementations 243 15 Case Study: Pattern Matching 245 15.1 Introduction 245 15.2 Expressing the Algorithm as a Regular Iterative Algorithm (RIA) 245 15.3 Obtaining the Algorithm Dependence Graph 246 15.4 Data Scheduling 247 15.5 DAG Node Projection 248 15.6 DESIGN 1: Design Space Exploration When s ??n[1 1]t 249 15.7 DESIGN 2: Design Space Exploration When s ??n[1
1]t 252 15.8 DESIGN 3: Design Space Exploration When s = [1 0]t 253 16 Case Study: Motion Estimation for Video Compression 255 16.1 Introduction 255 16.2 FBMAs 256 16.3 Data Buffering Requirements 257 16.4 Formulation of the FBMA 258 16.5 Hierarchical Formulation of Motion Estimation 259 16.6 Hardware Design of the Hierarchy Blocks 261 17 Case Study: Multiplication over GF(2m) 267 17.1 Introduction 267 17.2 The Multiplication Algorithm in GF(2m) 268 17.3 Expressing Field Multiplication as an RIA 270 17.4 Field Multiplication Dependence Graph 270 17.5 Data Scheduling 271 17.6 DAG Node Projection 273 17.7 Design 1: Using d1 = [1 0]t 275 17.8 Design 2: Using d2 = [1 1]t 275 17.9 Design 3: Using d3 = [1
1]t 277 17.10 Applications of Finite Field Multipliers 277 18 Case Study: Polynomial Division over GF(2) 279 18.1 Introduction 279 18.2 The Polynomial Division Algorithm 279 18.3 The LFSR Dependence Graph 281 18.4 Data Scheduling 282 18.5 DAG Node Projection 283 18.6 Design 1: Design Space Exploration When s1 = [1
1] 284 18.7 Design 2: Design Space Exploration When s2 = [1 0] 286 18.8 Design 3: Design Space Exploration When s3 = [1
0.5] 289 18.9 Comparing the Three Designs 291 19 The Fast Fourier Transform 293 19.1 Introduction 293 19.2 Decimation-in-Time FFT 295 19.3 Pipeline Radix-2 Decimation-in-Time FFT Processor 298 19.4 Decimation-in-Frequency FFT 299 19.5 Pipeline Radix-2 Decimation-in-Frequency FFT Processor 303 20 Solving Systems of Linear Equations 305 20.1 Introduction 305 20.2 Special Matrix Structures 305 20.3 Forward Substitution (Direct Technique) 309 20.4 Back Substitution 312 20.5 Matrix Triangularization Algorithm 312 20.6 Successive over Relaxation (SOR) (Iterative Technique) 317 20.7 Problems 321 21 Solving Partial Differential Equations Using Finite Difference Method 323 21.1 Introduction 323 21.2 FDM for 1-D Systems 324 References 331 Index 337