Subodh Kumar (Indian Institute of Technology, Delhi)
Introduction to Parallel Programming
- Paperback
In modern computer science, most programming is parallel programming. This textbook will be invaluable as a first course in parallel programming. It covers different parallel programming styles, describes parallel architectures and programming techniques, presents algorithmic techniques, and discusses parallel design and performance issues.
Other customers were also interested in
- Peter Pacheco (University of San Francisco, USA): An Introduction to Parallel Programming, 83,99 €
- Richard Ansorge (University of Cambridge): Programming in Parallel with CUDA, 66,99 €
- Michel Dubois (University of Southern California): Parallel Computer Organization and Design, 100,99 €
- Wen-mei W. Hwu (CTO, MulticoreWare): Programming Massively Parallel Processors, 91,99 €
- Johnny Wei-Bing Lin: An Introduction to Python Programming for Scientists and Engineers, 64,99 €
- M. J. Lighthill: An Introduction to Fourier Analysis and Generalised Functions, 43,99 €
- Hiro Ainana: Death March to the Parallel World Rhapsody, Vol. 5 (Light Novel), 14,99 €
Note: This item can only be shipped to a German delivery address.
Product details
- Publisher: Cambridge University Press
- Number of pages: 350
- Publication date: 5 January 2023
- Language: English
- Dimensions: 186mm x 241mm x 20mm
- Weight: 500g
- ISBN-13: 9781009069533
- ISBN-10: 1009069535
- Item no.: 63264664
Dr Subodh Kumar is Professor in the Department of Computer Science and Engineering at the Indian Institute of Technology, Delhi, an institution he has been associated with since 2007. During this time, he has headed the institute's High Performance Computing group and taught several courses on computer graphics, data structures and algorithms, design practices in computer science, and parallel programming. Previously, he was Assistant Professor at Johns Hopkins University. His research interests include rendering algorithms, virtual reality, geometry processing, human-machine interfaces, visualization, large-scale parallel computation and HPC.
List of Figures
Introduction
Concurrency and Parallelism
Why Study Parallel Programming
What is in this Book
1. An Introduction to Parallel Computer Architecture
1.1 Parallel Organization
SISD: Single Instruction, Single Data
SIMD: Single Instruction, Multiple Data
MIMD: Multiple Instruction, Multiple Data
MISD: Multiple Instruction, Single Data
1.2 System Architecture
1.3 CPU Architecture
1.4 Memory and Cache
1.5 GPU Architecture
1.6 Interconnect Architecture
Routing
Links
Types and Quality of Networks
Torus Network
Hypercube Network
Cross-Bar Network
Shuffle-Exchange Network
Clos Network
Tree Network
Network Comparison
1.7 Summary
2. Parallel Programming Models
2.1 Distributed-Memory Programming Model
2.2 Shared-Memory Programming Model
2.3 Task Graph Model
2.4 Variants of Task Parallelism
2.5 Summary
3. Parallel Performance Analysis
3.1 Simple Parallel Model
3.2 Bulk-Synchronous Parallel Model
BSP Computation Time
BSP Example
3.3 PRAM Model
PRAM Computation Time
PRAM Example
3.4 Parallel Performance Evaluation
Latency and Throughput
Speed-up
Cost
Efficiency
Scalability
Iso-efficiency
3.5 Parallel Work
Brent's Work-Time Scheduling Principle
3.6 Amdahl's Law
3.7 Gustafson's Law
3.8 Karp-Flatt Metric
3.9 Summary
4. Synchronization and Communication Primitives
4.1 Threads and Processes
4.2 Race Condition and Consistency of State
Sequential Consistency
Causal Consistency
FIFO and Processor Consistency
Weak Consistency
Linearizability
4.3 Synchronization
Synchronization Condition
Protocol Control
Progress
Synchronization Hazards
4.4 Mutual Exclusion
Lock
Peterson's Algorithm
Bakery Algorithm
Compare and Swap
Transactional Memory
Barrier and Consensus
4.5 Communication
Point-to-Point Communication
RPC
Collective Communication
4.6 Summary
5. Parallel Program Design
5.1 Design Steps
Granularity
Communication
Synchronization
Load Balance
5.2 Task Decomposition
Domain Decomposition
Functional Decomposition
Task Graph Metrics
5.3 Task Execution
Preliminary Task Mapping
Task Scheduling Framework
Centralized Push Scheduling Strategy
Distributed Push Scheduling
Pull Scheduling
5.4 Input/Output
5.5 Debugging and Profiling
5.6 Summary
6. Middleware: The Practice of Parallel Programming
6.1 OpenMP
Preliminaries
OpenMP Thread Creation
OpenMP Memory Model
OpenMP Reduction
OpenMP Synchronization
Sharing a Loop's Work
Other Work-Sharing Pragmas
SIMD Pragma
Tasks
6.2 MPI
MPI Send and Receive
Message-Passing Synchronization
MPI Data Types
MPI Collective Communication
MPI Barrier
MPI Reduction
One-Sided Communication
MPI File IO
MPI Groups and Communicators
MPI Dynamic Parallelism
MPI Process Topology
6.3 Chapel
Partitioned Global Address Space
Chapel Tasks
Chapel Variable Scope
6.4 Map-Reduce
Parallel Implementation
Hadoop
6.5 GPU Programming
OpenMP GPU Off-Load
Data and Function on Device
Thread Blocks in OpenMP
CUDA
CUDA Programming Model
CPU-GPU Memory Transfer
Concurrent Kernels
CUDA Synchronization
CUDA Shared Memory
CUDA Parallel Memory Access
False Sharing
6.6 Summary
7. Parallel Algorithms and Techniques
7.1 Divide and Conquer: Prefix-Sum
Parallel Prefix-Sum: Method 1
Parallel Prefix-Sum: Method 2
Parallel Prefix-Sum: Method 3
7.2 Divide and Conquer: Merge Two Sorted Lists
Parallel Merge: Method 1
Parallel Merge: Method 2
Parallel Merge: Method 3
Parallel Merge: Method 4
7.3 Accelerated Cascading: Find Minima
7.4 Recursive Doubling: List Ranking
7.5 Recursive Doubling: Euler Tour
7.6 Recursive Doubling: Connected Components
7.7 Pipelining: Merge-Sort
Basic Merge-Sort
Pipelined Merges
4-Cover Property Analysis
Merge Operation per Tick
7.8 Application of Prefix-Sum: Radix-Sort
7.9 Exploiting Parallelism: Quick-Sort
7.10 Fixing Processor Count: Sample-Sort
7.11 Exploiting Pa