High Performance Computing
Edited by Emmanuel Jeannot and Julius Zilinskas
- Hardcover
With recent changes in multicore and general-purpose computing on graphics processing units, the way parallel computers are used and programmed has changed drastically. This calls for a comprehensive study, written by specialists in the field, of how to use such machines. The book provides recent research results in high-performance computing on complex environments, information on how to efficiently exploit heterogeneous and hierarchical architectures and distributed systems, detailed studies on the impact of applying heterogeneous computing practices to real problems, and applications ranging from remote sensing to tomography. The content spans topics such as Numerical Analysis for Heterogeneous and Multicore Systems; Optimization of Communication for High Performance Heterogeneous and Hierarchical Platforms; Efficient Exploitation of Heterogeneous Architectures, Hybrid CPU+GPU, and Distributed Systems; Energy Awareness in High-Performance Computing; and Applications of Heterogeneous High-Performance Computing.
- Covers cutting-edge research in HPC on complex environments, following an international collaboration of members of the ComplexHPC
- Explains how to efficiently exploit heterogeneous and hierarchical architectures and distributed systems
- Twenty-three chapters and over 100 illustrations cover domains such as numerical analysis, communication and storage, applications, GPUs and accelerators, and energy efficiency
Note: This item can only be delivered to a German delivery address.
Product details
- Wiley Series on Parallel and Distributed Computing
- Publisher: Wiley & Sons
- 1st edition
- Pages: 512
- Publication date: 3 June 2014
- Language: English
- Dimensions: 235 mm x 157 mm x 32 mm
- Weight: 842 g
- ISBN-13: 9781118712054
- ISBN-10: 1118712056
- Item no.: 39757412
EMMANUEL JEANNOT is a Senior Research Scientist at INRIA. He received his PhD in computer science from École Normale Supérieure de Lyon. His main research interests are process placement, scheduling for heterogeneous environments and grids, data redistribution, and algorithms and models for parallel machines. JULIUS ZILINSKAS is a Principal Researcher and Head of Department at Vilnius University, Lithuania. His research interests include parallel computing, optimization, data analysis, and visualization.
Contributors xxiii
Preface xxvii
PART I INTRODUCTION 1
1. Summary of the Open European Network for High-Performance Computing in Complex Environments 3
Emmanuel Jeannot and Julius Zilinskas
1.1 Introduction and Vision 4
1.2 Scientific Organization 6
1.3 Activities of the Project 6
1.4 Main Outcomes of the Action 7
1.5 Contents of the Book 8
PART II NUMERICAL ANALYSIS FOR HETEROGENEOUS AND MULTICORE SYSTEMS 11
2. On the Impact of the Heterogeneous Multicore and Many-Core Platforms on Iterative Solution Methods and Preconditioning Techniques 13
Dimitar Lukarski and Maya Neytcheva
2.1 Introduction 14
2.2 General Description of Iterative Methods and Preconditioning 16
2.3 Preconditioning Techniques 20
2.4 Defect-Correction Technique 21
2.5 Multigrid Method 22
2.6 Parallelization of Iterative Methods 22
2.7 Heterogeneous Systems 23
2.8 Maintenance and Portability 29
2.9 Conclusion 30
3. Efficient Numerical Solution of 2D Diffusion Equation on Multicore Computers 33
Matjaz Depolli, Gregor Kosec, and Roman Trobec
3.1 Introduction 34
3.2 Test Case 35
3.3 Parallel Implementation 39
3.4 Results 41
3.5 Discussion 45
3.6 Conclusion 47
4. Parallel Algorithms for Parabolic Problems on Graphs in Neuroscience 51
Natalija Tumanova and Raimondas Ciegis
4.1 Introduction 51
4.2 Formulation of the Discrete Model 53
4.3 Parallel Algorithms 59
4.4 Computational Results 63
4.5 Conclusions 69
PART III COMMUNICATION AND STORAGE CONSIDERATIONS IN HIGH-PERFORMANCE COMPUTING 73
5. An Overview of Topology Mapping Algorithms and Techniques in High-Performance Computing 75
Torsten Hoefler, Emmanuel Jeannot, and Guillaume Mercier
5.1 Introduction 76
5.2 General Overview 76
5.3 Formalization of the Problem 79
5.4 Algorithmic Strategies for Topology Mapping 81
5.5 Mapping Enforcement Techniques 82
5.6 Survey of Solutions 85
5.7 Conclusion and Open Problems 89
6. Optimization of Collective Communication for Heterogeneous HPC Platforms 95
Kiril Dichev and Alexey Lastovetsky
6.1 Introduction 95
6.2 Overview of Optimized Collectives and Topology-Aware Collectives 97
6.3 Optimizations of Collectives on Homogeneous Clusters 98
6.4 Heterogeneous Networks 99
6.5 Topology- and Performance-Aware Collectives 100
6.6 Topology as Input 101
6.7 Performance as Input 102
6.8 Non-MPI Collective Algorithms for Heterogeneous Networks 106
6.9 Conclusion 111
7. Effective Data Access Patterns on Massively Parallel Processors 115
Gabriele Capannini, Ranieri Baraglia, Fabrizio Silvestri, and Franco Maria Nardini
7.1 Introduction 115
7.2 Architectural Details 116
7.3 K-Model 117
7.4 Parallel Prefix Sum 120
7.5 Bitonic Sorting Networks 126
7.6 Final Remarks 132
8. Scalable Storage I/O Software for Blue Gene Architectures 135
Florin Isaila, Javier Garcia, and Jesús Carretero
8.1 Introduction 135
8.2 Blue Gene System Overview 136
8.3 Design and Implementation 138
8.4 Conclusions and Future Work 142
PART IV EFFICIENT EXPLOITATION OF HETEROGENEOUS ARCHITECTURES 145
9. Fair Resource Sharing for Dynamic Scheduling of Workflows on Heterogeneous Systems 147
Hamid Arabnejad, Jorge G. Barbosa, and Frédéric Suter
9.1 Introduction 148
9.2 Concurrent Workflow Scheduling 153
9.3 Experimental Results and Discussion 160
9.4 Conclusions 165
10. Systematic Mapping of Reed-Solomon Erasure Codes on Heterogeneous Multicore Architectures 169
Roman Wyrzykowski, Marcin Wozniak, and Lukasz Kuczynski
10.1 Introduction 169
10.2 Related Works 171
10.3 Reed-Solomon Codes and Linear Algebra Algorithms 172
10.4 Mapping Reed-Solomon Codes on Cell/B.E. Architecture 173
10.5 Mapping Reed-Solomon Codes on Multicore GPU Architectures 178
10.6 Methods of Increasing the Algorithm Performance on GPUs 181
10.7 GPU Performance Evaluation 185
10.8 Conclusions and Future Works 190
11. Heterogeneous Parallel Computing Platforms and Tools for Compute-Intensive Algorithms: A Case Study 193
Daniele D'Agostino, Andrea Clematis, and Emanuele Danovaro
11.1 Introduction 194
11.2 A Low-Cost Heterogeneous Computing Environment 196
11.3 First Case Study: The N-Body Problem 200
11.4 Second Case Study: The Convolution Algorithm 206
11.5 Conclusions 211
12. Efficient Application of Hybrid Parallelism in Electromagnetism Problems 215
Alejandro Alvarez-Melcon, Fernando D. Quesada, Domingo Gimenez, Carlos Pérez-Alcaraz, Jose-Gines Picon, and Tomas Ramírez
12.1 Introduction 215
12.2 Computation of Green's Functions in Hybrid Systems 216
12.3 Parallelization in NUMA Systems of a Volume Integral Equation Technique 222
12.4 Autotuning Parallel Codes 226
12.5 Conclusions and Future Research 230
PART V CPU + GPU COPROCESSING 235
13. Design and Optimization of Scientific Applications for Highly Heterogeneous and Hierarchical HPC Platforms Using Functional Computation Performance Models 237
David Clarke, Aleksandar Ilic, Alexey Lastovetsky, Vladimir Rychkov, Leonel Sousa, and Ziming Zhong
13.1 Introduction 238
13.2 Related Work 241
13.3 Data Partitioning Based on Functional Performance Model 243
13.4 Example Application: Heterogeneous Parallel Matrix Multiplication 245
13.5 Performance Measurement on CPUs/GPUs System 247
13.6 Functional Performance Models of Multiple Cores and GPUs 248
13.7 FPM-Based Data Partitioning on CPUs/GPUs System 250
13.8 Efficient Building of Functional Performance Models 251
13.9 FPM-Based Data Partitioning on Hierarchical Platforms 253
13.10 Conclusion 257
14. Efficient Multilevel Load Balancing on Heterogeneous CPU + GPU Systems 261
Aleksandar Ilic and Leonel Sousa
14.1 Introduction: Heterogeneous CPU + GPU Systems 262
14.2 Background and Related Work 265
14.3 Load Balancing Algorithms for Heterogeneous CPU + GPU Systems 269
14.4 Experimental Results 275
14.5 Conclusions 279
15. The All-Pair Shortest-Path Problem in Shared-Memory Heterogeneous Systems 283
Hector Ortega-Arranz, Yuri Torres, Diego R. Llanos, and Arturo Gonzalez-Escribano
15.1 Introduction 283
15.2 Algorithmic Overview 285
15.3 CUDA Overview 287
15.4 Heterogeneous Systems and Load Balancing 288
15.5 Parallel Solutions to the APSP 289
15.6 Experimental Setup 291
15.7 Experimental Results 293
15.8 Conclusions 297
PART VI EFFICIENT EXPLOITATION OF DISTRIBUTED SYSTEMS 301
16. Resource Management for HPC on the Cloud 303
Marc E. Frincu and Dana Petcu
16.1 Introduction 303
16.2 On the Type of Applications for HPC and HPC2 305
16.3 HPC on the Cloud 306
16.4 Scheduling Algorithms for HPC2 311
16.5 Toward an Autonomous Scheduling Framework 312
16.6 Conclusions 319
17. Resource Discovery in Large-Scale Grid Systems 323
Konstantinos Karaoglanoglou and Helen Karatza
17.1 Introduction and Background 323
17.2 The Semantic Communities Approach 325
17.3 The P2P Approach 329
17.4 The Grid-Routing Transferring Approach 333
17.5 Conclusions 337
PART VII ENERGY AWARENESS IN HIGH-PERFORMANCE COMPUTING 341
18. Energy-Aware Approaches for HPC Systems 343
Robert Basmadjian, Georges Da Costa, Ghislain Landry Tsafack Chetsa, Laurent Lefevre, Ariel Oleksiak, and Jean-Marc Pierson
18.1 Introduction 344
18.2 Power Consumption of Servers 345
18.3 Classification and Energy Profiles of HPC Applications 354
18.4 Policies and Leverages 359
18.5 Conclusion 360
19. Strategies for Increased Energy Awareness in Cloud Federations 365
Gabor Kecskemeti, Attila Kertesz, Attila Cs. Marosi, and Zsolt Nemeth
19.1 Introduction 365
19.2 Related Work 367
19.3 Scenarios 369
19.4 Energy-Aware Cloud Federations 374
19.5 Conclusions 379
20. Enabling Network Security in HPC Systems Using Heterogeneous CMPs 383
Ozcan Ozturk and Suleyman Tosun
20.1 Introduction 384
20.2 Related Work 386
20.3 Overview of Our Approach 387
20.4 Heterogeneous CMP Design for Network Security Processors 390
20.5 Experimental Evaluation 394
20.6 Concluding Remarks 397
PART VIII APPLICATIONS OF HETEROGENEOUS HIGH-PERFORMANCE COMPUTING 401
21. Toward a High-Performance Distributed CBIR System for Hyperspectral Remote Sensing Data: A Case Study in Jungle Computing 403
Timo van Kessel, Niels Drost, Jason Maassen, Henri E. Bal, Frank J. Seinstra, and Antonio J. Plaza
21.1 Introduction 404
21.2 CBIR for Hyperspectral Imaging Data 407
21.3 Jungle Computing 410
21.4 IBIS and Constellation 412
21.5 System Design and Implementation 415
21.6 Evaluation 420
21.7 Conclusions 426
22. Taking Advantage of Heterogeneous Platforms in Image and Video Processing 429
Sidi A. Mahmoudi, Erencan Ozkan, Pierre Manneback, and Suleyman Tosun
22.1 Introduction 430
22.2 Related Work 431
22.3 Parallel Image Processing on GPU 433
22.4 Image Processing on Heterogeneous Architectures 437
22.5 Video Processing on GPU 438
22.6 Experimental Results 444
22.7 Conclusion 447
23. Real-Time Tomographic Reconstruction Through CPU + GPU Coprocessing 451
Jose Ignacio Agulleiro, Francisco Vazquez, Ester M. Garzon, and Jose J. Fernandez
23.1 Introduction 452
23.2 Tomographic Reconstruction 453
23.3 Optimization of Tomographic Reconstruction for CPUs and for GPUs 455
23.4 Hybrid CPU + GPU Tomographic Reconstruction 457
23.5 Results 459
23.6 Discussion and Conclusion 461
Acknowledgments 463
References 463
Index 467
7.3 K-Model 117
7.4 Parallel Prefix Sum 120
7.5 Bitonic Sorting Networks 126
7.6 Final Remarks 132
8. Scalable Storage I/O Software for Blue Gene Architectures 135
Florin Isaila, Javier Garcia, and Jesús Carretero
8.1 Introduction 135
8.2 Blue Gene System Overview 136
8.3 Design and Implementation 138
8.4 Conclusions and Future Work 142
PART IV EFFICIENT EXPLOITATION OF HETEROGENEOUS ARCHITECTURES 145
9. Fair Resource Sharing for Dynamic Scheduling of Workflows on Heterogeneous Systems 147
Hamid Arabnejad, Jorge G. Barbosa, and Frédéric Suter
9.1 Introduction 148
9.2 Concurrent Workflow Scheduling 153
9.3 Experimental Results and Discussion 160
9.4 Conclusions 165
10. Systematic Mapping of Reed-Solomon Erasure Codes on Heterogeneous Multicore Architectures 169
Roman Wyrzykowski, Marcin Wozniak, and Lukasz Kuczynski
10.1 Introduction 169
10.2 Related Works 171
10.3 Reed-Solomon Codes and Linear Algebra Algorithms 172
10.4 Mapping Reed-Solomon Codes on Cell/B.E. Architecture 173
10.5 Mapping Reed-Solomon Codes on Multicore GPU Architectures 178
10.6 Methods of Increasing the Algorithm Performance on GPUs 181
10.7 GPU Performance Evaluation 185
10.8 Conclusions and Future Works 190
11. Heterogeneous Parallel Computing Platforms and Tools for Compute-Intensive Algorithms: A Case Study 193
Daniele D'Agostino, Andrea Clematis, and Emanuele Danovaro
11.1 Introduction 194
11.2 A Low-Cost Heterogeneous Computing Environment 196
11.3 First Case Study: The N-Body Problem 200
11.4 Second Case Study: The Convolution Algorithm 206
11.5 Conclusions 211
12. Efficient Application of Hybrid Parallelism in Electromagnetism Problems 215
Alejandro Alvarez-Melcon, Fernando D. Quesada, Domingo Gimenez, Carlos Pérez-Alcaraz, Jose-Gines Picon, and Tomas Ramírez
12.1 Introduction 215
12.2 Computation of Green's Functions in Hybrid Systems 216
12.3 Parallelization in NUMA Systems of a Volume Integral Equation Technique 222
12.4 Autotuning Parallel Codes 226
12.5 Conclusions and Future Research 230
PART V CPU + GPU COPROCESSING 235
13. Design and Optimization of Scientific Applications for Highly Heterogeneous and Hierarchical HPC Platforms Using Functional Computation Performance Models 237
David Clarke, Aleksandar Ilic, Alexey Lastovetsky, Vladimir Rychkov, Leonel Sousa, and Ziming Zhong
13.1 Introduction 238
13.2 Related Work 241
13.3 Data Partitioning Based on Functional Performance Model 243
13.4 Example Application: Heterogeneous Parallel Matrix Multiplication 245
13.5 Performance Measurement on CPUs/GPUs System 247
13.6 Functional Performance Models of Multiple Cores and GPUs 248
13.7 FPM-Based Data Partitioning on CPUs/GPUs System 250
13.8 Efficient Building of Functional Performance Models 251
13.9 FPM-Based Data Partitioning on Hierarchical Platforms 253
13.10 Conclusion 257
14. Efficient Multilevel Load Balancing on Heterogeneous CPU + GPU Systems 261
Aleksandar Ilic and Leonel Sousa
14.1 Introduction: Heterogeneous CPU + GPU Systems 262
14.2 Background and Related Work 265
14.3 Load Balancing Algorithms for Heterogeneous CPU + GPU Systems 269
14.4 Experimental Results 275
14.5 Conclusions 279
15. The All-Pair Shortest-Path Problem in Shared-Memory Heterogeneous Systems 283
Hector Ortega-Arranz, Yuri Torres, Diego R. Llanos, and Arturo Gonzalez-Escribano
15.1 Introduction 283
15.2 Algorithmic Overview 285
15.3 CUDA Overview 287
15.4 Heterogeneous Systems and Load Balancing 288
15.5 Parallel Solutions to the APSP 289
15.6 Experimental Setup 291
15.7 Experimental Results 293
15.8 Conclusions 297
PART VI EFFICIENT EXPLOITATION OF DISTRIBUTED SYSTEMS 301
16. Resource Management for HPC on the Cloud 303
Marc E. Frincu and Dana Petcu
16.1 Introduction 303
16.2 On the Type of Applications for HPC and HPC2 305
16.3 HPC on the Cloud 306
16.4 Scheduling Algorithms for HPC2 311
16.5 Toward an Autonomous Scheduling Framework 312
16.6 Conclusions 319
17. Resource Discovery in Large-Scale Grid Systems 323
Konstantinos Karaoglanoglou and Helen Karatza
17.1 Introduction and Background 323
17.2 The Semantic Communities Approach 325
17.3 The P2P Approach 329
17.4 The Grid-Routing Transferring Approach 333
17.5 Conclusions 337
PART VII ENERGY AWARENESS IN HIGH-PERFORMANCE COMPUTING 341
18. Energy-Aware Approaches for HPC Systems 343
Robert Basmadjian, Georges Da Costa, Ghislain Landry Tsafack Chetsa, Laurent Lefevre, Ariel Oleksiak, and Jean-Marc Pierson
18.1 Introduction 344
18.2 Power Consumption of Servers 345
18.3 Classification and Energy Profiles of HPC Applications 354
18.4 Policies and Leverages 359
18.5 Conclusion 360
19. Strategies for Increased Energy Awareness in Cloud Federations 365
Gabor Kecskemeti, Attila Kertesz, Attila Cs. Marosi, and Zsolt Nemeth
19.1 Introduction 365
19.2 Related Work 367
19.3 Scenarios 369
19.4 Energy-Aware Cloud Federations 374
19.5 Conclusions 379
20. Enabling Network Security in HPC Systems Using Heterogeneous CMPs 383
Ozcan Ozturk and Suleyman Tosun
20.1 Introduction 384
20.2 Related Work 386