29,99 €
incl. VAT
Free shipping*
Ready to ship in 6-10 days
  • Paperback

Product description
Recent progress in bioinformatics, and especially in high-throughput sequencing, has made it possible to sequence and analyze the genomes of many individuals, which can lead to improved diagnostics and treatment for patients suffering from genetic diseases. To achieve this, tools used in both clinical and research environments need to be enhanced to handle large amounts of data. This book analyzes application-level parallelization of database query processing by means of sharding as a technique for improving the performance and scalability of an open-source search engine for genomic variants. We describe the challenges of designing and implementing a data access layer, the core of which is a general sharding framework. The approach lets queries use multiple processors as well as multiple machines when accessing the underlying data. This enables the system to scale in a near-linear fashion as more servers are added, with many queries achieving even superlinear speedup. This book should be useful to software engineers and scientists interested in an intriguing problem in the area of parallelization, as well as to anyone curious about what happens under the hood of modern genome analysis systems.
About the author
Miroslav Cupak earned his master's degree in Informatics at Masaryk University, where his work focused on parallel and distributed systems. He puts his software engineering experience into practice at DNAstack, where he architects a platform for large-scale analysis of genomic data.