21,95 €
incl. VAT
Available immediately by download
  • Format: ePub


Product description
A Data Lake is a highly scalable platform for storing huge volumes of multistructured data from disparate sources, with centralized data management services. This book explores the potential of Data Lakes and the architectural approaches to building ones that ingest, index, manage, and analyze massive amounts of data using batch and real-time processing frameworks. It guides you through building a Data Lake that is managed by Hadoop and accessed as required by other Big Data applications.
Using best practices, this book guides readers in developing a Data Lake's capabilities. It focuses on architecting data governance, security, data quality, data lineage tracking, metadata management, and semantic data tagging. By the end of this book, you will have a good understanding of how to build a Data Lake for Big Data.

About the authors
Pradeep Pasupuleti has over 17 years of experience in architecting and developing distributed and real-time data-driven systems. Currently, he focuses on developing robust data platforms and data products that are fuelled by scalable machine-learning algorithms, and on delivering value to customers by applying his deep technical insight to their business problems. Pradeep founded Datatma expressly to humanize Big Data, simplify it, and uncover new value on a previously unimaginable scale in economy and scope. He has created COEs (Centers of Excellence) to provide quick wins with data products that analyze high-dimensional multistructured data using scalable natural language processing and deep learning techniques. He has held technology consulting roles and advised Fortune 500 companies.
Beulah Salome Purra has over 11 years of experience and specializes in building large-scale distributed systems. Her core expertise lies in Big Data analytics. In her current role at ATMECS, she focuses on building robust and scalable data products that extract value from the organization's huge data assets.