€23.95
incl. VAT
Available immediately for download
  • Format: ePub

Product description
Data cleaning is the all-important first step to successful data science, data analysis, and machine learning. If you work with any kind of data, this book is your go-to resource, arming you with the insights and heuristics experienced data scientists had to learn the hard way.

In a light-hearted and engaging exploration of different tools, techniques, and datasets real and fictitious, Python veteran David Mertz teaches you the ins and outs of data preparation and the essential questions you should be asking of every piece of data you work with.

Using a mixture of Python, R, and common command-line tools, Cleaning Data for Effective Data Science follows the data cleaning pipeline from start to end, focusing on helping you understand the principles underlying each step of the process. You'll look at data ingestion of a vast range of tabular, hierarchical, and other data formats, impute missing values, detect unreliable data and statistical anomalies, and generate synthetic features. The long-form exercises at the end of each chapter let you get hands-on with the skills you've acquired along the way, also providing a valuable resource for academic courses.

About the author
David Mertz is the founder of KDM Training, a partnership dedicated to educating developers and data scientists in machine learning and scientific computing. Previously, he created the data science training program for Anaconda Inc. With the advent of deep neural networks, he has turned to training our robot overlords as well. He was honored to work for eight years with D. E. Shaw Research, who have built the world's fastest, highly specialized supercomputer for performing molecular dynamics. David was a Director of the PSF for six years, and remains co-chair of its Trademarks Committee and of its Scientific Python Working Group. His columns, Charming Python and XML Matters, written in the 2000s, were the most widely read articles in the Python world. He has written previous books for Packt, O'Reilly and Addison-Wesley, and has given keynote addresses at numerous international programming conferences. Long ago, he earned a doctorate in post-structuralist political philosophy. Fate is a cruel mistress.