Simon Munzert, Christian Rubba, Peter Meißner, Dominic Nyhuis

Automated Data Collection with R (eBook, PDF)

A Practical Guide to Web Scraping and Text Mining

Price: 60,99 € (incl. VAT)

Available immediately as a download

Format: PDF

A hands-on guide to web scraping and text mining for both beginners and experienced users of R

* Introduces fundamental concepts of the main architecture of the web and databases and covers HTTP, HTML, XML, JSON, and SQL.
* Provides basic techniques to query web documents and data sets (XPath and regular expressions).
* Presents an extensive set of exercises to guide the reader through each technique.
* Explores both supervised and unsupervised techniques as well as advanced techniques such as data scraping and text management.
* Features case studies throughout, along with examples for…

Devices: PC
Copy-protected
Size: 8.25 MB

Other customers were also interested in

Simon Munzert: Automated Data Collection with R (eBook, ePUB), 60,99 €
Alan Anderson: Statistics for Big Data For Dummies (eBook, PDF), 15,99 €
Lawrence S. Meyers: Performing Data Analysis Using IBM SPSS (eBook, PDF), 84,99 €
Advances in Longitudinal Survey Methodology (eBook, PDF), 97,99 €
David R. Heise: Surveying Cultures (eBook, PDF), 97,99 €
Debra Wetcher-Hendricks: Analyzing Quantitative Data (eBook, PDF), 106,99 €
Joseph A. Cazier: Leading in Analytics (eBook, PDF), 25,99 €

Product description

For legal reasons, this download can only be delivered to customers with a billing address in A, B, BG, CY, CZ, D, DK, EW, E, FIN, F, GR, HR, H, IRL, I, LT, L, LR, M, NL, PL, P, R, S, SLO, SK.

Note: This item can only be delivered to a German delivery address.

Product details

Publisher: Wiley
Number of pages: 480
Publication date: 24 October 2014
Language: English
ISBN-13: 9781118834787
Item no.: 41772962

About the authors

Simon Munzert, Christian Rubba, Peter Meißner, and Dominic Nyhuis are the authors of Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining, published by Wiley.

Table of contents

Preface
1 Introduction
1.1 Case study: World Heritage Sites in Danger
1.2 Some remarks on web data quality
1.3 Technologies for disseminating, extracting, and storing web data
1.4 Structure of the book
Part One A Primer on Web and Data Technologies
2 HTML
2.1 Browser presentation and source code
2.2 Syntax rules
2.3 Tags and attributes
2.4 Parsing
3 XML and JSON
3.1 A short example XML document
3.2 XML syntax rules
3.3 When is an XML document well formed or valid?
3.4 XML extensions and technologies
3.5 XML and R in practice
3.6 A short example JSON document
3.7 JSON syntax rules
3.8 JSON and R in practice
4 XPath
4.1 XPath--a query language for web documents
4.2 Identifying node sets with XPath
4.3 Extracting node elements
5 HTTP
5.1 HTTP fundamentals
5.2 Advanced features of HTTP
5.3 Protocols beyond HTTP
5.4 HTTP in action
6 AJAX
6.1 JavaScript
6.2 XHR
6.3 Exploring AJAX with Web Developer Tools
7 SQL and relational databases
7.1 Overview and terminology
7.2 Relational Databases
7.3 SQL: a language to communicate with Databases
7.4 Databases in action
8 Regular expressions and essential string functions
8.1 Regular expressions
8.2 String processing
8.3 A word on character encodings
Part Two A Practical Toolbox for Web Scraping and Text Mining
9 Scraping the Web
9.1 Retrieval scenarios
9.2 Extraction strategies
9.3 Web scraping: Good practice
9.4 Valuable sources of inspiration
10 Statistical text processing
10.1 The running example: Classifying press releases of the British government
10.2 Processing textual data
10.3 Supervised learning techniques
10.4 Unsupervised learning techniques
11 Managing data projects
11.1 Interacting with the file system
11.2 Processing multiple documents/links
11.3 Organizing scraping procedures
11.4 Executing R scripts on a regular basis
Part Three A Bag of Case Studies
12 Collaboration networks in the US Senate
12.1 Information on the bills
12.2 Information on the senators
12.3 Analyzing the network structure
12.4 Conclusion
13 Parsing information from semistructured documents
13.1 Downloading data from the FTP server
13.2 Parsing semistructured text data
13.3 Visualizing station and temperature data
14 Predicting the 2014 Academy Awards using Twitter
15 Mapping the geographic distribution of names
15.1 Developing a data collection strategy
15.2 Website inspection
15.3 Data retrieval and information extraction
15.4 Mapping names
15.5 Automating the process
16 Gathering data on mobile phones
16.1 Page exploration
16.2 Scraping procedure
16.3 Graphical analysis
16.4 Data storage
17 Analyzing sentiments of product reviews
17.1 Introduction
17.2 Collecting the data
17.3 Analyzing the data
17.4 Conclusion
References
General index
Package index
Function index
