  • Paperback

Product description
Search engines have become so indispensable that
they rank second only to e-mail as the most popular
online activity. To respond to queries in a timely
fashion, search engines make use of large indices of
word occurrences on Web pages to cross-reference
websites to keywords. Such indices are maintained by
spiders, a special kind of computer program that
browses the Web autonomously. However, due to a
variety of technological limitations, a single
spider has proven insufficient to maintain a search
engine's index. Hence, in this book, we review
several alternatives for splitting a spider's work into
multiple processes, and define a methodology to
preserve an up-to-date index of the Web.
SharpSpider, our prototype spider, has been
evaluated using the resources of PlanetLab, a
globally distributed platform for developing and
deploying planetary-scale services. Despite the
utilisation of very modest equipment, we have
performed large crawls of the Web, distributing the
workload amongst various computers spread across
different continents. The statistics derived from
our research offer valuable insight into the nature
of educational Web resources.
About the author
After completing his PhD in Computer Science at the University
of Cambridge, Marco Palomino worked as a software consultant in
London, and then joined the Information Retrieval Group of the
University of Sunderland in 2007. Currently, Marco works as a
research associate, and his work focuses on the automatic
indexing of multimedia collections.