  • Paperback

Product description
Efficient and effective search in large-scale data
repositories requires complex indexing solutions
deployed on a large number of servers. Commercial Web
search engines already rely upon complex systems to
be able to return relevant query results and keep
processing times within the comfortable sub-second
limit. Nevertheless, the exponential growth of the
amount of content on the Web poses serious challenges
with respect to scalability. Coping with these
challenges requires novel indexing solutions that not
only remain scalable but also preserve search
accuracy. In this work we introduce and explore the
concept of query-driven indexing: an index
construction strategy that uses caching techniques to
adapt to the querying patterns expressed by users. We
propose abandoning the strict distinction between
indexing and caching and building a distributed
indexing structure, or distributed cache, that
is optimized for the current query load. Our
experimental and theoretical analyses show that
employing query-driven indexing is especially
beneficial when the content is (geographically)
distributed in a Peer-to-Peer network.
About the author
Gleb Skobeltsyn is a postdoctoral researcher at École
Polytechnique Fédérale de Lausanne (EPFL), Switzerland. He
received his PhD from EPFL in January 2009. His research
focuses on query-driven mechanisms for P2P information retrieval,
Web search engine architectures, caching techniques, large-scale
data management, and social networks.