Efficient and effective search in large-scale data
repositories requires complex indexing solutions
deployed on a large number of servers. Commercial Web
search engines already rely on complex systems to
return relevant query results while keeping
processing times within a comfortable sub-second
limit. Nevertheless, the exponential growth of
Web content poses serious scalability
challenges. Coping with these
challenges requires novel indexing solutions that not
only remain scalable but also preserve search
accuracy. In this work we introduce and explore the
concept of query-driven indexing: an index
construction strategy that uses caching techniques to
adapt to the query patterns exhibited by users. We
propose abandoning the strict distinction between
indexing and caching, and to build a distributed
indexing structure, or a distributed cache, such that
it is optimized for the current query load. Our
experimental and theoretical analyses show that
query-driven indexing is especially
beneficial when the content is (geographically)
distributed in a peer-to-peer network.
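
To make the idea of an index that behaves like a cache more concrete, the
following is a minimal single-machine sketch, not the system described in
this work. It assumes a toy in-memory corpus, a least-recently-used eviction
policy, and a full scan as the fallback on a miss (standing in for a costly
distributed lookup). The class name QueryDrivenIndex, its methods, and the
capacity parameter are all hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


class QueryDrivenIndex:
    """Toy index that keeps entries only for queries users actually issue."""

    def __init__(self, documents, capacity=2):
        self.documents = documents      # doc_id -> text (hypothetical corpus)
        self.capacity = capacity        # maximum number of indexed query keys
        self.entries = OrderedDict()    # query key -> sorted list of doc ids
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(query):
        # Normalize the query so that term order and case do not matter.
        return tuple(sorted(set(query.lower().split())))

    def _scan(self, key):
        # Fallback on a miss: scan every document. This stands in for an
        # expensive broadcast or full-index lookup (an assumption here).
        return sorted(
            doc_id
            for doc_id, text in self.documents.items()
            if all(term in text.lower().split() for term in key)
        )

    def search(self, query):
        key = self._key(query)
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)       # mark as recently used
            return self.entries[key]
        self.misses += 1
        result = self._scan(key)
        self.entries[key] = result              # index the query on demand
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict least recently used
        return result


docs = {
    1: "peer to peer search",
    2: "web search engines",
    3: "distributed cache",
}
index = QueryDrivenIndex(docs, capacity=2)
index.search("search")        # miss: answered by scanning, then indexed
index.search("Search")        # hit: answered from the query-driven index
```

In this sketch the set of indexed keys is determined entirely by the observed
query load, so frequently repeated queries are served from the index while
rarely issued ones fall back to the expensive path.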