38,99 €
incl. VAT
Free shipping*
Ready to ship in 6-10 days
  • Paperback

Product description
This thesis studies two problems in Web search engines: how a crawler with limited computing resources can effectively crawl the dynamically changing Web and acquire the most up-to-date Web documents, and how a search engine can provide information-object-oriented indexing methods that let users retrieve the desired information with high accuracy and efficiency. To address the first problem, we design a set of sampling policies with varying download granularity, taking into account the link structure, the directory structure, and content-based features, including a clustering technique. We further extend the clustering-based sampling approach by testing more dynamic features and strategically selecting samples from each cluster. For the second problem, we propose building indexes on metadata extracted from various information objects rather than on whole documents. We set up a digital library named ArchSeer for the domain of archeology. ArchSeer allows users to retrieve archeology literature via domain-specific search engines.
About the author
Qingzhao Tan was born in Guangzhou, China. She joined the CSE department at Pennsylvania State University in 2004, the same year she received a Master's degree in Computer Science from the Hong Kong University of Science and Technology. Her research focuses on information retrieval and Web search engines.