Publication | Closed Access
Block-based web search
Citations: 166
References: 18
Year: 2004
Unknown Venue
Search Technology, Engineering, Information Retrieval, Data Science, Data Mining, Block-based Web Search, Semantic Partitioning, Retrieval Performance, Knowledge Discovery, Intelligent Searching, Computer Science, Search Engine Indexing, Query Expansion, Semantic Web, Query Analysis, Search Engine Design, Text Mining, Page Segmentation Algorithms
The multiple topics and varying lengths of web pages are two factors that significantly degrade web search performance. In this paper, we explore the use of page segmentation algorithms to partition web pages into blocks and investigate how to take advantage of block-level evidence to improve retrieval performance in the web context. Because of the special characteristics of web pages, different page segmentation methods have different impacts on web search performance. We compare four types of methods: fixed-length page segmentation, DOM-based page segmentation, vision-based page segmentation, and a combined method that integrates both semantic and fixed-length properties. Experiments on block-level query expansion and retrieval are performed. Among the four approaches, the combined method achieves the best performance for web search. Our experimental results also show that such semantic partitioning of web pages effectively deals with the problem of multiple drifting topics and mixed lengths, and thus has great potential to boost the performance of current web search engines.
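To make the idea of block-level evidence concrete, the following is a minimal sketch of the simplest of the four approaches, fixed-length segmentation, paired with a toy scoring rule. It is not the paper's implementation: the function names, the window and overlap sizes, the length-normalized term count, and the choice to score a page by its best block are all assumptions made for illustration.

```python
from collections import Counter


def fixed_length_blocks(text, window=50, overlap=25):
    """Split text into overlapping word windows (sizes are hypothetical)."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    words = text.lower().split()
    step = window - overlap
    blocks = []
    for start in range(0, max(len(words), 1), step):
        chunk = words[start:start + window]
        if chunk:
            blocks.append(chunk)
        # Stop once the current window has reached the end of the page.
        if start + window >= len(words):
            break
    return blocks


def block_score(block, query_terms):
    """Length-normalized count of query-term occurrences in one block."""
    counts = Counter(block)
    return sum(counts[t] for t in query_terms) / len(block)


def page_score(text, query, window=50, overlap=25):
    """Score a page by its best-matching block (an assumed combination rule)."""
    query_terms = query.lower().split()
    blocks = fixed_length_blocks(text, window, overlap)
    if not blocks:
        return 0.0
    return max(block_score(b, query_terms) for b in blocks)


if __name__ == "__main__":
    # A two-topic page: the query topic occupies only the final block.
    page = "cats " * 8 + "python search " * 2
    print(page_score(page, "python", window=4, overlap=0))
```

In this sketch, a page whose relevant content is confined to one block is not penalized for the unrelated text around it, which is the intuition behind handling multiple topics and mixed lengths at the block level. DOM-based and vision-based segmentation would replace `fixed_length_blocks` with a structure-aware splitter.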