Publication | Open Access
Incorporating quality metrics in centralized/distributed information retrieval on the World Wide Web
Citations: 222
References: 7
Year: 2000
Unknown Venue
Ranking Algorithm, Engineering, Collaborative Information Retrieval, Intelligent Information Retrieval, Semantic Web, Information Quality, Text Mining, Information Retrieval, Data Science, Data Mining, Management, Relevance Feedback, Data Integration, Content Analysis, Site Selection, Search Technology, Collaborative Search, Search Effectiveness, Knowledge Discovery, Information Management, Quality Metrics, Search Engine Design, Centralized/distributed Information Retrieval
Most web information retrieval systems rely on similarity ranking based on term frequency, ignoring document quality and thus retrieving low‑quality documents. The study proposes integrating similarity ranking with quality ranking in both centralized and distributed search environments. Six quality metrics—currency, availability, information‑to‑noise ratio, authority, popularity, and cohesiveness—were evaluated for their impact on ranking. Incorporating currency, availability, information‑to‑noise ratio, and cohesiveness improved centralized search, while availability, information‑to‑noise ratio, popularity, and cohesiveness enhanced site selection, and adding popularity to fusion significantly boosted overall effectiveness.
Most information retrieval systems on the Internet rely primarily on similarity ranking algorithms based solely on term frequency statistics. Information quality is usually ignored, so documents are retrieved without regard to their quality. We present an approach that combines similarity ranking with quality ranking in centralized and distributed search environments. Six quality metrics were investigated: currency, availability, information-to-noise ratio, authority, popularity, and cohesiveness. Search effectiveness improved significantly when the currency, availability, information-to-noise ratio, and page cohesiveness metrics were incorporated in centralized search. The improvement seen when the availability, information-to-noise ratio, popularity, and cohesiveness metrics were incorporated in site selection was also significant. Finally, incorporating the popularity metric in information fusion resulted in a significant improvement. In summary, the results show that incorporating quality metrics can generally improve search effectiveness in both centralized and distributed search environments.
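The abstract does not state how the similarity and quality scores are combined, so the following is only a minimal sketch of one plausible scheme: a weighted linear blend of a term-based similarity score with an average of normalized quality metrics. The names `Doc`, `combined_score`, `rank`, the mixing parameter `alpha`, and the assumption that every score lies in [0, 1] are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    similarity: float  # term-based similarity score, assumed to lie in [0, 1]
    # Quality metric name -> value, assumed normalized to [0, 1]
    quality: dict = field(default_factory=dict)


def combined_score(doc, weights, alpha=0.5):
    """Blend similarity with a weighted mean of quality metrics.

    alpha = 0 ranks by similarity only; alpha = 1 ranks by quality only.
    This linear blend is an illustrative assumption, not the paper's formula.
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        # No quality metrics selected: fall back to plain similarity ranking.
        return doc.similarity
    quality = sum(
        w * doc.quality.get(name, 0.0) for name, w in weights.items()
    ) / total_weight
    return (1 - alpha) * doc.similarity + alpha * quality


def rank(docs, weights, alpha=0.5):
    """Return docs sorted by combined score, best first."""
    return sorted(
        docs, key=lambda d: combined_score(d, weights, alpha), reverse=True
    )


# Hypothetical example: a slightly less similar but much higher-quality page.
docs = [
    Doc("a", similarity=0.9, quality={"currency": 0.1, "availability": 0.1}),
    Doc("b", similarity=0.8, quality={"currency": 0.9, "availability": 0.9}),
]
weights = {"currency": 1.0, "availability": 1.0}
ranked_ids = [d.doc_id for d in rank(docs, weights, alpha=0.5)]
```

With `alpha=0.5` the higher-quality document `b` moves ahead of `a`, while `alpha=0` reproduces the pure similarity ordering. The same blend could in principle be applied to sites rather than pages for site selection, again as an assumption rather than the authors' stated method.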