Publication | Closed Access
Integrating multiple internet directories by instance-based learning
Citations: 64
References: 13
Year: 2003
Venue: Unknown
Finding desired information on the Internet is becoming increasingly difficult. Internet directories such as Yahoo!, which organize web pages into hierarchical categories, provide one solution to this problem; however, such directories are of limited use because some bias is applied both in the collection and categorization of pages. We propose a method for integrating multiple Internet directories by instance-based learning. Our method provides the mapping of categories in order to transfer documents from one directory to another, instead of simply merging two directories into one. We present herein an effective algorithm for determining similar categories between two directories via a statistical method called the κ-statistic. In order to evaluate the proposed method, we conducted experiments using two actual Internet directories, Yahoo! and Google. The results show that the proposed method achieves extensive improvements relative to both the Naive Bayes and Enhanced Naive Bayes approaches, without any text analysis on documents.
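The abstract does not give the exact formulation, so the following is only a minimal sketch of how a κ-statistic could score the similarity of two categories from shared instances. It assumes the standard Cohen's κ over a 2x2 table of page membership, and that the pages listed in both directories can be identified (for example by URL). The names `kappa` and `best_match`, and the sample URLs, are hypothetical and not taken from the paper.

```python
from typing import Dict, Optional, Set


def kappa(in_a: Set[str], in_b: Set[str], shared: Set[str]) -> float:
    """Cohen's kappa for agreement between membership in category A
    and membership in category B, measured over the shared pages."""
    n = len(shared)
    if n == 0:
        raise ValueError("no shared pages to compare")
    a = in_a & shared
    b = in_b & shared
    n11 = len(a & b)              # pages in both categories
    n10 = len(a - b)              # pages only in A
    n01 = len(b - a)              # pages only in B
    n00 = n - n11 - n10 - n01     # pages in neither category
    p_observed = (n11 + n00) / n
    p_expected = ((n11 + n10) * (n11 + n01)
                  + (n01 + n00) * (n10 + n00)) / (n * n)
    if p_expected == 1.0:
        # Degenerate table (each category holds all or none of the shared
        # pages): kappa is undefined, so this sketch reports no agreement.
        return 0.0
    return (p_observed - p_expected) / (1.0 - p_expected)


def best_match(source: Set[str], targets: Dict[str, Set[str]],
               shared: Set[str]) -> Optional[str]:
    """Return the name of the target category with the highest kappa
    against the source category, or None if there are no targets."""
    if not targets:
        return None
    return max(targets, key=lambda name: kappa(source, targets[name], shared))


# Hypothetical data: ten pages that appear in both directories.
shared_pages = {f"http://example.org/{i}" for i in range(10)}
page = lambda i: f"http://example.org/{i}"

source_cat = {page(0), page(1), page(2), page(3)}
target_cats = {
    "close": {page(0), page(1), page(2), page(4)},
    "far": {page(7), page(8)},
}

score_close = kappa(source_cat, target_cats["close"], shared_pages)
chosen = best_match(source_cat, target_cats, shared_pages)
```

Because the score depends only on which pages two categories have in common, a sketch like this needs no text analysis of the documents themselves, which is consistent with the claim in the abstract. How the paper handles the category hierarchy and any significance threshold is not stated here and is left out.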