Publication | Closed Access
Combining classifiers to identify online databases
Citations: 80
References: 32
Year: 2007
Unknown Venue
Engineering, Semantic Web, Text Mining, Classification Method, Information Retrieval, Data Science, Data Mining, Data Integration, Data Retrieval, Online Databases, Data Management, Multiple Classifier System, Online Database Domain D, Search Technology, Very Large Database, Knowledge Discovery, Modular Classifier Composition, Intelligent Classification, Computer Science, Search Engine Design, Web Mining, High Accuracy
We address the problem of identifying the domain of online databases. More precisely, given a set F of Web forms automatically gathered by a focused crawler and an online database domain D, our goal is to select from F only the forms that are entry points to databases in D. Having a set of Web forms that serve as entry points to similar online databases is a requirement for many applications and techniques that aim to extract and integrate hidden-Web information, such as meta-searchers, online database directories, hidden-Web crawlers, and form-schema matching and merging.

We propose a new strategy that automatically and accurately classifies online databases based on features that can be easily extracted from Web forms. By judiciously partitioning the space of form features, this strategy allows the use of simpler classifiers that can be constructed using learning techniques that are better suited for the features of each partition. Experiments using real Web data in a representative set of domains show that the use of different classifiers leads to high accuracy, precision and recall. This indicates that our modular classifier composition provides an effective and scalable solution for classifying online databases.
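The idea of partitioning form features and giving each partition its own simple classifier can be illustrated with a small sketch. The abstract does not say which partitions or learners the authors used, so everything here is an assumption made for illustration: the split into structural and textual features, the `WebForm` record and its field names, the hand-written structural rule (a stand-in for a learned classifier), the naive Bayes text model, and the toy training data. It shows only how two classifiers might be composed to select from F the forms that belong to a domain D.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class WebForm:
    # Hypothetical feature record; the field names are illustrative only.
    num_text_inputs: int
    num_selects: int
    has_password_field: bool
    text: str  # visible text in and around the form


def looks_searchable(form: WebForm) -> bool:
    """Structural partition: a hand-written rule standing in for a learned classifier."""
    if form.has_password_field:  # login or signup forms are not query interfaces
        return False
    return form.num_text_inputs + form.num_selects >= 1


class TextDomainClassifier:
    """Textual partition: multinomial naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for word in text.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def select_forms(forms, text_clf, domain):
    """Keep only forms that pass the structural check and are assigned to `domain`."""
    return [f for f in forms
            if looks_searchable(f) and text_clf.predict(f.text) == domain]


# Toy training data, invented for illustration.
clf = TextDomainClassifier().fit(
    ["title author isbn publisher", "make model year mileage price",
     "author book title keyword", "car make model dealer"],
    ["book", "auto", "book", "auto"],
)
forms = [
    WebForm(2, 1, False, "search by author or title"),
    WebForm(1, 2, False, "select make and model"),
    WebForm(2, 0, True, "username password author"),
]
selected = select_forms(forms, clf, "book")
```

In this toy run, `selected` contains only the first form: the second is assigned to the other domain by the text classifier, and the third is rejected by the structural rule before its text is examined. Keeping the two decisions separate lets each classifier stay small and be trained or replaced independently.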