Publication | Closed Access
Induction in noisy domains
Citations: 112
References: 9
Year: 1987
Unknown Venue
Real-world data typically contain noise and are expressed in incomplete description languages, so correlations between attributes and classes have exceptions, yet many induction systems assume noiseless, complete domains. This study investigates the induction of classification rules from real-world examples, discusses the problems that noise presents, and proposes a top-down induction algorithm that tolerates misclassified training examples. The algorithm is compared experimentally with other induction systems on three sets of real-world medical data.
This paper examines the induction of classification rules from examples using real-world data. Real-world data is almost always characterized by two features that are important for the design of an induction algorithm. Firstly, there is often noise present, for example, due to imperfect measuring equipment used to collect the data. Secondly, the description language is often incomplete, such that examples with identical descriptions in the language will not always be members of the same class. Many induction systems make the ‘noiseless domain’ assumption that the examples do not contain errors and the description language is complete, and consequently constrain their search for rules to those for which no counterexamples exist in the data used for induction. However, in real-world domains, correlations between attributes and classes in a data set are rarely without exceptions. To locate such correlations and induce rules describing them, it is also necessary to consider rules which may not classify all the training examples correctly. This paper first discusses some of the problems presented by noise and proposes a top-down algorithm for induction in real-world domains. It then presents an experimental comparison of this algorithm with other induction systems using three sets of real-world medical data.