Rough Set Approaches to Rule Induction from Incomplete Data

Abstract

In this paper we assume that data are presented in the form of decision tables, which are incomplete when some attribute values are missing. Two main cases of missing attribute values are considered: lost values (the original value was erased) and "do not care" conditions (the original value was irrelevant). The main tool used in this paper is the attribute-value pair block. These blocks are used to construct characteristic sets, characteristic relations, and lower and upper approximations for decision tables with missing attribute values. For such tables, three different definitions of lower and upper approximations may be applied: singleton, subset, and concept. A modified version of the LEM2 rule induction algorithm, accepting input data with both lost values and "do not care" conditions, is described. Results of experiments on some real-life incomplete data, in which all missing attribute values were considered to be either lost or "do not care" conditions, are presented as well. The conclusion is that the classification error rate is smaller when missing attribute values are considered to be lost.
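
To make the terminology above concrete, the following is a minimal sketch of attribute-value pair blocks, characteristic sets, and the singleton, subset, and concept approximations. The abstract does not state the definitions, so the sketch assumes the formulation commonly used in the rough-set literature: a lost value (`?`) is placed in no block of its attribute, a "do not care" condition (`*`) is placed in every block of its attribute, and a characteristic set is the intersection of the blocks for a case's specified values. The function names, the `?` and `*` markers, and the four-case table are hypothetical and chosen only for illustration; this is not the paper's implementation, and the modified LEM2 algorithm is not shown.

```python
LOST, DONT_CARE = "?", "*"  # assumed markers for the two kinds of missing values


def blocks(table, attributes):
    """Map each (attribute, value) pair to the set of cases it covers."""
    result = {}
    for a in attributes:
        specified = {row[a] for row in table.values()} - {LOST, DONT_CARE}
        for v in specified:
            # A "do not care" case joins every block of the attribute;
            # a case with a lost value joins none of them.
            result[(a, v)] = {x for x, row in table.items()
                              if row[a] in (v, DONT_CARE)}
    return result


def characteristic_sets(table, attributes):
    """K(x): intersection of the blocks for the specified values of case x."""
    blk = blocks(table, attributes)
    k = {}
    for x, row in table.items():
        s = set(table)  # start from the whole universe of cases
        for a in attributes:
            if row[a] not in (LOST, DONT_CARE):
                s &= blk[(a, row[a])]
        k[x] = s
    return k


def approximations(k, concept, kind):
    """Return (lower, upper) for kind in 'singleton', 'subset', 'concept'."""
    if kind not in ("singleton", "subset", "concept"):
        raise ValueError(f"unknown kind: {kind}")
    # The concept variant only looks at cases that belong to the concept.
    cases = concept if kind == "concept" else set(k)
    inside = [x for x in cases if k[x] <= concept]   # K(x) is a subset of X
    touching = [x for x in cases if k[x] & concept]  # K(x) intersects X
    if kind == "singleton":
        return set(inside), set(touching)
    lower = set().union(*(k[x] for x in inside))
    upper = set().union(*(k[x] for x in touching))
    return lower, upper


# Hypothetical four-case decision table; "flu" is the decision.
table = {
    1: {"temp": "high", "headache": LOST, "flu": "yes"},
    2: {"temp": "high", "headache": "yes", "flu": "yes"},
    3: {"temp": DONT_CARE, "headache": "no", "flu": "no"},
    4: {"temp": "normal", "headache": "yes", "flu": "no"},
}
attributes = ["temp", "headache"]
k = characteristic_sets(table, attributes)
flu_yes = {x for x, row in table.items() if row["flu"] == "yes"}

for kind in ("singleton", "subset", "concept"):
    lower, upper = approximations(k, flu_yes, kind)
    print(kind, sorted(lower), sorted(upper))
```

In this toy table the singleton upper approximation contains only cases 1 and 2, while the subset and concept upper approximations also pull in case 3, because case 3 lies in the characteristic set of case 1. That illustrates why the three definitions can differ once values are missing.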
