CPAR: Classification based on Predictive Association Rules

TLDR

Associative classification can achieve higher accuracy than traditional methods such as C4.5, but it generates many rules and risks overfitting, whereas rule‑based classifiers are faster but often less accurate. This paper introduces CPAR, a classification method that merges the strengths of associative and rule‑based approaches. CPAR employs a greedy rule‑generation algorithm that directly extracts rules from training data, tests more rules than conventional classifiers, and selects the top k rules based on expected accuracy to prevent overfitting.

Abstract

Recent studies in data mining have proposed a new classification approach, called associative classification, which, according to several reports, such as [7, 6], achieves higher classification accuracy than traditional classification approaches such as C4.5. However, the approach also suffers from two major deficiencies: (1) it generates a very large number of association rules, which leads to high processing overhead; and (2) its confidence-based rule evaluation measure may lead to overfitting. In comparison with associative classification, traditional rule-based classifiers, such as C4.5, FOIL and RIPPER, are substantially faster but their accuracy, in most cases, may not be as high. In this paper, we propose a new classification approach, CPAR (Classification based on Predictive Association Rules), which combines the advantages of both associative classification and traditional rule-based classification. Instead of generating a large number of candidate rules as in associative classification, CPAR adopts a greedy algorithm to generate rules directly from training data. Moreover, CPAR generates and tests more rules than traditional rule-based classifiers to avoid missing important rules. To avoid overfitting, CPAR uses expected accuracy to evaluate each rule and uses the best k rules in prediction.
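
The last step, scoring rules by expected accuracy and predicting with the best k rules, can be illustrated with a minimal sketch. The abstract does not say how expected accuracy is estimated or how the best k rules are combined, so the sketch makes two assumptions: a Laplace estimate, (n_correct + 1) / (n_covered + number of classes), and a per-class average over the k highest-scoring matching rules. The names Rule, expected_accuracy, and predict are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record; field names are illustrative only."""
    body: frozenset   # attribute=value items that must all be present
    label: str        # class predicted by the rule
    n_covered: int    # training examples whose items include the body
    n_correct: int    # covered examples that actually have class `label`


def expected_accuracy(rule, n_classes):
    # Assumption: Laplace estimate. It pulls rules with little support
    # toward 1 / n_classes, which penalizes overly specific rules.
    return (rule.n_correct + 1) / (rule.n_covered + n_classes)


def predict(rules, example, n_classes, k=5):
    """Return the class whose best k matching rules score highest on average."""
    scores_by_class = defaultdict(list)
    for rule in rules:
        if rule.body <= example:  # every item of the body occurs in the example
            scores_by_class[rule.label].append(expected_accuracy(rule, n_classes))
    if not scores_by_class:
        return None  # no rule matches; a default class could be used instead

    def class_score(label):
        best = sorted(scores_by_class[label], reverse=True)[:k]
        return sum(best) / len(best)

    return max(scores_by_class, key=class_score)
```

Averaging over several rules rather than trusting the single best one means a class backed by one strong rule and otherwise weak rules can lose to a class with consistently good rules. The default k=5 is only a placeholder value for the sketch.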
