Publication | Closed Access
Selecting the right interestingness measure for association patterns
Citations: 858
References: 20
Year: 2002
Unknown Venue
Engineering, Machine Learning, Pattern Mining, Association Rule Mining, Text Mining, Optimization-based Data Mining, Computational Social Science, Right Interestingness Measure, Information Retrieval, Data Science, Data Mining, Pattern Recognition, Table Standardization, Statistics, Predictive Analytics, Knowledge Discovery, Computer Science, Frequent Pattern Mining, Association Rule, Rule Induction
Association rule mining relies on metrics such as support, confidence, lift, correlation, and collective strength, yet these measures often give conflicting answers, and the best choice for a given domain is rarely known. This study reviews existing interestingness measures, identifies key properties to examine when selecting one, and proposes an algorithm to help experts choose an appropriate measure. The authors compare 21 measures against these properties and present an algorithm that selects a small set of contingency tables, so that an expert can pick a suitable measure by inspecting only those tables. The results show that each measure suits some domains but not others, and that most measures agree with each other in two scenarios: support-based pruning and table standardization.
Many techniques for association rule mining and feature selection require a suitable metric to capture the dependencies among variables in a data set. For example, metrics such as support, confidence, lift, correlation, and collective strength are often used to determine the interestingness of association patterns. However, many such measures provide conflicting information about the interestingness of a pattern, and the best metric to use for a given application domain is rarely known. In this paper, we present an overview of various measures proposed in the statistics, machine learning, and data mining literature. We describe several key properties one should examine in order to select the right measure for a given application domain. A comparative study of these properties is made using twenty-one of the existing measures. We show that each measure has different properties that make it useful for some application domains but not for others. We also present two scenarios in which most of the existing measures agree with each other, namely, support-based pruning and table standardization. Finally, we present an algorithm to select a small set of tables such that an expert can select a desirable measure by looking at just this small set of tables.
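The claim that common measures can conflict is easy to check on a small example. The sketch below is illustrative only and is not taken from the paper. It assumes the standard textbook definitions of support, confidence, lift, and the phi correlation coefficient for a rule A → B, computed from a 2×2 contingency table. The `Table` class, the function names, and both sets of counts are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Table:
    """2x2 contingency table for items A and B (hypothetical helper).

    f11: transactions with both A and B
    f10: transactions with A but not B
    f01: transactions with B but not A
    f00: transactions with neither
    """
    f11: int
    f10: int
    f01: int
    f00: int

    @property
    def n(self) -> int:
        return self.f11 + self.f10 + self.f01 + self.f00


def support(t: Table) -> float:
    # P(A, B)
    return t.f11 / t.n


def confidence(t: Table) -> float:
    # P(B | A)
    return t.f11 / (t.f11 + t.f10)


def lift(t: Table) -> float:
    # P(A, B) / (P(A) * P(B))
    return (t.f11 * t.n) / ((t.f11 + t.f10) * (t.f11 + t.f01))


def phi(t: Table) -> float:
    # Pearson correlation for two binary variables
    num = t.f11 * t.f00 - t.f10 * t.f01
    den = sqrt((t.f11 + t.f10) * (t.f11 + t.f01)
               * (t.f00 + t.f10) * (t.f00 + t.f01))
    return num / den


# Made-up counts, chosen so that the measures disagree on the ranking.
t1 = Table(f11=60, f10=10, f01=20, f00=10)
t2 = Table(f11=10, f10=10, f01=5, f00=75)

for name, measure in [("support", support), ("confidence", confidence),
                      ("lift", lift), ("phi", phi)]:
    print(f"{name:>10}: t1={measure(t1):.3f}  t2={measure(t2):.3f}")
```

On these counts, support and confidence rank `t1` above `t2`, while lift and phi rank `t2` above `t1`. This is the kind of disagreement that makes the choice of measure matter.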