Scalable algorithms for association mining

TLDR

Association rule discovery is a key problem in knowledge discovery and data mining, involving identification of frequent itemsets and conditional implication rules. The study proposes efficient algorithms to discover frequent itemsets, the compute‑intensive phase of association mining. The algorithms leverage structural properties of frequent itemsets by organizing items into a subset lattice decomposed into independent sublattices solvable in memory, and employ efficient lattice traversal techniques to rapidly identify all long frequent itemsets and their subsets, while also assessing the impact of various database layout schemes. Experimental results show the new algorithms outperform previous approaches by more than an order of magnitude on test databases.

Abstract

Association rule discovery has emerged as an important problem in knowledge discovery and data mining. The association mining task consists of identifying the frequent itemsets, and then forming conditional implication rules among them. We present efficient algorithms for the discovery of frequent itemsets which forms the compute intensive phase of the task. The algorithms utilize the structural properties of frequent itemsets to facilitate fast discovery. The items are organized into a subset lattice search space, which is decomposed into small independent chunks or sublattices, which can be solved in memory. Efficient lattice traversal techniques are presented which quickly identify all the long frequent itemsets and their subsets if required. We also present the effect of using different database layout schemes combined with the proposed decomposition and traversal techniques. We experimentally compare the new algorithms against the previous approaches, obtaining improvements of more than an order of magnitude for our test databases.

References

Page 1

	Year	Citations

Page 1