Real world performance of association rule algorithms

TLDR

The study compares five well-known association rule algorithms on three real-world datasets and one artificial dataset. The experiments confirm the performance improvements claimed for these algorithms on the artificial data, but some of those gains do not transfer to the real datasets. Algorithm choice matters only at support levels low enough to generate more rules than are useful in practice: for support levels that generate fewer than 1,000,000 rules, Apriori finishes in under ten minutes. Because rule counts grow super-exponentially as support decreases, the choice of algorithm is irrelevant outside a very narrow range of support values.

Abstract

This study compares five well-known association rule algorithms using three real-world datasets and an artificial dataset. The experimental results confirm the performance improvements previously claimed by the algorithms' authors on the artificial data, but some of these gains do not carry over to the real datasets, indicating overfitting of the algorithms to the IBM artificial dataset. More importantly, we found that the choice of algorithm only matters at support levels that generate more rules than would be useful in practice. For support levels that generate fewer than 1,000,000 rules, which is far more than humans can handle and is sufficient for prediction purposes where data is loaded into RAM, Apriori finishes processing in less than 10 minutes. On our datasets, we observed super-exponential growth in the number of rules as the support level decreased. On one of our datasets, a 0.02% change in the support increased the number of rules from fewer than a million to over a billion, implying that outside a very narrow range of support values, the choice of algorithm is irrelevant.
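To make the relationship between minimum support and rule count concrete, the following is a minimal Python sketch on a hypothetical six-transaction toy dataset. It uses brute-force enumeration, not Apriori or any of the five algorithms compared in the study, and the data, function names, and thresholds are illustrative assumptions rather than anything taken from the paper's experiments. It only shows the direction of the effect: lowering the support threshold admits more frequent itemsets, and each larger itemset contributes many more candidate rules.

```python
from itertools import combinations

# Hypothetical toy transactions (not from the study's datasets).
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c", "d"},
    {"d"},
]

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset whose support is at
    least min_support. Brute force over all candidates of each size."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, k):
            candidate_set = set(candidate)
            count = sum(1 for t in transactions if candidate_set <= t)
            if count / n >= min_support:
                result[frozenset(candidate)] = count / n
                found = True
        if not found:
            # Support is anti-monotone: if no k-itemset is frequent,
            # no larger itemset can be frequent either.
            break
    return result

def count_rules(freq, min_confidence):
    """Count rules X -> Y derived from the frequent itemsets, where X is a
    non-empty proper subset of a frequent itemset and Y is the remainder."""
    rules = 0
    for itemset, support in freq.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                # Every subset of a frequent itemset is itself frequent,
                # so its support is always present in freq.
                confidence = support / freq[frozenset(antecedent)]
                if confidence >= min_confidence:
                    rules += 1
    return rules

if __name__ == "__main__":
    for min_support in (0.5, 0.3, 0.1):
        freq = frequent_itemsets(transactions, min_support)
        print(min_support, len(freq), count_rules(freq, 0.0))
```

Even on six transactions the rule count rises several-fold across these three thresholds; on real data with many items, the same mechanism is consistent with the jump from under a million to over a billion rules reported above.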
