Publication | Open Access
Turbo-charging vertical mining of large databases
Citations: 125
References: 4
Year: 2000
Unknown Venue
Keywords: Engineering, Pattern Discovery, Pattern Mining, Mining Methods, Knowledge Discovery In Databases, Information Retrieval, Data Science, Data Mining, Association Rule Learning, Data Integration, Data Management, High-performance Data Analytics, Knowledge Discovery, Computer Science, Horizontal Mining Algorithm, Relational Queries, Frequent Pattern Mining, Vertical Representation, Association Rule, Turbo-charging Vertical Mining, Structure Mining, Workload Regions, Big Data
In a vertical market‑basket representation, each item is linked to a column of transaction indicators, and recent vertical mining algorithms outperform classical horizontal ones but are limited by database size, content characteristics, or schema constraints. The authors introduce VIPER, a general‑purpose vertical mining algorithm that imposes no special database requirements. VIPER encodes transactions as compressed bit‑vectors called snakes and applies novel optimizations for snake generation, intersection, counting, and storage, with performance evaluated on synthetic workloads. Experiments show VIPER achieves substantial speedups over prior vertical and horizontal algorithms, even surpassing an optimal but impractical horizontal method in certain workload regimes.
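To make the vertical layout concrete, here is a minimal Python sketch. It assumes an uncompressed layout: each item's column is stored as a plain Python `int` used as a bitset, where bit *t* is set when transaction *t* contains the item. The support of an itemset is then the popcount of the bitwise AND of its columns. The function names `to_vertical` and `support` are hypothetical illustrations, and the sketch does not reproduce VIPER's compressed "snake" format.

```python
from typing import Dict, Iterable, List, Set


def to_vertical(transactions: List[Set[str]]) -> Dict[str, int]:
    """Convert horizontal baskets into vertical columns.

    Hypothetical helper: each item maps to an int used as an uncompressed
    bit-vector, with bit `tid` set iff transaction `tid` contains the item.
    """
    columns: Dict[str, int] = {}
    for tid, basket in enumerate(transactions):
        for item in basket:
            columns[item] = columns.get(item, 0) | (1 << tid)
    return columns


def support(columns: Dict[str, int], itemset: Iterable[str]) -> int:
    """Count transactions containing every item in `itemset`.

    Intersects the item columns with bitwise AND, then counts the set bits.
    Items absent from the database contribute an all-zero column.
    """
    items = list(itemset)
    if not items:
        raise ValueError("itemset must be non-empty")
    acc = columns.get(items[0], 0)
    for item in items[1:]:
        acc &= columns.get(item, 0)
    return bin(acc).count("1")  # popcount; int.bit_count() needs Python 3.10+
```

For example, with the five baskets `{"bread", "milk"}`, `{"bread", "diapers", "beer"}`, `{"milk", "diapers", "beer"}`, `{"bread", "milk", "diapers"}` and `{"bread", "milk", "beer"}`, `support(to_vertical(baskets), ["bread", "milk"])` intersects the columns for transactions {0, 1, 3, 4} and {0, 2, 3, 4} and returns 3. A single AND over machine words replaces the per-transaction subset checks a horizontal algorithm would perform.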
In a vertical representation of a market-basket database, each item is associated with a column of values representing the transactions in which it is present. The association-rule mining algorithms that have been recently proposed for this representation show performance improvements over their classical horizontal counterparts, but are either efficient only for certain database sizes, or assume particular characteristics of the database contents, or are applicable only to specific kinds of database schemas. We present here a new vertical mining algorithm called VIPER, which is general-purpose, making no special requirements of the underlying database. VIPER stores data in compressed bit-vectors called “snakes” and integrates a number of novel optimizations for efficient snake generation, intersection, counting and storage. We analyze the performance of VIPER for a range of synthetic database workloads. Our experimental results indicate significant performance gains, especially for large databases, over previously proposed vertical and horizontal mining algorithms. In fact, there are even workload regions where VIPER outperforms an optimal, but practically infeasible, horizontal mining algorithm.
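The abstract's central idea is that compressed bit-vectors can be intersected and counted efficiently. As a rough illustration, the sketch below substitutes plain run-length encoding, a list of `(bit, run_length)` pairs, for VIPER's snakes. That swap is an assumption: the snake encoding and its optimizations are not described here. The helper names `rle_encode`, `rle_decode`, `rle_and` and `rle_count` are hypothetical. The sketch intersects two compressed vectors run by run, without expanding them, and then counts set bits directly from the runs.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (bit value 0 or 1, length of the run)


def rle_encode(bits: List[int]) -> List[Run]:
    """Compress a 0/1 list into maximal runs of identical bits."""
    runs: List[Run] = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return runs


def rle_decode(runs: List[Run]) -> List[int]:
    """Expand runs back into a 0/1 list (used here only for checking)."""
    return [bit for bit, length in runs for _ in range(length)]


def rle_and(x: List[Run], y: List[Run]) -> List[Run]:
    """AND two equal-length run-length vectors without decompressing them.

    Walks both run lists in lockstep, emitting the overlap of the current
    runs and merging adjacent output runs that carry the same bit.
    """
    out: List[Run] = []
    i = j = 0
    rem_x = x[0][1] if x else 0  # bits left in the current run of x
    rem_y = y[0][1] if y else 0  # bits left in the current run of y
    while i < len(x) and j < len(y):
        step = min(rem_x, rem_y)
        bit = x[i][0] & y[j][0]
        if out and out[-1][0] == bit:
            out[-1] = (bit, out[-1][1] + step)
        else:
            out.append((bit, step))
        rem_x -= step
        rem_y -= step
        if rem_x == 0:
            i += 1
            if i < len(x):
                rem_x = x[i][1]
        if rem_y == 0:
            j += 1
            if j < len(y):
                rem_y = y[j][1]
    return out


def rle_count(runs: List[Run]) -> int:
    """Support count: total length of all runs of 1s."""
    return sum(length for bit, length in runs if bit == 1)
```

The cost of `rle_and` grows with the number of runs, not the number of transactions. This is why compression pays off most on large, sparse databases, where columns consist of long runs of 0s. A production encoding would also need to handle dense columns, in which runs are short. The abstract's separate optimizations for snake generation and storage suggest a tuned format rather than naive RLE.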