Publication | Closed Access
Augmented Sketch
193
Citations
34
References
2016
Year
Unknown Venue
EngineeringMachine LearningData ScienceData MiningHardware AccelerationHigh-performance ArchitectureComputer EngineeringComputer ArchitecturePipeline ParallelismStreaming AlgorithmParallel ProgrammingComputer ScienceData Streaming ArchitectureParallel ComputingSketch AlgorithmStream ProcessingAugmented Sketch
Approximated algorithms are often used to estimate the frequency of items on high volume, fast data streams. The most common ones are variations of Count-Min sketch, which use sub-linear space for the count, but can produce errors in the counts of the most frequent items and can misclassify low-frequency items. In this paper, we improve the accuracy of sketch-based algorithms by increasing the frequency estimation accuracy of the most frequent items and reducing the possible misclassification of low-frequency items, while also improving the overall throughput. Our solution, called Augmented Sketch (ASketch), is based on a pre-filtering stage that dynamically identifies and aggregates the most frequent items. Items overflowing the pre-filtering stage are processed using a conventional sketch algorithm, thereby making the solution general and applicable in a wide range of contexts. The pre-filtering stage can be efficiently implemented with SIMD instructions on multi-core machines and can be further parallelized through pipeline parallelism where the filtering stage runs in one core and the sketch algorithm runs in another core.
| Year | Citations | |
|---|---|---|
Page 1
Page 1