Publication | Open Access
An empirical evaluation of set similarity join techniques
125
Citations
20
References
2016
Year
EngineeringInformation RetrievalData ScienceData MiningSet SimilarityPrefix FilterSimilarity MeasureString-searching AlgorithmKnowledge DiscoveryCombinatorial Pattern MatchingOverall RuntimeComputational ComplexityComputer ScienceGraph MatchingSemantic SimilaritySimilarity SearchSimilarity Join Techniques
Set similarity joins compute all pairs of similar sets from two collections of sets. We conduct extensive experiments on seven state-of-the-art algorithms for set similarity joins. These algorithms adopt a filter-verification approach. Our analysis shows that verification has not received enough attention in previous works. In practice, efficient verification inspects only a small, constant number of set elements and is faster than some of the more sophisticated filter techniques. Although we can identify three winners, we find that most algorithms show very similar performance. The key technique is the prefix filter, and AllPairs, the first algorithm adopting this techniques is still a relevant competitor. We repeat experiments from previous work and discuss diverging results. All our claims are supported by a detailed analysis of the factors that determine the overall runtime.
| Year | Citations | |
|---|---|---|
Page 1
Page 1