Publication | Closed Access
PEL: Position-Enhanced Length Filter for Set Similarity Joins
20
Citations
11
References
2014
Year
Unknown Venue
Set similarity joins compute all pairs of similar sets from two collections of sets. Set similarity joins are typically imple-mented in a filter-verify framework: a filter generates candi-date pairs, possibly including false positives, which must be verified to produce the final join result. Good filters produce a small number of false positives, while they reduce the time they spend on hopeless candidates. The best known algo-rithms generate candidates using the so-called prefix filter in conjunction with length- and position-based filters. In this paper we show that the potential of length and po-sition have only partially been leveraged. We propose a new filter, the position-enhanced length filter, which exploits the matching position to incrementally tighten the length filter; our filter identifies hopeless candidates and avoids process-ing them. The filter is very efficient, requires no change in the data structures of most prefix filter algorithms, and is particularly effective for foreign joins, i.e., joins between two different collections of sets. 1.
| Year | Citations | |
|---|---|---|
Page 1
Page 1