Publication | Closed Access
Scalable SimRank join algorithm
25
Citations
41
References
2015
Year
Unknown Venue
Cluster ComputingRanking AlgorithmEngineeringThreshold θSimilarity MeasureNetwork AnalysisGraph MatchingLink PredictionInformation RetrievalData ScienceData MiningLink AnalysisCombinatorial OptimizationKnowledge DiscoveryComputer ScienceDistributed Query ProcessingQuery OptimizationRelational QueriesNetwork ScienceGraph TheorySimilarity JoinSimrank Similarity MeasureBusinessSimilarity Search
Similarity join finds all pairs of objects (i, j) with similarity score s(i, j) greater than some specified threshold θ. This is a fundamental query problem in the database research community, and is used in many practical applications, such as duplicate detection, merge/purge, record linkage, object matching, and reference conciliation. In this paper, we propose a scalable approximation algorithm with an arbitrary accuracy for the similarity join problem with the SimRank similarity measure. The algorithm consists of two phases: filter and verification. The filter phase enumerates similar pair candidates, and the similarity of each candidate is then assessed in the verification phase. The scalability of the proposed algorithm is experimentally verified for large real networks. The complexity depends only on the number of similar pairs, but does not depend on all pairs O(n <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sup> ). The proposed algorithm scales up to the network of 5M vertices and 70M edges. By comparing the state-of-the-art algorithms, it is about 10 times faster and it requires about 10 times smaller memory.
| Year | Citations | |
|---|---|---|
2003 | 6.4K | |
1973 | 5K | |
1969 | 2.4K | |
2002 | 1.9K | |
2002 | 1.7K | |
2003 | 1.2K | |
1995 | 801 | |
2002 | 639 | |
2005 | 522 | |
2003 | 431 |
Page 1
Page 1