Publication | Closed Access
Efficient Graph Similarity Joins with Edit Distance Constraints
Citations: 62
References: 31
Year: 2012
Unknown Venue
Engineering, Semantic Web, Edit Distance Constraints, Graph Matching, Similarity Matches, Graph Processing, Text Mining, Similarity Problem, Natural Language Processing, Information Retrieval, Data Science, Data Mining, Combinatorial Optimization, Q-gram Idea, Knowledge Discovery, Computer Science, Bioinformatics, Graph Theory, Combinatorial Pattern Matching, Business, Semantic Graph, Similarity Search, Semantic Similarity
Graphs are widely used to model complicated data semantics in many applications in bioinformatics, chemistry, social networks, pattern recognition, etc. A recent trend is to tolerate noise arising from various sources, such as erroneous data entry, and to find similarity matches. In this paper, we study the graph similarity join problem, which returns pairs of graphs whose edit distance is no larger than a given threshold. Inspired by the q-gram idea for string similarity problems, our solution extracts paths from graphs as features for indexing. We establish a lower bound on the number of common features, which is used to generate candidates. We propose an efficient algorithm that exploits both matching and mismatching features to improve the filtering and verification of candidates. Extensive experiments on publicly available datasets demonstrate that the proposed algorithm significantly outperforms existing approaches.
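The path-feature idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes undirected graphs with labeled vertices and unlabeled edges, treats a "q-gram" as the label sequence of a simple path with q edges, and assumes a count bound of the same shape as string q-gram count filtering. All names here (`path_qgrams`, `common_qgrams`, `passes_count_filter`, `max_affected_a`, `max_affected_b`) are hypothetical, and the caller must supply how many q-grams a single edit operation can affect; the abstract does not state the exact bound.

```python
from collections import Counter


def path_qgrams(labels, edges, q):
    """Multiset of vertex-label sequences over simple paths with q edges (q >= 1).

    labels: dict mapping an integer vertex id to its label (a string).
    edges:  iterable of undirected (u, v) pairs.
    """
    adj = {v: set() for v in labels}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    grams = Counter()

    def extend(path):
        if len(path) == q + 1:
            # Each undirected path is reached from both ends; keep one copy.
            if path[0] < path[-1]:
                seq = tuple(labels[v] for v in path)
                # Canonical form: direction-independent label sequence.
                grams[min(seq, seq[::-1])] += 1
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:  # simple paths only
                extend(path + [nxt])

    for v in labels:
        extend([v])
    return grams


def common_qgrams(a, b):
    """Size of the multiset intersection of two q-gram multisets."""
    return sum((a & b).values())


def passes_count_filter(a, b, tau, max_affected_a, max_affected_b):
    """Assumed count filter: a pair survives only if it shares enough q-grams.

    max_affected_*: assumed upper bound on the q-grams of each graph that
    one edit operation can destroy (supplied by the caller).
    """
    need = max(sum(a.values()) - tau * max_affected_a,
               sum(b.values()) - tau * max_affected_b)
    return common_qgrams(a, b) >= need


# Two three-vertex chains that differ in one vertex label.
r = path_qgrams({1: "C", 2: "C", 3: "O"}, [(1, 2), (2, 3)], q=1)
s = path_qgrams({1: "C", 2: "C", 3: "N"}, [(1, 2), (2, 3)], q=1)
shared = common_qgrams(r, s)
```

In this toy pair the two graphs share one of their two 1-grams. With a threshold of one edit and an assumed two affected q-grams per edit, the pair remains a candidate; with a threshold of zero it is pruned without computing the edit distance.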