Publication | Closed Access
Syntax tree fingerprinting for source code similarity detection
Citations: 86
References: 21
Year: 2009
Venue: Unknown
Topics: Software Maintenance, Engineering, Software Engineering, Source Code Analysis, Software Analysis, Formal Verification, Data Science, Data Mining, Computational Linguistics, Software Mining, Abstract Syntax Tree, Source Code, Program Dependency Graph, Knowledge Discovery, Computer Science, Code Representation, Static Program Analysis, Software Design, Content Similarity Detection, Program Analysis, Software Testing, Formal Methods, Syntax Tree
Numerous approaches based on metrics, token-sequence pattern matching, abstract syntax tree (AST) analysis, or program dependency graph (PDG) analysis have already been proposed to highlight similarities in source code. In this paper we present a simple and scalable architecture based on AST fingerprinting. Building on a study of several hashing strategies that reduce false-positive collisions, we propose a framework that efficiently indexes AST representations in a database, quickly detects clusters of exact clones (exact with respect to the source-code abstraction), and easily retrieves their corresponding ASTs. Our aim is to enable further processing of neighboring exact matches in order to identify larger approximate matches, dealing with the common modification patterns seen in intra-project copy-and-paste and in plagiarism cases.
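The general idea of AST fingerprinting with database indexing can be illustrated with a minimal sketch. This is not the authors' framework. It assumes Python sources and the standard `ast`, `hashlib`, and `sqlite3` modules. The names `fingerprint_subtrees`, `build_index`, and `clone_clusters`, the `min_size` threshold, and the SQLite schema are all hypothetical. Each subtree is hashed bottom-up with SHA-256 over its node type, field names, child counts, and child digests. This is one hashing strategy chosen for illustration; the strategies compared in the paper are not reproduced here. Identifier names and literal values never enter the hash, and that omission plays the role of the "source-code abstraction".

```python
import ast
import hashlib
import sqlite3


def fingerprint_subtrees(tree):
    """Return (digest, size, node) for every node of `tree`, hashed bottom-up.

    Only node types, field names and child counts enter the hash; identifier
    names and literal values are ignored, so renamed copies collide on purpose.
    """
    results = []

    def visit(node):
        h = hashlib.sha256(type(node).__name__.encode() + b"\x00")
        size = 1
        for field, value in ast.iter_fields(node):
            items = value if isinstance(value, list) else [value]
            kids = [k for k in items if isinstance(k, ast.AST)]
            if not kids:
                continue
            # Field name + child count keep e.g. For.body and For.orelse apart,
            # avoiding a false-positive collision when children shift fields.
            h.update(f"{field}:{len(kids)}\x00".encode())
            for kid in kids:
                digest, kid_size = visit(kid)
                h.update(digest)
                size += kid_size
        digest = h.digest()
        results.append((digest, size, node))
        return digest, size

    visit(tree)
    return results


def build_index(sources, min_size=8):
    """Index statement subtrees of at least `min_size` nodes (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE fp (digest BLOB, file TEXT, line INTEGER,"
               " end_line INTEGER, node_type TEXT)")
    db.execute("CREATE INDEX fp_digest ON fp(digest)")
    for name, src in sources.items():
        for digest, size, node in fingerprint_subtrees(ast.parse(src)):
            if isinstance(node, ast.stmt) and size >= min_size:
                # The line span lets the subtree be re-extracted later.
                db.execute("INSERT INTO fp VALUES (?, ?, ?, ?, ?)",
                           (digest, name, node.lineno, node.end_lineno,
                            type(node).__name__))
    db.commit()
    return db


def clone_clusters(db):
    """Group rows sharing a digest: each group is an exact-clone cluster."""
    digests = db.execute("SELECT digest FROM fp GROUP BY digest"
                         " HAVING COUNT(*) > 1 ORDER BY MIN(line)").fetchall()
    return [db.execute("SELECT file, line, end_line, node_type FROM fp"
                       " WHERE digest = ? ORDER BY file, line", (d,)).fetchall()
            for (d,) in digests]


SRC_A = """\
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
"""

SRC_B = """\
def sum_prices(items):
    acc = 0
    for it in items:
        acc += it
    return acc
"""

if __name__ == "__main__":
    index = build_index({"a.py": SRC_A, "b.py": SRC_B}, min_size=5)
    for cluster in clone_clusters(index):
        print(cluster)
```

With `min_size=5`, the two renamed functions form three clusters: the whole `FunctionDef`, the `for` loop, and the `+=` statement. The smaller clusters are nested inside the largest one. A fuller tool would typically discard such subsumed clusters, or use adjacent ones as seeds for the larger approximate matches the abstract mentions.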