Publication | Closed Access
C4
Citations: 35
References: 22
Year: 2022
Unknown Venue
Software Maintenance, Software Development, Engineering, Data Science, Program Analysis, Code Generation, Automatic Programming, Software Systems, Software Engineering, Source Code Analysis, Computer Science, Compilers, Code Representation, Static Program Analysis, Software Analysis, Code Clones, Programming Languages
During software development, developers introduce code clones by reusing existing code to improve programming productivity. Because clones have detrimental effects on software maintenance and evolution, many techniques have been proposed to detect them. Existing approaches mainly detect clones written in the same programming language. However, it is common to develop programs with the same functionality in different programming languages to support various platforms. In this paper, we propose a new approach named C4, short for Contrastive Cross-language Code Clone detection model. It effectively detects cross-language clones using learned representations. C4 exploits the pre-trained model CodeBERT to convert programs in different languages into high-dimensional vector representations. In addition, we fine-tune the C4 model with a contrastive learning objective that can effectively distinguish clone pairs from non-clone pairs. To evaluate the effectiveness of our approach, we conduct extensive experiments on the dataset proposed by CLCDSA. Experimental results show that C4 achieves scores of 0.94, 0.90, and 0.92 in terms of precision, recall, and F-measure, respectively, and substantially outperforms the state-of-the-art baselines.
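The abstract does not give the exact form of the contrastive objective, so the following is only a minimal sketch of one common choice, an InfoNCE-style loss over cosine similarities. It assumes each program has already been encoded into a vector (by CodeBERT in the paper; small hand-written toy vectors here). The function names, the `temperature` value, and the vectors are illustrative assumptions, not details taken from C4.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(src_vecs, tgt_vecs, temperature=0.1):
    """InfoNCE-style loss (an assumed form, not necessarily C4's).

    src_vecs[i] and tgt_vecs[i] are treated as a clone pair; every other
    tgt_vecs[j] in the batch is treated as a non-clone of src_vecs[i].
    """
    losses = []
    for i, src in enumerate(src_vecs):
        logits = [cosine(src, tgt) / temperature for tgt in tgt_vecs]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the true clone.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy vectors standing in for embeddings of programs in two languages
# (hypothetical values chosen only for illustration).
java_vecs = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2]]
python_vecs = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.3]]

aligned = contrastive_loss(java_vecs, python_vecs)         # clones paired correctly
shuffled = contrastive_loss(java_vecs, python_vecs[::-1])  # clones mismatched
print(f"aligned={aligned:.4f} shuffled={shuffled:.4f}")
```

In this sketch the loss is small when each program is closest to its true clone and large when the pairing is scrambled. Minimizing a loss of this kind during fine-tuning would pull clone pairs together and push non-clone pairs apart in the embedding space.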