Publication | Closed Access
CP-Miner: a tool for finding copy-paste and related bugs in operating system code
Citations: 331
References: 29
Year: 2004
Unknown Venue
Software Maintenance, Engineering, Copy-paste Related Bugs, Software Engineering, Source Code Analysis, Software Analysis, Formal Verification, Data Science, Data Mining, Large Software, Operating System Code, Software Mining, Copy-pasted Code, Knowledge Discovery, Computer Engineering, Computer Science, Debugger, Related Bugs, Automated Repair, Static Program Analysis, Software Design, Operating Systems, Content Similarity Detection, Program Analysis, Software Testing, Formal Methods, Symbolic Execution, System Software
Copy-pasted code is very common in large software because programmers prefer reusing code via copy-paste in order to reduce programming effort. Recent studies show that copy-paste is prone to introducing bugs and that a significant portion of operating system bugs is concentrated in copy-pasted code. Unfortunately, it is challenging to efficiently identify copy-pasted code in large software. Existing copy-paste detection tools either do not scale to large software or cannot handle small modifications in copy-pasted code. Furthermore, few tools are available to detect copy-paste related bugs. In this paper we propose a tool, CP-Miner, that uses data mining techniques to efficiently identify copy-pasted code in large software, including operating systems, and to detect copy-paste related bugs. Specifically, it takes less than 20 minutes for CP-Miner to identify 190,000 copy-pasted segments in Linux and 150,000 in FreeBSD. Moreover, CP-Miner has detected 28 copy-paste related bugs in the latest version of Linux and 23 in FreeBSD. In addition, we analyze some interesting characteristics of copy-paste in Linux and FreeBSD, including the distribution of copy-pasted code across different lengths, granularities, modules, degrees of modification, and software versions.
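The abstract does not describe the detection algorithm beyond "data mining techniques", so the following is only a minimal sketch of one way copy-pasted segments with small modifications could be found. It is not CP-Miner's method. It assumes that renamed identifiers and changed numeric literals are the only modifications to tolerate, and that a segment is a fixed-size run of statements. The names `normalize`, `find_clones`, `KEYWORDS`, and `window` are hypothetical and chosen for illustration.

```python
import re
from collections import defaultdict

# Assumption: a tiny keyword set that should survive normalization unchanged.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "char", "void"}


def normalize(stmt):
    """Map identifiers to ID and numeric literals to NUM, then drop whitespace.

    This makes two statements compare equal when they differ only in
    variable names or constants.
    """
    def repl(match):
        tok = match.group(0)
        if tok in KEYWORDS:
            return tok
        return "NUM" if tok[0].isdigit() else "ID"

    replaced = re.sub(r"[A-Za-z_]\w*|\d+", repl, stmt)
    return re.sub(r"\s+", "", replaced)


def find_clones(statements, window=3):
    """Return groups of start indices whose `window`-statement runs match.

    Each group lists the positions of runs that are identical after
    normalization, i.e. candidate copy-pasted segments.
    """
    norm = [normalize(s) for s in statements]
    groups = defaultdict(list)
    for i in range(len(norm) - window + 1):
        groups[tuple(norm[i:i + window])].append(i)
    return [idxs for idxs in groups.values() if len(idxs) > 1]


sample = [
    "total = total + a[i];",
    "count = count + 1;",
    "if (count > 10) return total;",
    "x = 0;",
    "sum = sum + b[j];",
    "n = n + 1;",
    "if (n > 10) return sum;",
]
print(find_clones(sample))  # [[0, 4]]
```

In this sample, the runs starting at statements 0 and 4 are reported as one clone group even though every identifier was renamed. A fixed window and exact matching on normalized statements would miss segments with inserted or deleted statements, which is one reason a real tool needs a more flexible matching strategy.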