Publication | Closed Access
A sentence-matching method for automatic license identification of source code files
Citations: 97
References: 16
Year: 2010
Unknown Venue
Software Maintenance, Engineering, Software Engineering, Source Code Analysis, Source Code Files, Software Analysis, Formal Verification, Corpus Linguistics, Text Mining, Information Retrieval, Data Science, Open-source Software Development, Computational Linguistics, Software Mining, Source Code, Automatic License Identification, FOSS Component, Computer Science, Static Program Analysis, Software Design, Content Similarity Detection, Program Analysis, Software Testing, Sentence-matching Method, Open-source Software, Formal Methods, System Software
The reuse of free and open source software (FOSS) components is becoming more prevalent. One of the major challenges in finding the right component is finding one that has a license that is suitable for its intended use. The license of a FOSS component is determined by the licenses of its source code files. In this paper, we describe the challenges of identifying the license under which source code is made available, and propose a sentence-based matching algorithm to do so automatically. We demonstrate the feasibility of our approach by implementing a tool named Ninka. We performed an evaluation that shows that Ninka outperforms other methods of license identification in precision and speed. We also performed an empirical study on 0.8 million source code files of Debian that highlights interesting facts about the manner in which licenses are used by FOSS.
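The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea of sentence-based license matching, not Ninka's implementation. It assumes three steps: strip comment markers from a file header, split the text into sentences, and map the set of recognized sentences to a license. The patterns, rule table, and function names (`SENTENCE_PATTERNS`, `LICENSE_RULES`, `split_sentences`, `identify_license`) are hypothetical and chosen for illustration.

```python
import re

# Hypothetical sentence patterns (illustrative only, not Ninka's rule set).
SENTENCE_PATTERNS = {
    "GPL-grant": re.compile(r"under the terms of the gnu general public license"),
    "v2-or-later": re.compile(r"either version 2 of the license, or .* any later version"),
    "MIT-grant": re.compile(r"permission is hereby granted, free of charge"),
}

# Hypothetical rules: a license is reported when all of its sentence tokens occur.
LICENSE_RULES = {
    "GPLv2+": {"GPL-grant", "v2-or-later"},
    "MIT": {"MIT-grant"},
}


def split_sentences(text):
    """Strip leading comment markers, normalize whitespace, split into sentences."""
    cleaned = re.sub(r"^[ \t]*(?:/\*+|\*+/|\*+|//+|#+)", " ", text, flags=re.MULTILINE)
    cleaned = re.sub(r"\s+", " ", cleaned).strip().lower()
    return [s for s in re.split(r"(?<=[.!?])\s+", cleaned) if s]


def identify_license(text):
    """Return the sorted names of licenses whose required sentences all appear."""
    tokens = set()
    for sentence in split_sentences(text):
        for token, pattern in SENTENCE_PATTERNS.items():
            if pattern.search(sentence):
                tokens.add(token)
    matches = sorted(name for name, required in LICENSE_RULES.items() if required <= tokens)
    return matches or ["UNKNOWN"]


GPL_HEADER = """/*
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 */"""

if __name__ == "__main__":
    print(identify_license(GPL_HEADER))
```

Matching at the sentence level, rather than against whole license texts, lets a small variation in one sentence leave the remaining sentences recognizable. A file with no recognized sentence is reported as `UNKNOWN` instead of being guessed.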