Publication | Open Access
The Qualitas Corpus: A Curated Collection of Java Code for Empirical Studies
Citations: 411
References: 23
Year: 2010
Unknown Venue
Software Maintenance, Engineering, Software Engineering, Source Code Analysis, Software Analysis, Corpus Linguistics, Program Evaluation, Text Mining, Natural Language Processing, Applied Linguistics, Empirical Software Engineering Research, Syntax, Language Documentation, Data Science, Open-source Software Development, Computational Linguistics, Language Studies, Qualitas Corpus, Java Code, Software Measurement, Linguistics, Knowledge Discovery, Large Curated Collection, Computer Science, Code Representation, Curated Collection, Software Design, Program Analysis, Software Testing, Software Metric, Language Corpus, Large Empirical Studies, System Software, Software Language Engineering
To increase our ability to use measurement to support software development practice, we need to do more analysis of code. However, empirical studies of code are expensive, and their results are difficult to compare. We describe the Qualitas Corpus, a large curated collection of open source Java systems. The corpus reduces the cost of performing large empirical studies of code and supports comparison of measurements of the same artifacts. We discuss its design, organisation, and issues associated with its development.