Publication | Closed Access
Detecting Duplicate Bug Report Using Character N-Gram-Based Features
Citations: 142
References: 15
Year: 2010
Unknown Venue
Software Maintenance, Engineering, Bug Reports, Software Engineering, Source Code Analysis, Software Analysis, Corpus Linguistics, Text Mining, Natural Language Processing, Information Retrieval, Data Science, Data Mining, Computational Linguistics, Duplicate Reports, Named-entity Recognition, Duplicate Bug Reports, Machine Translation, NLP Task, Knowledge Discovery, Terminology Extraction, Computer Science, Automated Repair, Content Similarity Detection, Software Testing
We present an approach to identify duplicate bug reports expressed in free-form text. Duplicate reports need to be identified so that the same bug is not assigned to multiple developers. Duplicate reports can also contain complementary information that is useful for bug fixing. Automatic identification of duplicate reports (from thousands of existing reports in a bug repository) can increase the productivity of a triager by reducing the time spent searching for duplicates of an incoming report. The proposed method uses a character N-gram-based model for duplicate bug report detection. Previous approaches are word-based, whereas this study investigates the usefulness of low-level character-based features, which have certain inherent advantages over word-based features for this problem, such as natural-language independence, robustness to noisy data, and effective handling of domain-specific term variations. The proposed solution is evaluated on a publicly available dataset of more than 200 thousand bug reports from the open-source Eclipse project. The dataset includes ground truth: bug reports pre-annotated as duplicates by the triager. Empirical results and evaluation metrics quantifying retrieval performance indicate that the approach is effective.
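To make the idea of character N-gram features concrete, here is a minimal sketch of how an incoming report could be ranked against existing reports. It is an illustration only, not the authors' implementation: the abstract does not state the N-gram lengths, the weighting, or the similarity measure, so the 3-to-5 character range, the raw counts, and cosine similarity are all assumptions, and the names `char_ngrams`, `cosine_similarity`, and `rank_candidates` are hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n_min=3, n_max=5):
    """Count character n-grams of length n_min..n_max (assumed range)."""
    # Lower-case and collapse whitespace so trivial formatting noise is ignored.
    normalized = " ".join(text.lower().split())
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(normalized) - n + 1):
            counts[normalized[i:i + n]] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count vectors (assumed measure)."""
    if not a or not b:
        return 0.0
    shared = set(a) & set(b)
    dot = sum(a[g] * b[g] for g in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def rank_candidates(incoming, repository, top_k=3):
    """Return the top_k (report_id, score) pairs most similar to `incoming`.

    `repository` maps a report id to its free-form text.
    """
    query = char_ngrams(incoming)
    scored = [
        (report_id, cosine_similarity(query, char_ngrams(text)))
        for report_id, text in repository.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Made-up example reports, not taken from the Eclipse dataset.
    repo = {
        "A": "NullPointerException when opening the Java editor",
        "B": "Toolbar icons are blurry on high-DPI displays",
    }
    incoming = "NPE: null pointer exceptions on open of java editors"
    for report_id, score in rank_candidates(incoming, repo):
        print(report_id, round(score, 3))
```

The example reports show the kind of term variation the abstract mentions: "NullPointerException" and "null pointer exceptions" share no whole word, yet they overlap in many character sequences, so report A is ranked above the unrelated report B.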