A discriminative model approach for accurate duplicate bug report retrieval

Abstract

Software projects usually maintain bug repositories, to which testers and users submit bug reports describing issues with the system. Sometimes two or more bug reports correspond to the same defect. To handle such duplicates, a person called a triager must manually label these reports as duplicates and link them to their master reports for subsequent maintenance work. In practice, however, a considerable number of duplicate bug reports are submitted daily, so asking triagers to label them manually can be highly time consuming. To address this issue, several techniques have recently been proposed that use various similarity-based metrics to detect candidate duplicate bug reports for manual verification. Automating triaging has proved challenging because two reports of the same bug can be written in very different ways, and there is still much room for improvement in the accuracy of the duplicate detection process. In this paper, we leverage recent advances in using discriminative models for information retrieval to detect duplicate bug reports more accurately. We have validated our approach on three large software bug repositories from Firefox, Eclipse, and OpenOffice. We show that our technique can yield relative improvements of 17–31%, 22–26%, and 35–43% over state-of-the-art techniques on the OpenOffice, Firefox, and Eclipse datasets, respectively, using only commonly available natural language information.
