
Group 1 - Bug prediction models




  • Emitzá Guzmán Ortega
  • Amir Molzam Sharifloo
  • Dávid Tengeri
  • Melinda Tóth
  • Zuoning Yin


Over the years, many researchers have developed bug prediction models, i.e., models aimed at identifying source code artifacts that are likely to contain a fault. The aim of such models is to recommend to developers which artifacts deserve more thorough Verification and Validation, because they are likely to exhibit faults in the near future.

Some of these models predict fault-proneness using a pool of metrics extracted from a software release [1][2][3][4], e.g., the Chidamber & Kemerer metrics suite [5]. Other models are instead based on information extracted from source code changes [6][7][8] or from previous defects [9][10].
Researchers have also proposed models that take into account the effort necessary to inspect an artifact (e.g., through code review) [11][12][13][14]. The intent of effort-aware defect prediction is not to classify an artifact as bug-prone or not, but rather to output a ranked set of artifacts that maximizes the number of defects found per unit of inspection effort.
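To make the two ideas above concrete, the following sketch trains a metrics-based classifier and then applies an effort-aware ranking. It is purely illustrative: the data is synthetic, the four features are stand-ins for release metrics (e.g., the Chidamber & Kemerer suite), lines of code serve as the effort proxy, and the particular learner is an arbitrary choice, not the method of any cited paper.

```python
# Illustrative sketch: metrics-based defect prediction plus an
# effort-aware ranking step. All data is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

# 200 synthetic artifacts, 4 code metrics each (stand-ins for
# metrics such as WMC, CBO, RFC, LOC from a release).
X = rng.normal(size=(200, 4))
loc = rng.integers(50, 2000, size=200)      # inspection-effort proxy
# Synthetic ground truth: fault-proneness loosely driven by two metrics.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200)) > 1

clf = RandomForestClassifier(random_state=0).fit(X, y)
p_bug = clf.predict_proba(X)[:, 1]          # predicted fault-proneness

# Effort-aware ranking: inspect artifacts with the highest predicted
# defect density (fault-proneness per line to inspect) first.
order = np.argsort(-(p_bug / loc))
print("First 5 artifacts to inspect:", order[:5])
```

The ranking step is what distinguishes effort-aware prediction: two artifacts with the same predicted fault-proneness are prioritized differently if one is much cheaper to inspect.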

Despite the numerous research efforts in this field, the adoption of fault prediction models is, in practice, still fairly limited.

The goal of this working group is to investigate (i) the limitations and weaknesses of existing fault prediction models, (ii) how research in this field could foster the adoption of fault prediction models by practitioners, and (iii) the main open challenges in building fault prediction models.

The study will be conducted by contacting experts in this field, ideally from both academia and industry, during the ESEC-FSE conference and interviewing them to collect information aimed at addressing our goal.

To this end, the working group should identify a set of questions to be asked, for example:

1) What do you think are the main threats to validity affecting existing fault prediction models? For example, threats could stem from the quality of the data set [15], from the limited cross-project applicability of certain models, or from their lack of generalizability.

2) What kind of fault prediction models are used by your industrial partners/in your company? For example, do they use very simple models, off-the-shelf tools, or models recently developed by the research community?

3) What do you think are the main barriers to the adoption of fault prediction models among practitioners? For example, the models are not easy to use, their output is hard to interpret, or tuning and applying a model to one's own projects is difficult.

4) In which directions should the research community invest its effort? For example, improving model performance, improving generalizability and cross-project applicability, or making models more usable and easier for practitioners to interpret.

5) What do researchers perceive as more important for defect prediction: the underlying data (e.g., code metrics, changes, previous defects) or the model (e.g., Random Forest, Naive Bayes, SVM)?
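The data-versus-model question can be probed empirically by holding the data fixed and swapping only the learner. The sketch below does exactly that on synthetic stand-in metrics; the two classifiers are examples named in the question, and the accuracy figures are not results from any study.

```python
# Illustrative sketch: same (synthetic) data, two different models,
# to separate the contribution of the learner from that of the data.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))                   # stand-in code metrics
y = (X[:, 0] - X[:, 2] + rng.normal(scale=0.7, size=300)) > 0.5

for name, model in [("Naive Bayes", GaussianNB()),
                    ("Random Forest", RandomForestClassifier(random_state=0))]:
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {acc:.2f}")
```

If the two scores are close, the signal lies mostly in the features; a large gap suggests the choice of learner matters for this data set.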

[1] V. R. Basili, L. C. Briand, and W. L. Melo. A validation of object-oriented design metrics as quality indicators. IEEE Trans. Software Eng., 22(10):751–761, 1996.
[2] T. Gyimothy, R. Ferenc, and I. Siket. Empirical validation of object-oriented metrics on open source software for fault prediction. IEEE Trans. Software Eng., 31(10):897–910, 2005.
[3] R. Subramanyam and M. S. Krishnan. Empirical analysis of CK metrics for object-oriented design complexity: Implications for software defects. IEEE Trans. Software Eng., 29(4):297–310, 2003.
[4] N. Nagappan, T. Ball, and A. Zeller. Mining metrics to predict component failures. In Proceedings of ICSE 2006, pages 452–461. ACM, 2006.
[5] S. R. Chidamber and C. F. Kemerer. A metrics suite for object oriented design. IEEE Trans. Software Eng., 20(6):476–493, June 1994.
[6] S. Kim, E. J. Whitehead Jr., and Y. Zhang. Classifying software changes: Clean or buggy? IEEE Trans. Software Eng., 34(2):181–196, 2008.
[7] N. Nagappan and T. Ball. Use of relative code churn measures to predict system defect density. In Proceedings of ICSE 2005, pages 284–292. ACM, 2005.
[8] A. E. Hassan. Predicting faults using the complexity of code changes. In Proceedings of ICSE 2009, pages 78–88, 2009.
[9] S. Kim, T. Zimmermann, J. Whitehead, and A. Zeller. Predicting faults from cached history. In Proceedings of ICSE 2007, pages 489–498. IEEE CS, 2007.
[10] T. J. Ostrand, E. J. Weyuker, and R. M. Bell. Predicting the location and number of faults in large software systems. IEEE Trans. Software Eng., 31(4):340–355, 2005.
[11] T. Menzies, Z. Milton, B. Turhan, B. Cukic, Y. Jiang, and A. Bener. Defect prediction from static code features: current results, limitations, new approaches. Automated Software Engineering, 17:375–407, 2010.
[12] E. Arisholm and L. C. Briand. Predicting fault-prone components in a Java legacy system. In Proceedings of the 2006 ACM/IEEE International Symposium on Empirical Software Engineering, pages 8–17. ACM, 2006.
[13] T. Mende and R. Koschke. Revisiting the evaluation of defect prediction models. In Proceedings of the 5th International Conference on Predictor Models in Software Engineering (PROMISE 2009), pages 1–10. ACM, 2009.
[14] A. G. Koru, K. El Emam, D. Zhang, H. Liu, and D. Mathew. Theory of relative defect proneness. Empirical Software Engineering, 13:473–498, October 2008.
[15] C. Bird, A. Bachmann, E. Aune, J. Duffy, A. Bernstein, V. Filkov, and P. T. Devanbu. Fair and balanced?: Bias in bug-fix datasets. In Proceedings of ESEC/FSE 2009, pages 121–130. ACM, 2009.