Automatic Repair of Real Bugs in Java: A Large-Scale Experiment on the Defects4J Dataset

Empirical Software Engineering (EMSE)

M. Martinez, T. Durieux, R. Sommerard, J. Xuan, M. Monperrus



Defects4J is a large, peer-reviewed, structured dataset of real-world Java bugs. Each bug in Defects4J comes with a test suite and at least one failing test case that triggers the bug. In this paper, we report on an experiment to explore the effectiveness of automatic test-suite based repair on Defects4J. The result of our experiment shows that the considered state-of-the-art repair methods can generate patches for 47 out of 224 bugs. However, those patches are only test-suite adequate: they pass the test suite but may still be incorrect beyond that criterion. We have manually analyzed 84 different patches to assess their real correctness. In total, 9 real Java bugs can be correctly repaired with test-suite based repair. This analysis shows that test-suite based repair suffers from under-specified bugs, for which trivial or incorrect patches still pass the test suite. With respect to practical applicability, it takes on average 14.8 minutes to find a patch. The experiment was done on a scientific grid, totaling 17.6 days of computation time. All the repair systems and experimental results are publicly available on GitHub in order to facilitate future research on automatic repair.
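The distinction between a test-suite adequate patch and a correct patch can be illustrated with a minimal sketch. Everything in it is hypothetical and invented for illustration: the class `TestSuiteAdequacy`, the methods `buggyMax`, `trivialPatch`, and `correctPatch`, and the one-test "suite" come neither from the paper nor from Defects4J. It assumes a bug specified by a single failing test, which is the under-specified situation the abstract describes.

```java
import java.util.function.IntBinaryOperator;

public class TestSuiteAdequacy {

    // Hypothetical buggy method: intended to return the larger argument.
    static int buggyMax(int a, int b) {
        return a;
    }

    // Hypothetical trivial patch: passes the only test, but is still wrong.
    static int trivialPatch(int a, int b) {
        return b;
    }

    // Hypothetical correct patch: implements the intended behavior.
    static int correctPatch(int a, int b) {
        return a >= b ? a : b;
    }

    // The entire test suite: one test case that triggers the bug.
    static boolean passesTestSuite(IntBinaryOperator max) {
        return max.applyAsInt(1, 2) == 2;
    }

    // A check outside the test suite, standing in for manual analysis.
    static boolean passesHeldOutCheck(IntBinaryOperator max) {
        return max.applyAsInt(5, 3) == 5;
    }

    static void report(String name, IntBinaryOperator max) {
        System.out.println(name
                + ": test-suite adequate = " + passesTestSuite(max)
                + ", passes held-out check = " + passesHeldOutCheck(max));
    }

    public static void main(String[] args) {
        report("buggyMax", TestSuiteAdequacy::buggyMax);
        report("trivialPatch", TestSuiteAdequacy::trivialPatch);
        report("correctPatch", TestSuiteAdequacy::correctPatch);
    }
}
```

In this sketch both patches are test-suite adequate, yet only `correctPatch` survives the additional check. This is why a patch that merely passes the suite still needs its correctness assessed separately.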

author = {Matias Martinez and Thomas Durieux and Romain Sommerard and Jifeng Xuan and Martin Monperrus},
journal = {Empirical Software Engineering (EMSE)},
publisher = {Springer},
title = {Automatic Repair of Real Bugs in Java: A Large-Scale Experiment on the Defects4J Dataset},
year = {2016}
Last Updated: 25/05/2022