An empirical analysis of flaky tests

TLDR

Regression testing assumes deterministic test outcomes, yet flaky tests, whose results are non-deterministic, undermine its reliability. The authors present the first extensive study of flaky tests: they examine 201 commits that likely fix flaky tests across 51 open-source projects, classify the most common root causes, identify approaches that could manifest flaky behavior, and describe common strategies that developers use to fix such tests. The resulting insights and implications can help guide future research on avoiding flaky tests.

Abstract

Regression testing is a crucial part of software development. It checks that software changes do not break existing functionality. An important assumption of regression testing is that test outcomes are deterministic: an unmodified test is expected to either always pass or always fail for the same code under test. Unfortunately, in practice, some tests, often called flaky tests, have non-deterministic outcomes. Such tests undermine regression testing because they make it difficult to rely on test results. We present the first extensive study of flaky tests. We study in detail a total of 201 commits that likely fix flaky tests in 51 open-source projects. We classify the most common root causes of flaky tests, identify approaches that could manifest flaky behavior, and describe common strategies that developers use to fix flaky tests. We believe that our insights and implications can help guide future research on the important topic of (avoiding) flaky tests.
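To make the determinism assumption concrete, the following minimal sketch reruns an unmodified test several times and reports it as flaky when both passing and failing outcomes are observed. It is an illustration only, not the procedure used in the study: the names `outcomes`, `is_flaky`, `stable_test`, and `unstable_test` are hypothetical, and the call counter is merely a stand-in for whatever hidden state makes a real test non-deterministic.

```python
import itertools
from typing import Callable


def outcomes(test: Callable[[], None], runs: int = 10) -> set:
    """Run the same test repeatedly and collect the distinct outcomes seen."""
    seen = set()
    for _ in range(runs):
        try:
            test()
            seen.add("pass")
        except AssertionError:
            seen.add("fail")
    return seen


def is_flaky(test: Callable[[], None], runs: int = 10) -> bool:
    """A test is reported as flaky if it both passed and failed across reruns."""
    return outcomes(test, runs) == {"pass", "fail"}


# Hypothetical hidden state shared across runs (assumption for illustration).
_calls = itertools.count()


def stable_test() -> None:
    # Depends only on the code under test, so the outcome never changes.
    assert sum([1, 2, 3]) == 6


def unstable_test() -> None:
    # Depends on state outside the code under test: passes on even-numbered
    # calls and fails on odd-numbered ones.
    assert next(_calls) % 2 == 0


if __name__ == "__main__":
    print("stable_test flaky:", is_flaky(stable_test))
    print("unstable_test flaky:", is_flaky(unstable_test))
```

Rerunning can only reveal flakiness that happens to manifest within the chosen number of runs; a test that passes every time in this sketch is not thereby shown to be deterministic.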
