How we refactor, and how we know it

TLDR

Existing knowledge of how programmers refactor rests on studies of only a few projects, and those studies are rarely replicated or checked against their underlying assumptions. To put refactoring research on a sound scientific basis, the authors analyze four data sets spanning more than 13,000 developers, 240,000 tool-assisted refactorings, 2,500 developer hours, and 3,400 version control commits. The analysis casts doubt on several prior assumptions, such as the expectation that programmers indicate refactoring activity in commit logs, while confirming that refactoring is frequently interspersed with other program changes. Confirming assumptions and replicating earlier studies increases confidence that their conclusions generalize.

Abstract

Much of what we know about how programmers refactor in the wild is based on studies that examine just a few software projects. Researchers have rarely taken the time to replicate these studies in other contexts or to examine the assumptions on which they are based. To help put refactoring research on a sound scientific basis, we draw conclusions using four data sets spanning more than 13 000 developers, 240 000 tool-assisted refactorings, 2500 developer hours, and 3400 version control commits. Using these data, we cast doubt on several previously stated assumptions about how programmers refactor, while validating others. For example, we find that programmers frequently do not indicate refactoring activity in commit logs, which contradicts assumptions made by several previous researchers. In contrast, we were able to confirm the assumption that programmers do frequently intersperse refactoring with other program changes. By confirming assumptions and replicating studies made by other researchers, we can have greater confidence that those researchers' conclusions are generalizable.
