Change Distilling: Tree Differencing for Fine-Grained Source Code Change Extraction

TLDR

Software evolution analysis must identify the specific changes made across multiple versions of a program. The study introduces change distilling, a tree differencing algorithm for fine-grained source code change extraction, and reports its evaluation. The algorithm builds on Chawathe et al.'s method for hierarchically structured data: it matches the nodes of two abstract syntax trees (ASTs), derives a minimum edit script, and classifies the resulting changes according to a taxonomy of source code changes. It is evaluated on a benchmark of 1,064 manually classified changes from 219 revisions of three open source projects. Compared with the original approach, it approximates the minimum edit script 45% better and reduces the mean absolute percentage error from 79% to 34%.

Abstract

A key issue in software evolution analysis is the identification of particular changes that occur across several versions of a program. We present change distilling, a tree differencing algorithm for fine-grained source code change extraction. To this end, we have improved the existing algorithm of Chawathe et al. for extracting changes in hierarchically structured data. Our algorithm detects changes by finding a matching between the nodes of the two compared abstract syntax trees and a minimum edit script. We can identify change types between program versions according to our taxonomy of source code changes. We evaluated our change distilling algorithm with a benchmark we developed that consists of 1,064 manually classified changes in 219 revisions from three different open source projects. We achieved significant improvements in extracting types of source code changes: our algorithm approximates the minimum edit script 45% better than the original change extraction approach by Chawathe et al. We are able to find all occurring changes and almost reach the minimum conforming edit script, i.e., we reach a mean absolute percentage error of 34%, compared to 79% reached by the original algorithm. The paper describes both the change distilling algorithm and the results of our evaluation.
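
The mean absolute percentage error quoted above can be illustrated with a small sketch. It assumes the error is measured between the size of the edit script an algorithm extracts and the size of the expected (minimum conforming) edit script for each benchmark case; the abstract does not spell out the exact definition, so this reading is an assumption. The function name and the sample numbers are hypothetical and are not taken from the benchmark.

```python
def mean_absolute_percentage_error(expected, actual):
    """Return the mean absolute percentage error (in percent).

    expected: reference values, e.g. sizes of the minimum conforming edit scripts
    actual:   measured values, e.g. sizes of the extracted edit scripts
    """
    if len(expected) != len(actual):
        raise ValueError("expected and actual must have the same length")
    if not expected:
        raise ValueError("at least one pair of values is required")
    if any(e == 0 for e in expected):
        raise ValueError("expected values must be non-zero")
    # Average the relative deviation of each measured value from its reference.
    total = sum(abs(a - e) / abs(e) for e, a in zip(expected, actual))
    return 100.0 * total / len(expected)


# Hypothetical edit-script sizes for three changes (illustrative only).
expected_sizes = [4, 10, 5]
extracted_sizes = [5, 12, 5]

mape = mean_absolute_percentage_error(expected_sizes, extracted_sizes)
print(f"{mape:.1f}%")  # prints 15.0%
```

Under this reading, a lower value means the extracted edit scripts are closer in size to the expected ones, which is the sense in which 34% improves on 79%.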
