Publication | Open Access
Empirically evaluating readily available information for regression test optimization in continuous integration
Citations: 44 | References: 72 | Year: 2021 | Venue: Unknown
Keywords: Software Maintenance, Engineering, Verification, Software Engineering, Software Analysis, Data Science, Test Automation, Regression Test Optimization, Systems Engineering, Available Information, Integration Testing, Statistics, Test Traces, Predictive Analytics, Computer Science, Software Design, Regression Testing, Test Management, Test-driven Development, Program Analysis, Software Testing, Build Dependencies, Continuous Integration, Test Evolution, Regression Test Selection
Regression test selection (RTS) and prioritization (RTP) techniques aim to reduce testing effort and developer feedback time after a change to the code base. Using various information sources, including test traces, build dependencies, version control data, and test histories, they have been shown to be effective. However, not all of these sources are guaranteed to be available and accessible in arbitrary continuous integration (CI) environments. In contrast, metadata from version control systems (VCSs) and CI systems is readily available and inexpensive to obtain. Yet, the corresponding RTP and RTS techniques are scattered across the research literature and often evaluated only on synthetic faults or in a specific industrial context. It is cumbersome for practitioners to identify insights that apply to their context, let alone to calibrate the associated parameters for maximum cost-effectiveness. This paper consolidates existing work on RTP and unsafe RTS into an actionable methodology for building and evaluating such approaches that rely exclusively on CI and VCS metadata. To investigate how these approaches from prior research compare in heterogeneous settings, we apply the methodology in a large-scale empirical study on a set of 23 projects covering 37,000 CI logs and 76,000 VCS commits. We find that these approaches significantly outperform established RTP baselines, and we show that practitioners can expect unsafe RTS to save, on average, 84% of test execution time while still triggering 90% of the failures. We also find that limiting training data can be beneficial, that features from test history work better than change-based features, and, somewhat surprisingly, that simple and well-known heuristics often outperform complex machine-learned models.
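One of the simple, well-known heuristics the abstract alludes to is prioritizing tests by their recent failure history, using nothing beyond CI metadata. The sketch below is an illustrative assumption, not the paper's exact method: test names, histories, and the `window` parameter are hypothetical, and the score (failure rate over the last `window` CI runs) is one common instantiation of history-based RTP.

```python
def prioritize_by_recent_failures(test_histories, window=10):
    """Rank tests by failure rate over the last `window` CI runs.

    test_histories: dict mapping test name -> list of booleans
    (True = the test failed), ordered oldest to newest.
    Returns test names, most failure-prone first.
    """
    def score(history):
        recent = history[-window:]
        return sum(recent) / len(recent) if recent else 0.0

    return sorted(test_histories,
                  key=lambda t: score(test_histories[t]),
                  reverse=True)

# Hypothetical per-test CI outcomes; True marks a failing run.
histories = {
    "test_login":  [False, False, True, True],    # failure rate 0.50
    "test_search": [False, False, False, False],  # failure rate 0.00
    "test_cart":   [True, True, False, True],     # failure rate 0.75
}
print(prioritize_by_recent_failures(histories))
# → ['test_cart', 'test_login', 'test_search']
```

Unsafe RTS can then be obtained by truncating this ranking, e.g. running only the top fraction of tests, which is where the time-saved versus failures-triggered trade-off reported in the study arises.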