Evaluating testing methods by delivered reliability [software]

TLDR

Software testing serves two main goals: achieving quality (debug testing, which probes for defects so that they can be removed) and assessing quality (operational testing, which builds confidence that the software is reliable). Debug methods tend to ignore random selection of test data from an operational profile, whereas operational methods depend on it; operational methods give accurate reliability assessments but may be less useful for achieving reliability. Using a probabilistic analysis of simple models of programs and their testing, the authors ask which approach is the better way to attain reliability. Methods are compared in a model where program failures are detected and the software is changed to eliminate them, and the "better" method is defined as the one that delivers higher reliability after all test failures have been eliminated. Special cases are exhibited in which each kind of testing is superior, and the distribution of delivered reliability has unusual statistical properties even in simple models, which suggests caution in interpreting theoretical comparisons.

Abstract

There are two main goals in testing software: (1) to achieve adequate quality (debug testing), where the objective is to probe the software for defects so that these can be removed, and (2) to assess existing quality (operational testing), where the objective is to gain confidence that the software is reliable. Debug methods tend to ignore random selection of test data from an operational profile, while for operational methods this selection is all-important. Debug methods are thought to be good at uncovering defects so that these can be repaired, but having done so they do not provide a technically defensible assessment of the reliability that results. On the other hand, operational methods provide accurate assessment, but may not be as useful for achieving reliability. This paper examines the relationship between the two testing goals, using a probabilistic analysis. We define simple models of programs and their testing, and try to answer the question of how to attain program reliability: is it better to test by probing for defects as in debug testing, or to assess reliability directly as in operational testing? Testing methods are compared in a model where program failures are detected and the software changed to eliminate them. The "better" method delivers higher reliability after all test failures have been eliminated. Special cases are exhibited in which each kind of testing is superior. An analysis of the distribution of the delivered reliability indicates that even simple models have unusual statistical properties, suggesting caution in interpreting theoretical comparisons.
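The comparison described in the abstract can be illustrated with a minimal sketch. It is not the paper's model: it assumes disjoint failure regions, perfect fixes for every region that a test hits, and independent tests. All names and numbers (`expected_delivered_failure_prob`, `q`, the detection rates) are hypothetical and chosen only for illustration. Each region i has an operational failure probability q_i and a per-test detection probability d_i; for operational testing d_i equals q_i, while a debug method may have any d_i. Under these assumptions the expected failure probability left after T tests is the sum over i of q_i (1 - d_i)^T.

```python
def expected_delivered_failure_prob(q, detect, num_tests):
    """Expected failure probability remaining after testing and fixing.

    q[i]      : probability that an operational input hits failure region i
    detect[i] : probability that a single test hits failure region i
    A region survives testing only if all num_tests tests miss it.
    """
    return sum(qi * (1.0 - di) ** num_tests for qi, di in zip(q, detect))


# Hypothetical program: one frequent and one rare failure region.
q = [0.01, 0.001]
num_tests = 100

# Operational testing draws tests from the operational profile,
# so each region is detected at its own operational rate.
operational = expected_delivered_failure_prob(q, q, num_tests)

# Hypothetical debug method A: targets the rare region and mostly
# misses the frequent one.
debug_a = expected_delivered_failure_prob(q, [0.001, 0.05], num_tests)

# Hypothetical debug method B: detects both regions at a rate higher
# than their operational rates.
debug_b = expected_delivered_failure_prob(q, [0.05, 0.05], num_tests)

print("operational:", operational)
print("debug A:    ", debug_a)
print("debug B:    ", debug_b)
```

With these invented numbers, operational testing leaves less expected unreliability than debug method A but more than debug method B, mirroring the abstract's remark that special cases exist in which each kind of testing is superior. The sketch reports only an expected value; it says nothing about the distribution of delivered reliability, which the abstract flags as having unusual statistical properties.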
