The assessment of replication success based on relative effect size

TLDR

Replication studies are increasingly conducted to confirm original findings, yet there is no established standard for assessing replication success, and many different approaches are used in practice. This paper refines a reverse-Bayes method and introduces the golden level, a recalibrated criterion for replication success. The method is linked to the relative effect size. For borderline significant original studies, success requires a replication estimate larger than the original one, and conditional power can be set to any desired value if the original study is significant and the replication sample is large enough. Compared with requiring significance of both studies, the golden level gives uniform gains in project power and controls the Type-I error rate when the replication sample size is not smaller than the original one. Applied to four large replication projects, it yields more appropriate inferences by penalizing shrinkage of the replication estimate.

Abstract

Replication studies are increasingly conducted in order to confirm original findings. However, there is no established standard for how to assess replication success, and in practice many different approaches are used. The purpose of this paper is to refine and extend a recently proposed reverse-Bayes approach for the analysis of replication studies. We show how this method is directly related to the relative effect size, the ratio of the replication to the original effect estimate. This perspective leads to a new proposal to recalibrate the assessment of replication success, the golden level. The recalibration ensures that for borderline significant original studies replication success can only be achieved if the replication effect estimate is larger than the original one. Conditional power for replication success can then take any desired value if the original study is significant and the replication sample size is large enough. Compared to the standard approach of requiring statistical significance of both the original and the replication study, replication success at the golden level offers uniform gains in project power and controls the Type-I error rate if the replication sample size is not smaller than the original one. An application to data from four large replication projects shows that the new approach leads to more appropriate inferences, as it penalizes shrinkage of the replication estimate relative to the original one, while ensuring that both effect estimates are sufficiently convincing on their own.
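
The abstract describes the link between the reverse-Bayes criterion and the relative effect size d without giving formulas. The Python sketch below illustrates that link under assumptions that are not quoted from the paper: effect estimates are approximately normal with known standard errors; tests are one-sided in the direction of a positive original effect; the sceptical z-value z_S solves (z_o²/z_S² - 1)(z_r²/z_S² - 1) = c, where c = σ_o²/σ_r² is the variance ratio; and the golden level is taken as α_G = 1 - Φ(z_α/√φ), where φ is the golden ratio. Because z_r/z_o = (θ̂_r/σ_r)/(θ̂_o/σ_o), it follows that z_r = d·√c·z_o. This identity lets the success condition be rewritten as a minimum relative effect size d_min. All function names are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

N = NormalDist()              # standard normal distribution
PHI = (1 + sqrt(5)) / 2       # golden ratio, about 1.618


def relative_effect_size(est_orig: float, est_rep: float) -> float:
    """d = replication estimate / original estimate (d < 1 means shrinkage)."""
    return est_rep / est_orig


def variance_ratio(se_orig: float, se_rep: float) -> float:
    """c = sigma_o^2 / sigma_r^2, roughly n_r / n_o if variances scale as 1/n."""
    return (se_orig / se_rep) ** 2


def two_trials_rule(z_orig: float, z_rep: float, alpha: float = 0.025) -> bool:
    """Standard approach: both one-sided p-values (positive direction) <= alpha."""
    z_crit = N.inv_cdf(1 - alpha)
    return z_orig >= z_crit and z_rep >= z_crit


def golden_level(alpha: float = 0.025) -> float:
    """Assumed recalibration: alpha_G = 1 - Phi(z_alpha / sqrt(PHI))."""
    return 1 - N.cdf(N.inv_cdf(1 - alpha) / sqrt(PHI))


def sceptical_z(z_orig: float, z_rep: float, c: float) -> float:
    """Root z_S of (z_o^2/z_S^2 - 1)(z_r^2/z_S^2 - 1) = c (assumed form)."""
    a = z_orig**2 + z_rep**2
    b = z_orig**2 * z_rep**2
    # With x = z_S^2 the equation becomes (1 - c) x^2 - a x + b = 0.
    # This numerically stable form picks the root below min(z_o^2, z_r^2)
    # and is valid for every c > 0, reducing to b / a when c = 1.
    x = 2 * b / (a + sqrt(a**2 - 4 * (1 - c) * b))
    return sqrt(x)


def min_relative_effect(z_orig: float, c: float, level: float) -> float:
    """Smallest d with z_S >= z_level, from z_r = d * sqrt(c) * z_o.

    Requires K = z_o^2 / z_level^2 > 1, i.e. a significant original study.
    """
    k = z_orig**2 / N.inv_cdf(1 - level) ** 2
    if k <= 1:
        raise ValueError("original study not significant at this level")
    return sqrt((k + c - 1) / (c * k * (k - 1)))


def success_at_level(z_orig: float, d: float, c: float, level: float) -> bool:
    """Replication success (positive direction assumed) expressed through d."""
    if z_orig <= N.inv_cdf(1 - level):
        return False
    return d >= min_relative_effect(z_orig, c, level)


# Hypothetical example: borderline significant original, larger replication.
est_o, se_o = 0.40, 0.20      # z_o = 2.0
est_r, se_r = 0.25, 0.10      # z_r = 2.5
d = relative_effect_size(est_o, est_r)   # 0.625: the estimate has shrunk
c = variance_ratio(se_o, se_r)           # 4.0
z_o, z_r = est_o / se_o, est_r / se_r
alpha_g = golden_level(0.025)
print(two_trials_rule(z_o, z_r))           # True: both z-values exceed 1.96
print(success_at_level(z_o, d, c, alpha_g))  # False: d is below d_min
print(min_relative_effect(z_o, c, alpha_g))  # slightly above 1
```

Under these assumptions, α_G is about 0.062 for α = 0.025. For an original study that is exactly borderline significant (z_o = z_α), the identity φ(φ - 1) = 1 reduces d_min to √(1 + (φ - 1)/c). This value exceeds 1 for every c and approaches 1 only as the replication becomes much larger than the original. That matches the behaviour the abstract describes. The hypothetical example also shows the contrast with the two-trials rule, which accepts a replication whose estimate has shrunk to about 62% of the original.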
