Multiple Inference and Gender Differences in the Effects of Early Intervention: A Reevaluation of the Abecedarian, Perry Preschool, and Early Training Projects

TLDR

Early childhood educational investments are widely believed to yield the highest returns, a belief based largely on the Abecedarian, Perry, and Early Training Project trials, which point to exceptionally high returns. This study reanalyzes those trials to examine gender-specific treatment effects and the effect of multiple inference on statistical conclusions. The authors apply a statistical framework that combines summary index tests with familywise error rate and false discovery rate corrections: the summary indices reduce the number of tests, and the two corrections adjust p values for multiple inference. The reanalysis shows that girls received substantial short- and long-term benefits from the interventions, while boys showed no significant long-term benefits, underscoring the importance of accounting for multiple testing in complex studies.

Abstract

The view that the returns to educational investments are highest for early childhood interventions is widely held and stems primarily from several influential randomized trials—Abecedarian, Perry, and the Early Training Project—that point to super-normal returns to early interventions. This article presents a de novo analysis of these experiments, focusing on two core issues that have received limited attention in previous analyses: treatment effect heterogeneity by gender and overrejection of the null hypothesis due to multiple inference. To address the latter issue, a statistical framework that combines summary index tests with familywise error rate and false discovery rate corrections is implemented. The first technique reduces the number of tests conducted; the latter two techniques adjust the p values for multiple inference. The primary finding of the reanalysis is that girls garnered substantial short- and long-term benefits from the interventions, but there were no significant long-term benefits for boys. These conclusions, which have appeared ambiguous when using "naive" estimators that fail to adjust for multiple testing, contribute to a growing literature on the emerging female–male academic achievement gap. They also demonstrate that in complex studies where multiple questions are asked of the same data set, it can be important to declare the family of tests under consideration and to either consolidate measures or report adjusted and unadjusted p values.
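
The three techniques named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not say which specific procedures were used. The sketch assumes a summary index built as the equally weighted mean of outcome z-scores standardized against the control group, and it substitutes two textbook corrections: Holm's step-down adjustment for the familywise error rate and the Benjamini-Hochberg adjustment for the false discovery rate. All function names and the example p values are hypothetical.

```python
from statistics import mean, stdev


def summary_index(outcomes, is_control):
    """Equally weighted mean of outcome z-scores (assumed construction).

    outcomes: list of rows, one per person, each a list of outcomes
              oriented so that higher is better.
    is_control: list of bools marking control-group rows.
    Each outcome is standardized with the control-group mean and SD.
    """
    n_cols = len(outcomes[0])
    index = [0.0] * len(outcomes)
    for j in range(n_cols):
        control_vals = [row[j] for row, c in zip(outcomes, is_control) if c]
        mu, sd = mean(control_vals), stdev(control_vals)
        for i, row in enumerate(outcomes):
            index[i] += (row[j] - mu) / sd / n_cols
    return index


def holm_adjust(pvals):
    """Holm step-down adjusted p values (familywise error rate control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p values
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the k-th smallest p value by (m - k + 1), cap at 1,
        # and enforce monotonicity with a running maximum.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p values (false discovery rate control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank among ascending p values
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical unadjusted p values for one declared family of four tests.
naive_p = [0.01, 0.04, 0.03, 0.005]
print("Holm:", holm_adjust(naive_p))
print("BH:  ", bh_adjust(naive_p))
```

Testing one summary index per outcome domain instead of each outcome separately shrinks the family before any correction is applied. Reporting the adjusted values next to the unadjusted ones makes the cost of multiple inference visible.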
