
Publication | Closed Access

Preserving Statistical Validity in Adaptive Data Analysis

Citations: 260 | References: 35 | Year: 2015

Abstract

A great deal of effort has been devoted to reducing the risk of spurious scientific discoveries, from the use of sophisticated validation techniques to deep statistical methods for controlling the false discovery rate in multiple hypothesis testing. However, there is a fundamental disconnect between the theoretical results and the practice of data analysis: the theory of statistical inference assumes a fixed collection of hypotheses to be tested, or learning algorithms to be applied, selected non-adaptively before the data are gathered, whereas in practice data are shared and reused, with hypotheses and new analyses being generated on the basis of data exploration and the outcomes of previous analyses.

