Publication | Closed Access
Preserving Statistical Validity in Adaptive Data Analysis
Citations: 260
References: 35
Year: 2015
Venue: Unknown
Discovery Technique, Engineering, Statistical Validity, Data Science, Data Mining, Statistical Methods, Data Validation, Knowledge Discovery, Data Treatment, Data Preparation, Spurious Scientific Discoveries, Statistical Inference, False Discovery Rate, Computational Reproducibility, Functional Data Analysis, Statistics, Reproducible Research, Methodological Advance
A great deal of effort has been devoted to reducing the risk of spurious scientific discoveries, from the use of sophisticated validation techniques to deep statistical methods for controlling the false discovery rate in multiple hypothesis testing. However, there is a fundamental disconnect between the theoretical results and the practice of data analysis: the theory of statistical inference assumes a fixed collection of hypotheses to be tested, or learning algorithms to be applied, selected non-adaptively before the data are gathered, whereas in practice data are shared and reused, with hypotheses and new analyses being generated on the basis of data exploration and the outcomes of previous analyses.
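The disconnect described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's method: it assumes pure-noise data (random ±1 features and labels), and the function names (`make_noise_data`, `select_features`, `accuracy`, `run_demo`) and the parameters `n`, `d`, `k` are hypothetical choices made for illustration. An analyst picks the features that look most correlated with the labels, then evaluates the resulting classifier on the same data that guided the selection and, for comparison, on fresh data.

```python
import random


def make_noise_data(n, d, rng):
    """Return n rows of d random +/-1 features and n random +/-1 labels (no real signal)."""
    xs = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(n)]
    ys = [rng.choice((-1, 1)) for _ in range(n)]
    return xs, ys


def select_features(xs, ys, k):
    """Adaptive step: keep the k features most correlated with the labels on this sample."""
    n, d = len(xs), len(xs[0])
    corr = [sum(xs[i][j] * ys[i] for i in range(n)) / n for j in range(d)]
    top = sorted(range(d), key=lambda j: abs(corr[j]), reverse=True)[:k]
    # Remember the sign of each correlation so every selected feature votes "with" the labels.
    return [(j, 1 if corr[j] >= 0 else -1) for j in top]


def accuracy(xs, ys, selected):
    """Accuracy of a sign-weighted majority vote over the selected features."""
    correct = 0
    for x, y in zip(xs, ys):
        vote = sum(sign * x[j] for j, sign in selected)
        pred = 1 if vote >= 0 else -1
        correct += pred == y
    return correct / len(ys)


def run_demo(seed=0, n=200, d=500, k=25):
    """Return (accuracy on the reused sample, accuracy on a fresh sample)."""
    rng = random.Random(seed)
    train_x, train_y = make_noise_data(n, d, rng)
    fresh_x, fresh_y = make_noise_data(n, d, rng)
    selected = select_features(train_x, train_y, k)
    return accuracy(train_x, train_y, selected), accuracy(fresh_x, fresh_y, selected)


if __name__ == "__main__":
    reused, fresh = run_demo()
    print(f"accuracy on reused data: {reused:.2f}")
    print(f"accuracy on fresh data:  {fresh:.2f}")
```

Because the labels are independent of every feature, no classifier can truly beat chance. In this setup the accuracy measured on the reused sample should nevertheless land well above 0.5, while the accuracy on fresh data should stay close to 0.5. The gap comes entirely from choosing the hypothesis after looking at the data.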