Publication | Open Access
Dataset decay and the problem of sequential analyses on open datasets
Citations: 75 | References: 34 | Year: 2020
Keywords: Sequential Analyses, Engineering, Data Curation, Data Preparation, Multiple Statistical Testing, Large-scale Datasets, Reproducible Research, Data Science, Data Mining, Same Dataset, Data Integration, Correction Procedures, Data Management, Statistics, Dataset Decay, Knowledge Discovery, Research Data Management, Dataset Creation, Data Set, Statistical Inference, Open Datasets, Computational Reproducibility
Open data allows researchers to explore pre-existing datasets in new ways. However, if many researchers reuse the same dataset, multiple statistical testing may increase false positives. Here we demonstrate that sequential hypothesis testing on the same dataset by multiple researchers can inflate error rates. We go on to discuss a number of correction procedures that can reduce the number of false positives, and the challenges associated with these correction procedures.
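The inflation described in the abstract can be illustrated with a minimal sketch. It assumes, for simplicity, that every tested null hypothesis is true and that the tests are independent; neither assumption comes from the paper. The function names and the geometric threshold schedule are hypothetical choices for illustration, not the authors' procedure.

```python
from typing import List


def familywise_error_rate(thresholds: List[float]) -> float:
    """Probability of at least one false positive across independent tests
    of true null hypotheses, where test k is rejected if p < thresholds[k]."""
    prob_no_false_positive = 1.0
    for alpha_k in thresholds:
        prob_no_false_positive *= 1.0 - alpha_k
    return 1.0 - prob_no_false_positive


def uncorrected(alpha: float, n_tests: int) -> List[float]:
    """Every researcher reuses the same nominal threshold."""
    return [alpha] * n_tests


def geometric_spending(alpha: float, n_tests: int) -> List[float]:
    """Illustrative sequential correction (an assumption of this sketch):
    the k-th test on the dataset gets alpha / 2**k, so the thresholds
    sum to less than alpha however many tests follow."""
    return [alpha * 0.5 ** (k + 1) for k in range(n_tests)]


if __name__ == "__main__":
    alpha = 0.05
    for n in (1, 5, 20, 100):
        raw = familywise_error_rate(uncorrected(alpha, n))
        corrected = familywise_error_rate(geometric_spending(alpha, n))
        print(f"{n:>3} tests: uncorrected={raw:.3f}, corrected={corrected:.3f}")
```

Under these assumptions the uncorrected error rate climbs toward 1 as more analyses are run on the same dataset, while the corrected rate stays below the nominal level. The price is that later tests face ever stricter thresholds, which is one form of the challenge the abstract attributes to correction procedures.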