Publication | Closed Access
The Promise and Limitations of Synthetic Data as a Strategy to Expand Access to State-Level Multi-Agency Longitudinal Data
Year: 2019 | Citations: 41 | References: 36
Topics: Engineering, Data Curation, Expand Access, Well-developed Synthetic Datasets, Data Science, Management, Data Integration, Synthetic Data System, Data Management, Statistics, Data Creation, Data Modeling, Longitudinal Data Analysis, Data Privacy, Computer Science, Synthetic Data, Data Treatment, Data Literacy, Data Heterogeneity, Health Informatics, Big Data
There is demand among policy-makers for the use of state education longitudinal data systems, yet laws and policies regulating data disclosure limit access to such data, and security concerns and risks remain high. Well-developed synthetic datasets, which statistically mimic the relationships among the variables in the data from which they were derived but contain no records representing actual persons, offer a viable way to comply with these laws and policies while addressing these concerns and risks. We present a case study in the development of a synthetic data system and highlight potential applications of synthetic data. We begin with an overview of synthetic data: what it is, how it has been used thus far, and the potential benefits and concerns of applying it to education data systems. We then describe our federally funded project and propose the steps required to synthesize a statewide longitudinal data system covering high school, postsecondary, and workforce data. Last, to provide a template for other agencies considering synthetic data, we review the challenges we have confronted in developing our synthetic data system for research and policy evaluation purposes.