Concepedia

Publication | Open Access

Generating Electronic Health Records with Multiple Data Types and Constraints

Citations: 30

References: 6

Year: 2020

Abstract

Sharing electronic health records (EHRs) on a large scale may lead to privacy intrusions. Recent research has shown that risks may be mitigated by simulating EHRs through generative adversarial network (GAN) frameworks. Yet the methods developed to date are limited because they 1) focus on generating data of a single type (e.g., diagnosis codes), neglecting other data types (e.g., demographics, procedures or vital signs) and 2) do not represent constraints between features. In this paper, we introduce a method to simulate EHRs composed of multiple data types by 1) refining the GAN model, 2) accounting for feature constraints, and 3) incorporating key utility measures for such generation tasks. Our analysis with over 770,000 EHRs from Vanderbilt University Medical Center demonstrates that the new model achieves higher performance in terms of retaining basic statistics, cross-feature correlations, latent structural properties, feature constraints and associated patterns from real data, without sacrificing privacy.
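To make two of the utility notions named in the abstract concrete (retaining basic statistics and respecting feature constraints), here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the record layout, the feature names `is_female` and `dx_prostate`, the sex-specific constraint, and all helper functions are hypothetical, and the toy records are made up purely to exercise the code.

```python
from typing import Callable, Dict, List

# Hypothetical record layout: one dict of binary features per patient.
Record = Dict[str, int]


def prevalence(records: List[Record], feature: str) -> float:
    """Fraction of records in which a binary feature equals 1."""
    return sum(r[feature] for r in records) / len(records)


def max_prevalence_gap(real: List[Record], synthetic: List[Record],
                       features: List[str]) -> float:
    """Largest absolute per-feature prevalence difference (a basic-statistics check)."""
    return max(abs(prevalence(real, f) - prevalence(synthetic, f))
               for f in features)


def violation_rate(records: List[Record],
                   constraint: Callable[[Record], bool]) -> float:
    """Fraction of records for which the constraint predicate is False."""
    return sum(1 for r in records if not constraint(r)) / len(records)


def sex_specific_ok(r: Record) -> bool:
    """Assumed example constraint: a male-only code must not appear in a female record."""
    return not (r["is_female"] == 1 and r["dx_prostate"] == 1)


# Toy data, invented for illustration only.
real = [
    {"is_female": 1, "dx_prostate": 0},
    {"is_female": 0, "dx_prostate": 1},
    {"is_female": 0, "dx_prostate": 0},
    {"is_female": 1, "dx_prostate": 0},
]
synthetic = [
    {"is_female": 1, "dx_prostate": 1},  # breaks the assumed constraint
    {"is_female": 0, "dx_prostate": 1},
    {"is_female": 0, "dx_prostate": 0},
    {"is_female": 1, "dx_prostate": 0},
]

gap = max_prevalence_gap(real, synthetic, ["is_female", "dx_prostate"])
real_violations = violation_rate(real, sex_specific_ok)
synthetic_violations = violation_rate(synthetic, sex_specific_ok)
print(gap, real_violations, synthetic_violations)
```

In this sketch, a generator that ignores feature constraints can come close to the real per-feature prevalences while still emitting records that are impossible in practice, which is why the two checks are computed separately.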
