Experimental Assessment of PCR Specificity and Copy Number for Reliable Data Retrieval in DNA Storage

Abstract

Synthetic DNA has been gaining momentum as a potential medium for archival data storage [1–9]. Digital information is translated into sequences of nucleotides, and the resulting synthetic DNA strands are then stored for later retrieval of individual files via PCR [7–9] (Fig. 1a). Using a previously presented encoding scheme [9] and new experiments, we demonstrate reliable file recovery when as few as 10 copies per sequence are stored, on average. This results in a density of about 17 exabytes/g, nearly two orders of magnitude greater than prior work has shown [6]. Further, no prior work has experimentally demonstrated access to specific files in a pool more complex than approximately 10^6 unique DNA sequences [9], leaving the issue of accurate file retrieval at high data density and complexity unexamined. Here, we demonstrate successful PCR random access using three files of varying sizes in a complex pool of over 10^10 unique sequences, with no evidence that we have begun to approach complexity limits. We further investigate the role of file size in successful data recovery, the effect of increasing sequencing coverage on file recovery, and whether DNA strands drop out of solution in a systematic manner. These findings substantiate the robustness of PCR as a random access mechanism in complex settings, and show that the number of copies needed for data retrieval does not compromise density significantly.
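The relationship between copy number and storage density stated above can be illustrated with a short calculation. The following is a minimal sketch, not the authors' method: the payload per strand (12 bytes), the strand length (150 nt), and the average nucleotide mass (about 330 g/mol for single-stranded DNA) are assumptions chosen only for illustration and are not taken from the paper. The function name `density_bytes_per_gram` is likewise hypothetical.

```python
# Illustrative estimate of physical storage density as a function of
# the number of physical copies kept per unique sequence.
# All default parameter values are assumptions, not values from the paper.

AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def density_bytes_per_gram(payload_bytes_per_strand: float,
                           strand_length_nt: int,
                           copies_per_sequence: float,
                           nt_mass_g_per_mol: float = 330.0) -> float:
    """Return payload bytes stored per gram of DNA.

    nt_mass_g_per_mol is an assumed average mass of one nucleotide
    in single-stranded DNA.
    """
    if copies_per_sequence <= 0 or strand_length_nt <= 0:
        raise ValueError("copies and strand length must be positive")
    # Mass of one physical strand in grams.
    strand_mass_g = strand_length_nt * nt_mass_g_per_mol / AVOGADRO
    # Every unique sequence is stored as several physical copies,
    # so the mass spent per payload grows linearly with the copy number.
    mass_per_sequence_g = strand_mass_g * copies_per_sequence
    return payload_bytes_per_strand / mass_per_sequence_g


if __name__ == "__main__":
    # Hypothetical strand design: 12 payload bytes in a 150 nt strand.
    for copies in (1, 10, 100, 1000):
        d = density_bytes_per_gram(12, 150, copies)
        print(f"{copies:>5} copies/sequence: {d / 1e18:.2f} exabytes/g")
```

Under these assumed parameters, density is inversely proportional to the copy number, which is why reducing the required copies per sequence by a factor of 100 raises density by two orders of magnitude.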
