The following is a special contribution to this blog by CCC Executive Council Member Mark D. Hill of the University of Wisconsin-Madison. Full disclosure: he is working with one of the paper's authors, Luis Ceze, and with Tom Wenisch on the Architecture 2030 visioning effort at ISCA 2016.
The invention of writing enabled us to reliably transmit information into the future. Stone tablets, papyrus, vellum, and paper can be read centuries if not millennia later. But how much of the digital information we have created over the last 75 years will be readable much later? How much is even readable now?
Wouldn’t it be valuable if we could record digital information in a medium that lasts for centuries and that we will always have an incentive to be able to read? Even better would be a medium that permits dense, high-volume storage with reasonable access time. Magnetic tape and optical disks can last from decades to a century, but they are not dense enough for truly massive data, and they quickly become obsolete and must be rewritten.
Researchers at the University of Washington and Microsoft Research have taken a step in this direction with "A DNA-Based Archival Storage System," a paper presented at the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). The medium they propose is DNA! DNA lasts: researchers have read 40,000-year-old Neanderthal DNA. And because DNA is the code of life, we have great incentive to remember how to read it. It is also digital: each nucleotide is one of four bases, adenine (A), cytosine (C), guanine (G), or thymine (T), so in theory each nucleotide can encode two bits. But there are challenges.
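To make the two-bits-per-nucleotide idea concrete, here is a minimal sketch of the naive mapping, assuming an arbitrary assignment of the bit pairs 00, 01, 10, 11 to A, C, G, T. The functions `encode` and `decode` are hypothetical illustrations, not the encoding used in the ASPLOS paper. Practical encodings generally avoid long runs of the same base and add redundancy, so they store fewer than two bits per nucleotide.

```python
# Hypothetical 2-bit mapping; the assignment of bit pairs to bases is arbitrary.
BITS_TO_BASE = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn each byte into four nucleotides, most significant bit pair first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases)


def decode(strand: str) -> bytes:
    """Invert encode(): every group of four nucleotides becomes one byte."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASE_TO_BITS[base]
        out.append(byte)
    return bytes(out)


strand = encode(b"Hi")
print(strand)  # → CAGACGGC
assert decode(strand) == b"Hi"
```

Each byte becomes four nucleotides, so the two-byte string `Hi` (16 bits) maps to an 8-base strand.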
First, one must be able to reliably write DNA. Fortunately, the biotech industry has developed the basic tools for de novo DNA synthesis. Still, these tools must scale by several orders of magnitude before DNA storage becomes viable.
Second, data must be encoded with redundancy, storing fewer than two bits per nucleotide, because raw error rates in both DNA writing and reading are relatively high. Luckily, computer scientists are pretty good at coding. Indeed, the experiments presented at ASPLOS recovered every stored bit exactly, despite high raw error rates in the DNA write and read processes.
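A toy illustration of why redundancy helps: if several copies of the same strand are read back, each with a few substitution errors, a per-position majority vote can often recover the original. The `consensus` function below is a hypothetical sketch that assumes equal-length reads and substitution-only errors. It is not the scheme from the ASPLOS paper; real sequencing also produces insertions and deletions, which call for proper error-correcting codes.

```python
from collections import Counter
from typing import List


def consensus(reads: List[str]) -> str:
    """Per-position majority vote over equal-length noisy reads of one strand.

    Assumes substitution errors only; ties are broken by first occurrence.
    """
    if len({len(r) for r in reads}) != 1:
        raise ValueError("need at least one read, all of the same length")
    return "".join(
        Counter(column).most_common(1)[0][0]  # most frequent base at this position
        for column in zip(*reads)
    )


original = "CAGACGGC"
reads = [
    "CAGACGGC",  # error-free copy
    "CTGACGGC",  # substitution at position 1
    "CAGACGAC",  # substitution at position 6
]
assert consensus(reads) == original
```

Because the two errors fall at different positions, each column still has a two-to-one majority for the correct base.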
Third, data must be read back from DNA. DNA sequencing has been improving very quickly, with a 10,000X performance improvement in the past decade! If it continues at this pace, it will soon be fast enough for storage.
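To put that growth rate in perspective, assuming steady compound growth over ten years, a 10,000X improvement corresponds to roughly 2.5X per year, or a doubling of performance about every nine months:

$$
r = \left(10^{4}\right)^{1/10} = 10^{0.4} \approx 2.51,
\qquad
t_{2} = \frac{\ln 2}{\ln r} \approx 0.75 \text{ years}
$$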
In summary, while it may sound like science fiction, DNA-based data storage has progressed rapidly, and if it succeeds, it may replace tape as the archival technology of choice.
DNA storage might just be the first step towards building computers using components from biology.