Scientists Just Mastered an Error-Free Way to Store Data on DNA

The new data storage method beat out old approaches by 60 percent.

BY Wanda Thibodeaux - 07 Mar 2017


Ask any business what it takes to really get ahead and data analysis comes pretty close to topping the list. The catch, though, is that companies have so much data that just storing it--let alone putting it to use--is problematic. But if scientists have their way, in the not-so-distant future, you very well could use deoxyribonucleic acid (DNA) as a storage medium. As Robert Service of Science reports, researchers say they've created a new method to encode digital data onto DNA that's more efficient and accurate than any process used until now.


Why DNA is so appealing

Service notes several reasons why scientists are eying DNA as a viable data storage choice:

  • DNA is ultracompact
  • It can last hundreds of thousands of years if kept cool and dry, and it won't become obsolete
  • New technology allows for the reading and writing of large amounts of DNA at a time, making it possible to scale it up

Because of these benefits, researchers have been working with DNA for data storage since 2012. But researchers believe the practical maximum is 1.8 bits of data per nucleotide of DNA, and until now no group had managed to store more than about half of that.


The most recent success

Yaniv Erlich, a computer scientist at Columbia University, partnered with Dina Zielinski, an associate scientist at the New York Genome Center. To get data onto DNA and retrieve it in a more efficient, less error-prone way, they completed the following steps:

• Converted six files into binary strings of 1s and 0s

• Compressed the files into a master file

• Split the data into short strings of binary code

• Came up with a new algorithm called "DNA fountain"

• Used the algorithm to package the strings of binary code into "droplets"

• Added tags to help reassemble the strings of code in the right order

• Sent the code as text files to Twist Bioscience in San Francisco, California to synthesize the DNA strands

• Sequenced the synthesized DNA and used a computer to translate the genetic code back into binary, using the tags to reassemble the original files
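
To make the steps above more concrete, here is a minimal Python sketch of two of the ideas: turning binary into DNA letters and using tags to put the pieces back in order. It is not the DNA fountain algorithm itself, whose details the article doesn't spell out. The two-bits-per-base mapping, the 2-byte index tag, the 8-byte chunk size and every function name are assumptions made purely for illustration.

```python
# Illustrative only: NOT the DNA fountain algorithm.
# Assumed mapping (hypothetical): 00 -> A, 01 -> C, 10 -> G, 11 -> T.
BASES = "ACGT"


def bytes_to_bases(data: bytes) -> str:
    """Translate each byte into four DNA letters, two bits per letter."""
    letters = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            letters.append(BASES[(byte >> shift) & 0b11])
    return "".join(letters)


def bases_to_bytes(strand: str) -> bytes:
    """Reverse bytes_to_bases: every four letters become one byte."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def encode(data: bytes, chunk_size: int = 8) -> list[str]:
    """Split data into short chunks and prefix each with a 2-byte index tag."""
    strands = []
    for index, start in enumerate(range(0, len(data), chunk_size)):
        tag = index.to_bytes(2, "big")
        strands.append(bytes_to_bases(tag + data[start:start + chunk_size]))
    return strands


def decode(strands: list[str]) -> bytes:
    """Read the tag from each strand and reassemble the chunks in order."""
    chunks = {}
    for strand in strands:
        raw = bases_to_bytes(strand)
        chunks[int.from_bytes(raw[:2], "big")] = raw[2:]
    return b"".join(chunks[i] for i in sorted(chunks))


message = b"Hello, DNA storage!"
strands = encode(message)
strands.reverse()  # strands in a test tube have no inherent order
assert decode(strands) == message
```

Four letters can carry at most two bits each, which is why this naive mapping looks denser than the figures researchers quote. It also has no protection against lost or damaged strands, which is the kind of problem a real scheme has to solve.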

The results, announced earlier this week, were outstanding, encoding 1.6 bits of data per nucleotide (85 percent of what scientists think is the maximum) and exceeding previous attempts by other scientists by 60 percent. There were no errors, and through polymerase chain reaction, a modern technique people already use to copy DNA, Erlich and Zielinski were able to replicate the files without issue.
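
For a rough sense of what those densities mean, the short Python sketch below works out how many nucleotides a payload would need at each rate. The 1.6 and 1.8 figures come from the reporting; the 1.0 figure for earlier work is only inferred from the "60 percent" claim, and the 1,000,000-byte payload and the function name are assumptions chosen for illustration.

```python
import math
from fractions import Fraction


def nucleotides_needed(num_bytes: int, bits_per_nt: Fraction) -> int:
    """Nucleotides required to hold num_bytes at a given density."""
    return math.ceil(Fraction(num_bytes * 8) / bits_per_nt)


ACHIEVED = Fraction("1.6")               # reported for the new method
LIMIT = Fraction("1.8")                  # what researchers believe is possible
PRIOR_BEST = ACHIEVED / Fraction("1.6")  # inferred: 60 percent lower, i.e. 1.0

payload = 1_000_000  # hypothetical payload size in bytes
for label, density in [("earlier work (inferred)", PRIOR_BEST),
                       ("new method", ACHIEVED),
                       ("believed maximum", LIMIT)]:
    print(f"{label}: {nucleotides_needed(payload, density):,} nucleotides")
```

Exact fractions are used instead of floating-point numbers so that rounding up never adds a stray nucleotide.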


For now, archiving--for the future, the unknown

The price tag for Erlich and Zielinski's process was $9,000. And that's just for six measly files. Imagine the cost for all the files we've ever created, or the cost of the files people will create today alone. In short, we're nowhere near the point where the technique would be financially prudent for companies or individuals. And writing to and reading from DNA is still painfully slow, according to Erlich. So even if you could afford to use it right now, it's an archiving tool at best until technology streamlines the coding and decoding process. But those advances will happen. And when companies already are using artificial intelligence, robots and bionics, the line between natural and large-scale artificial learning might be closer than we think.