Data storage in DNA already looked promising back in 2007, and years later it attracted much wider interest, including from Microsoft, which has developed a machine to automate the process. That interest shows no sign of waning, and now a team of researchers has devised a new process to store and retrieve information in DNA.

The push to find a practical (and above all affordable) way to store information in this biomolecule is driven by its advantages: the information takes up very little space and promises to last a very long time. Cost is one obstacle. Another is retrieving stored files, which for now is not as simple or practical as researchers would like, and that is where this new technique could bring improvements.

Fluorescence and Boolean logic to find files faster

The idea of storing data in DNA is based on encoding the information bits (zeros and ones) as DNA sequences. DNA (deoxyribonucleic acid) is a chain of small units called nucleotides, each identified by its nitrogenous base, which gives it a letter: A (adenine), T (thymine), C (cytosine), or G (guanine). Each link in the chain is therefore one of these letters, and their order makes up the genes and determines which proteins are expressed: the genetic code.

Schematic organization and compaction of DNA in a cell. Image: KES47

So, broadly speaking, the task is to translate the zeros and ones of binary data into A, T, C and G. Since there are four possible letters, each nucleotide can represent roughly two bits. Two bits then occupy roughly one cubic nanometer, so an exabyte of data (for example, from a data center) stored in DNA would more or less fit in the palm of your hand.
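To make the idea concrete, here is a minimal Python sketch of the translation step. It assumes a simple, hypothetical mapping (00→A, 01→C, 10→G, 11→T); the article does not say which code the researchers used, and practical schemes typically add constraints and redundancy on top. The `volume_mm3` helper, also hypothetical, simply takes the density figure above (about two bits per cubic nanometer) at face value.

```python
# Hypothetical 2-bit mapping for illustration only; real DNA storage schemes
# use more elaborate codes, so this is a sketch, not a published method.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Translate bytes into a DNA sequence, two bits per nucleotide."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(sequence: str) -> bytes:
    """Reverse encode(): every four nucleotides give back one byte."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def volume_mm3(num_bytes: int, bits_per_nm3: int = 2) -> float:
    """Rough volume, assuming the article's ~2 bits per cubic nanometer."""
    volume_nm3 = num_bytes * 8 // bits_per_nm3
    return volume_nm3 / 10**18  # 1 mm^3 = (10^6 nm)^3 = 10^18 nm^3


print(encode(b"Hi"))
print(decode(encode(b"Hi")))
print(volume_mm3(10**18))  # one (decimal) exabyte
```

With this mapping, `encode(b"Hi")` yields `CAGACGGC` (four bases per byte), and `volume_mm3(10**18)` puts one exabyte at about 4 mm³ of DNA, comfortably within the "palm of the hand" estimate. It is only an order-of-magnitude check that ignores any encoding overhead.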

The method is promising, but it faces the two main bottlenecks mentioned above: cost, and retrieving a specific file from among all those stored. Current retrieval broadly mimics what the cell's own machinery does: it relies on a marker sequence (a primer) that recognizes the desired sequence so it can be found and amplified. However, the primer can be confused with similar sequences belonging to other files and, as the MIT researchers explain, the process uses up much of the DNA in the sample (and requires many enzymes).
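The confusion problem can be illustrated with a deliberately simplified Python model. It treats each file as a strand with a primer-binding site and assumes, hypothetically, that a primer "binds" any site within a set number of mismatched bases. Real hybridization depends on complementarity, temperature and chemistry, none of which is modeled here, and the sequences and file names are invented for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def select_by_primer(pool, primer, max_mismatches=1):
    """Return the IDs of strands whose binding site 'matches' the primer.

    Toy assumption: a strand is picked up (and would be amplified) whenever
    its site differs from the primer in at most `max_mismatches` positions.
    """
    return sorted(
        file_id
        for file_id, (site, _payload) in pool.items()
        if hamming(site, primer) <= max_mismatches
    )


# Hypothetical pool: file ID -> (primer-binding site, payload)
pool = {
    "file_a": ("ACGTACGTAC", "CAGACGGC"),
    "file_b": ("ACGTACGTAA", "TTGACCAT"),  # site differs from file_a's by one base
    "file_c": ("GGCCTTAAGG", "ACCATGGA"),
}

print(select_by_primer(pool, "ACGTACGTAC", max_mismatches=0))  # ['file_a']
print(select_by_primer(pool, "ACGTACGTAC", max_mismatches=1))  # ['file_a', 'file_b']
```

Tolerating a single mismatch is enough for the primer meant for `file_a` to pull out `file_b` as well, which is the kind of crosstalk between files described above.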

This is where the work these MIT researchers have published in Nature Materials comes in: a technique for retrieving files more practically by encapsulating the DNA in silica particles. Each particle is labeled with a barcode-like DNA tag that corresponds to the content of the file.

The researchers encoded 20 images in DNA molecules of about 3,000 nucleotides, equivalent to about 100 bytes (although, according to the researchers, the capsules could hold up to 1 GB). Each file was labeled with a clear descriptive tag, such as “cat” or “airplane”, and primers corresponding to those tags were then used to find it.

These primers were labeled with fluorescent molecules or magnetic particles, making it easier to identify matches with the desired DNA sequence. As the researchers explain, this allows the desired DNA fragment (and therefore the file) to be recovered without damaging the rest of the DNA. It also makes it possible to search much as Google's image search does, with support for Boolean operators, so that a query like “animal AND white” returns “cat” as the result.
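The search logic can be sketched in Python as Boolean queries over each capsule's labels. The capsule names and label sets below are hypothetical, and in the actual technique the selection happens physically, through the fluorescent or magnetic tags described above, rather than in software. This only models which capsules a query like “animal AND white” would pick out.

```python
# Hypothetical capsules: file name -> set of barcode labels on its capsule.
CAPSULES = {
    "cat": {"animal", "white", "cat"},
    "airplane": {"vehicle", "white", "airplane"},
    "dog": {"animal", "brown", "dog"},
}


def matches(query, labels):
    """Evaluate a Boolean query against one capsule's labels.

    A query is either a single label (str) or a tuple (op, *subqueries),
    where op is "AND", "OR" or "NOT" (NOT takes exactly one subquery).
    """
    if isinstance(query, str):
        return query in labels
    op, *args = query
    if op == "AND":
        return all(matches(q, labels) for q in args)
    if op == "OR":
        return any(matches(q, labels) for q in args)
    if op == "NOT":
        (sub,) = args
        return not matches(sub, labels)
    raise ValueError(f"unknown operator: {op}")


def search(capsules, query):
    """Return the names of all files whose capsule labels satisfy the query."""
    return sorted(name for name, labels in capsules.items() if matches(query, labels))


print(search(CAPSULES, ("AND", "animal", "white")))
print(search(CAPSULES, ("AND", "white", ("NOT", "animal"))))
```

Here `search(CAPSULES, ("AND", "animal", "white"))` returns `['cat']`, and adding NOT lets a query exclude labels: `("AND", "white", ("NOT", "animal"))` returns `['airplane']`.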

A small step toward promising but still expensive storage

Another advantage of this new technique is that it works with the laboratory instruments and techniques already used to sequence, amplify and otherwise handle DNA. Mark Bathe, one of the researchers on the project, sees it being useful in the future for information that is not accessed regularly.

Nevertheless, cost is still an issue. According to MIT, storing a petabyte of data (1 million GB) would currently cost $1 trillion, and Bathe estimates it will be at least a decade (or two) before the approach starts to become cost-competitive with magnetic storage.
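Expressed per gigabyte, using the decimal units of the figure above (1 PB = 1 million GB), MIT's estimate works out as follows:

```latex
\frac{\$10^{12}}{1~\text{PB}} = \frac{\$10^{12}}{10^{6}~\text{GB}} = \$10^{6}~\text{per GB}
```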

So it remains to be seen whether, as the processes get cheaper as projected, more efficient systems will also be found in terms of write speed. Meanwhile, what we do know for sure is that GIFs can be stored in DNA, so GIF culture is safe.

Image | Vectorjuice (Freepik)