Behind the Paper

Beyond ATGC: enzymatic synthesis and nanopore sequencing of 12-letter supernumerary DNA to unlock xenobiology's potential

How we built tools to read and write DNA containing 12 letters (A, T, G, C, B, S, P, Z, X, K, J, V)

Published in Bioengineering & Biotechnology

Oct 30, 2023

Jorge Marchand and Hinako Kawabe

2 contributors

Challenging the rules of Nature with unnatural base pairing xenonucleic acids (XNAs)

The 4-letter DNA code that Nature uses (A, T, G, C) is the backbone of the central dogma and the very blueprint of life itself. Our ability to manipulate this code also drives biotechnological progress: genetically engineered organisms, therapeutics, diagnostics, and information storage all rely on it. Yet as biotechnology advances, science is beginning to transcend the rules set by Nature. Unnatural base pairing xenonucleic acids (ubp XNAs) are synthetic nucleotide analogs that can be used orthogonally to the 4 letters of DNA. While there are many types of ubp XNAs, one such variation involves up to 12 different nucleotides that can form 6 complementary hydrogen-bonding pairs. We term this nucleic alphabet soup ‘supernumerary DNA’ (Figure 1).
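
To make the idea of "12 letters, 6 pairs" concrete, here is a minimal Python sketch of a complement table and reverse-complement function for supernumerary DNA. The specific pairing (A:T, G:C, B:S, P:Z, X:K, J:V) is an assumption inferred from the letter order in the subtitle, the two S variants (Sⁿ and Sᶜ) are collapsed into a single letter, and `reverse_complement` is a hypothetical helper, not part of any published tool.

```python
# Assumed pairing (inferred from the letter order A, T, G, C, B, S, P, Z, X, K, J, V).
PAIRS = [("A", "T"), ("G", "C"), ("B", "S"), ("P", "Z"), ("X", "K"), ("J", "V")]

# Symmetric lookup so every letter maps to its partner.
COMPLEMENT = {}
for a, b in PAIRS:
    COMPLEMENT[a] = b
    COMPLEMENT[b] = a


def reverse_complement(seq: str) -> str:
    """Reverse complement over the 12-letter alphabet; rejects unknown letters."""
    try:
        return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))
    except KeyError as err:
        raise ValueError(f"unknown letter: {err.args[0]!r}") from None


print(len(COMPLEMENT))               # 12
print(reverse_complement("ATGBPX"))  # KZSCAT
```

Because each pair is entered in both directions, applying the function twice returns the original sequence, just as it does for ATGC-only DNA.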

Ubp XNAs have the potential to revolutionize every aspect of biotechnology: semi-synthetic organisms with vastly larger codon tables (1,728 codons with 12 letters rather than 64 codons with ATGC only); aptamer/aptazyme therapeutics with novel binding modalities or reactivities; ultrasensitive diagnostics; and even the storage of digital information (e.g., files, movies, books) in DNA.
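
The codon figures follow from simple counting: with n letters and 3-letter codons, each position can hold any of the n letters, giving n³ codons. A short sketch (the function name is illustrative only):

```python
def codon_count(n_letters: int, codon_length: int = 3) -> int:
    """Number of distinct codons: each of the codon_length positions takes any letter."""
    return n_letters ** codon_length


for label, n in [("ATGC", 4), ("ATGCBS", 6), ("12-letter", 12)]:
    print(f"{label}: {codon_count(n)} codons")
# ATGC: 64 codons
# ATGCBS: 216 codons
# 12-letter: 1728 codons
```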

When we first opened our lab in January 2021, we were captivated by the possibility of a world where working with arbitrary DNA alphabets was routine. However, we quickly realized that the abundant and crucial toolkits for 4-letter DNA technologies were not available for expanded DNA alphabets. As a new research group with limited resources, we hit our first roadblock in writing with these new letters: commercial options for XNA synthesis were limited and expensive (a few hundred dollars per base). The second hurdle came from reading: commercial sequencing services and instruments could not sequence the ubp XNA letters we wanted to work with (B, S, P, Z, X, K, J, V). At this point, we connected with like-minded collaborators who had encountered similar problems. If we wanted to realize a world with XNA aptamer therapeutics and XNA semi-synthetic organisms with an ATGCBS genome, we first had to lower the barriers to entering the field. We therefore decided to focus our group’s first project on reading and writing with expanded letters.

**Fig. 1.** **Nucleobases for a 12-letter supernumerary DNA alphabet.** (a) Structures of standard purine and pyrimidine nucleobases found in life. (b) Structures of xenonucleobases that could form the basis of 12-letter supernumerary DNA. Arrows indicate hydrogen bonding between base pairs, drawn from donor to acceptor. The S nucleobase has two possible structures that both pair with B: the N-nucleoside (Sⁿ) and the C-nucleoside (Sᶜ).

Enzymatic synthesis of XNAs

Our lab initially set out to find a synthesis solution that was applicable across all the ubp XNAs we were working with, and accessible in both reagents and expertise. While chemical synthesis (e.g., using phosphoramidites) seems like a general solution, limited chemical stability had made their routine organic synthesis challenging. With these factors in mind, we turned to enzymatic synthesis, building on a discovery made over 35 years ago. Back then, scientists showed that a fragment of E. coli DNA Polymerase I (Klenow Fragment exo-) could catalyze the addition of a single 2′-deoxynucleoside triphosphate (dNTP) to the free 3′-OH end of blunt-ended double-stranded DNA. This reaction, colloquially known as tailing, has an advantage over other enzymatic synthesis approaches: it favors a single base addition and does not require modified nucleotides or polymerase-nucleotide conjugates. We suspected that if a polymerase can tail all four standard dNTPs, it should be able to tail 2′-deoxy-xenonucleoside triphosphates (dxNTPs) as well (Figure 2). Our experiments confirmed this: using a combination of analytical techniques, including gel electrophoresis and high-resolution liquid chromatography-mass spectrometry (LCMS), we found that two polymerases (KF exo- and Therminator) were capable of tailing a single dxNTP onto blunt dsDNA.

Since tailing incorporates only one expanded letter, we needed to couple this reaction with a ligation step to make a true unnatural base pair. To do this, we screened commercially available DNA ligases for their ability to ligate two hairpins with complementary ubp XNA overhangs. The hairpin construct is crucial here: unligated hairpins can be digested in a subsequent exonuclease step, whereas successfully ligated products are protected because they lack a free 5′ and 3′ end. With these two steps, XNA tailing and XNA ligation, we had a method for inserting a single XNA base pair into DNA.
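
The logic of the two-step workflow can be sketched as a toy state model in Python. This is a conceptual illustration only, not a lab protocol or the authors' code: the class and function names are hypothetical, tailing is idealized as adding exactly one base, unreacted hairpins are left in the mix for simplicity, and the base pairing is the same assumed table as above.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

# Assumed pairing for the six base pairs (inferred from the subtitle's letter order).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "B": "S", "S": "B",
              "P": "Z", "Z": "P", "X": "K", "K": "X", "J": "V", "V": "J"}


@dataclass
class Hairpin:
    """A DNA hairpin with one open end; overhang is None while that end is blunt."""
    name: str
    overhang: Optional[str] = None


@dataclass
class Dumbbell:
    """Two hairpins ligated into a closed molecule with no free 5' or 3' ends."""
    left: Hairpin
    right: Hairpin

    @property
    def inserted_pair(self) -> str:
        return f"{self.left.overhang}:{self.right.overhang}"


def tail(hp: Hairpin, dxntp: str) -> Hairpin:
    """Model tailing as adding exactly one untemplated base to a blunt 3' end."""
    if hp.overhang is not None:
        raise ValueError(f"{hp.name} is not blunt; this toy model adds one base only")
    return Hairpin(hp.name, dxntp)


def ligate(a: Hairpin, b: Hairpin) -> Optional[Dumbbell]:
    """Join two tailed hairpins only if their single-base overhangs pair."""
    if a.overhang and b.overhang and COMPLEMENT.get(a.overhang) == b.overhang:
        return Dumbbell(a, b)
    return None


def exonuclease(species: List[Union[Hairpin, Dumbbell]]) -> List[Dumbbell]:
    """Digest everything with free ends (unligated hairpins); dumbbells survive."""
    return [s for s in species if isinstance(s, Dumbbell)]


# Toy run: tail one hairpin with X and another with K, ligate, then clean up.
left = tail(Hairpin("HP1"), "X")
right = tail(Hairpin("HP2"), "K")
mix: List[Union[Hairpin, Dumbbell]] = [left, right]
product = ligate(left, right)
if product is not None:
    mix.append(product)
print([d.inserted_pair for d in exonuclease(mix)])  # ['X:K']
```

The key design point mirrored here is that the exonuclease step acts as a purification filter: only molecules that have closed into a dumbbell, and therefore carry the new base pair, survive.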

**Fig. 2.** **XNA tailing.** Overview of the untemplated addition of a single xenonucleotide, catalyzed by certain polymerases.

Nanopore sequencing of supernumerary DNA

Having developed a method for single XNA insertion in DNA, we turned our attention to the other towering barrier: how to read these letters (i.e., sequencing). Here, we focused on adapting existing technology rather than inventing a new one. We chose nanopore sequencing because, in principle, it does not require special fluorophore-labeled bases. In nanopore sequencing, a voltage applied across a membrane drives negatively charged DNA through a pore, generating a small but measurable current signal. This current varies as a function of the chemical structures of the nucleotides passing through the pore. Oxford Nanopore Technologies (ONT) has made this platform accessible with its MinION sequencer. With no need for modified nucleotides or additional equipment beyond a computer, the missing link was a model that could assign current signals to the correct XNA-containing sequences.
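
A toy forward model helps show why context matters: as the strand moves through the pore, the signal is roughly a sequence of current levels, one per overlapping window of nucleotides. The sketch below assumes a 4-nt window and uses a deterministic placeholder (`made_up_level`, hypothetical) instead of real measured current values.

```python
import random

K = 4  # window (kmer) length; 4-nt kmers as in the models described in this post


def kmers(seq: str, k: int = K) -> list:
    """Overlapping k-length windows, one per step as the strand moves through the pore."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def made_up_level(kmer: str) -> float:
    """Placeholder current (pA) for a kmer: deterministic but physically meaningless."""
    return 70.0 + sum((i + 1) * ord(c) for i, c in enumerate(kmer)) % 50


def simulate_signal(seq: str, level=made_up_level, noise_pa: float = 1.5, seed: int = 0) -> list:
    """Toy forward model: one noisy mean current level per kmer in the pore."""
    rng = random.Random(seed)
    return [level(km) + rng.gauss(0.0, noise_pa) for km in kmers(seq)]


read = "ACGTXGCA"
print(kmers(read))  # ['ACGT', 'CGTX', 'GTXG', 'TXGC', 'XGCA']
print([round(x, 1) for x in simulate_signal(read)])
```

A single X appears in four consecutive windows here, so one XNA perturbs several neighboring current levels rather than just one.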

To fill this gap, we built XNA “kmer models”. In the kmer model of basecalling, the DNA current is a function only of the nucleotide going through the pore and its surrounding nucleotide context. Using XNA tailing and ligation, we built libraries covering every possible 4-nt kmer containing an XNA; sequencing these libraries allowed us to build models that assign current signals to sequences and subsequently basecall new sequences. To make all of our models accessible, we also built Xenomorph, a reference-based basecaller available on GitHub that contains all of our measured models and can perform end-to-end processing from raw nanopore data to basecalled results. We used Xenomorph to benchmark a validation set separate from our model-building sequences and found that recall ranged from 60% to 87% when comparing each XNA to its most similar standard base; a consensus basecall from at least 10 reads increased recall to 63–99%. What made this model-building strategy most attractive was its low data requirement, which meant we could build models quickly and efficiently. With increased investment in data collection, such as more complex libraries or larger kmer sizes, we see a reasonable avenue for improving sequencing performance.
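
The general idea of reference-based calling with a kmer model can be sketched as: fit a Gaussian current level per kmer, then, at a position of interest, substitute each candidate letter into the reference and keep whichever version best explains the observed signal; a majority vote across reads gives a consensus. This is a generic Gaussian-likelihood scheme written for illustration, not Xenomorph's implementation. It assumes events are already segmented and aligned one-to-one with kmers, and all current levels below are fabricated.

```python
import math
import random
from collections import Counter, defaultdict
from statistics import mean, pstdev

K = 4  # kmer length, matching the 4-nt kmers described above


def kmers(seq, k=K):
    """Overlapping k-length windows along a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def fit_kmer_model(training):
    """Fit a Gaussian (mean, sd) current level per kmer from (sequence, events) pairs.

    Assumes events[i] is already aligned to kmers(sequence)[i].
    """
    observed = defaultdict(list)
    for seq, events in training:
        for km, level in zip(kmers(seq), events):
            observed[km].append(level)
    # Floor the sd so rarely seen kmers do not get near-zero variance.
    return {km: (mean(v), max(pstdev(v), 1.0)) for km, v in observed.items()}


def log_likelihood(events, reference, model):
    """Sum of Gaussian log-densities of aligned events under a reference."""
    total = 0.0
    for km, x in zip(kmers(reference), events):
        mu, sd = model[km]
        total += -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))
    return total


def call_position(events, template, pos, candidates, model):
    """Reference-based call: try each candidate letter at `pos`, keep the best fit."""
    def score(base):
        return log_likelihood(events, template[:pos] + base + template[pos + 1:], model)
    return max(candidates, key=score)


def consensus(calls):
    """Majority vote across per-read calls at one position."""
    return Counter(calls).most_common(1)[0][0]


# ---- Toy usage with made-up current levels (pA); nothing here is measured data ----
WEIGHT = {"A": 1, "C": 2, "G": 3, "T": 4, "X": 7}  # hypothetical letter contributions


def true_level(km):
    return 50.0 + sum(WEIGHT[c] * (i + 1) for i, c in enumerate(km))


def simulate(seq, rng, noise=1.0):
    return [true_level(km) + rng.gauss(0.0, noise) for km in kmers(seq)]


rng = random.Random(42)
template, pos = "ACGTXTGCA", 4
refs = {b: template[:pos] + b + template[pos + 1:] for b in "XG"}
training = [(r, simulate(r, rng)) for r in refs.values() for _ in range(20)]
model = fit_kmer_model(training)

reads = [simulate(refs["X"], rng) for _ in range(10)]
calls = [call_position(ev, template, pos, "XG", model) for ev in reads]
print(consensus(calls))  # X
```

Because the model is just a per-kmer table, adding a new XNA only requires measuring the kmers that contain it, which is one way to see why such models have a low data requirement.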

As a fun experiment to wrap up this project, we pushed our findings to their alphabetical limit. Because XNA tailing and XNA ligation are compatible with other DNA assembly strategies, we could write 12-letter DNA for the first time. Moreover, the modularity of our sequencing models meant that we could apply them to read 12-letter DNA as well. To do this, multiple ubp XNA-containing constructs underwent Golden Gate ligation to assemble two constructs that contained the 4 standard letters (ATGC) as well as the 8 additional ubp XNAs (BSPZXKJV). We built two versions of this 12-letter supernumerary DNA sequence because we had two versions of the S nucleotide (C-nucleoside and N-nucleoside), and named them Sᶜuper-12 and Sⁿuper-12. Even in this complicated sequence space, where each XNA was compared against 11 other letters, we correctly decoded all XNAs in Sᶜuper-12. In Sⁿuper-12, only Kⁿ was incorrectly decoded, but this decoding error is easily resolved with additional priors.
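
Golden Gate assembly joins fragments through short single-stranded overhangs left by a Type IIS restriction enzyme, so matching overhangs alone determine the fragment order. The toy sketch below models only that ordering logic. The fragments, overhang sequences, and split points are hypothetical and are not the actual Sᶜuper-12 or Sⁿuper-12 designs; it simply shows how a few XNA-containing pieces could chain into one 12-letter sequence.

```python
def golden_gate_assemble(fragments, start_overhang, end_overhang):
    """Toy Golden Gate model: chain fragments whose 4-nt overhangs match.

    fragments: list of (left_overhang, body, right_overhang) tuples.
    Returns the assembled sequence with each junction overhang written once.
    """
    by_left = {}
    for frag in fragments:
        if frag[0] in by_left:
            raise ValueError(f"duplicate overhang {frag[0]!r}: order is ambiguous")
        by_left[frag[0]] = frag

    seq, current = start_overhang, start_overhang
    while current != end_overhang:
        if current not in by_left:
            raise ValueError(f"no fragment starts with overhang {current!r}")
        _, body, right = by_left.pop(current)  # pop so a fragment is used only once
        seq += body + right
        current = right
    if by_left:
        raise ValueError("some fragments were not incorporated")
    return seq


# Hypothetical fragments (deliberately shuffled); overhangs use standard bases only.
fragments = [
    ("AATG", "ZXKJV", "CGCT"),
    ("GGAG", "AB", "TACT"),
    ("TACT", "SP", "AATG"),
]
assembled = golden_gate_assemble(fragments, "GGAG", "CGCT")
print(assembled)            # GGAGABTACTSPAATGZXKJVCGCT
print(len(set(assembled)))  # 12
```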

The future of ubp XNAs, xenobiotechnology, and xenobiology

Altogether, this work establishes synthesis and sequencing methods that significantly lower the barrier to accessing XNAs in synthetic biology and beyond. Compared with the mature toolkit of ATGC-based technologies, we recognize the limitations of our work, including the fact that our basecaller is designed for sequence contexts in which an XNA is embedded within standard DNA. However, we hope that by laying down this first stepping stone, our work will encourage us and other groups to pursue future innovations in XNA synthesis and sequencing. In the immediate future, the methods we’ve developed can be used to study XNA retention in vivo, work with an expanded genetic code, develop aptamers with novel functions, and more. While XNAs are not yet as widely adopted as DNA, we’ve taken one step closer to a world of xenobiology.

Nature Communications
