Behind the Paper

Identifying synthetic genes and understanding their use in bioengineering

This "Behind the paper" blog post references: Kunjapur, A. M., Pfingstag, P. & Thompson, N. C. Gene synthesis allows biologists to source genes from farther away in the tree of life. Nat. Commun. 9, 4425 (2018). DOI: 10.1038/s41467-018-06798-7

Published in Bioengineering & Biotechnology

Oct 24, 2018

Aditya Kunjapur

Assistant Professor, Chemical and Biomolecular Engineering, University of Delaware

How do new tools influence innovation in research? Neil Thompson’s group at the MIT Sloan School of Management was interested in this question and considered synthetic biology a testbed for studying the influence of new tools. In the summer of 2014, they reached out to MIT labs participating in the Synthetic Biology Engineering Research Consortium (Synberc, now EBRC) in search of a tutor who could provide subject-matter expertise. I was intrigued by the lessons synthetic biologists might draw from innovation in other fields and expressed interest in that tutoring role. Our collaboration began as I covered topics ranging from the basics of the central dogma to genome editing tools such as CRISPR-Cas9. The influence of CRISPR on research innovation was interesting, but CRISPR was perhaps too new to study, so we continued discussing other tools until we reached DNA synthesis and DNA sequencing.

DNA synthesis is considered the key enabling technology for the field of synthetic biology. Its cost has decreased by orders of magnitude during the last two decades, and as a result it has become a routine service used by academic labs across the world. While this has fostered the development of academic and commercial technologies across numerous industrial sectors, some communities are concerned about the reduced barriers to engineering organisms.

Neil observed that the clear decrease in cost during the last two decades could make for a rich economics-oriented manuscript on how this trend has affected innovation in synthetic biology. The ability to identify synthetic DNA sequences would be essential to conduct this kind of study. Yet, to our knowledge the technical literature in synthetic biology contained no strategy for identification of synthetic sequences. Moreover, the need to identify synthetic sequences and the engineered organisms that harbor them had never been greater. These realizations became the genesis of a related but separate line of inquiry that was better suited for a scientific publication.

In “Gene synthesis allows biologists to source genes from farther away in the tree of life”, we present a bird’s-eye view of a trend that affordable gene synthesis has enabled within the academic biological research community. First, we developed a robust classifier that labels genes as natural or synthetic based on sequence alone. We had a sense that natural sequences would be contained in a publicly available database and that synthetic sequences would need to be different, but we did not know in what ways or by how much. We used a combination of theory, simulation, and machine learning to arrive at a threshold on the percent sequence identity returned by the nucleotide Basic Local Alignment Search Tool (BLASTn) when querying the RefSeq reference genomic collection. Philipp Pfingstag developed and implemented this simple classifier, which achieved a remarkable 97.7% accuracy on a test set of 173 sequences that I compiled. Encouraged by this result and by outside interest in applying our method to biosurveillance, we applied the strategy to a larger sequence database to investigate whether synthetic sequences were being used differently than their natural counterparts.
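
As a rough illustration of the thresholding step, the sketch below classifies query genes from BLAST+ `blastn` tabular output (`-outfmt 6`, where the third column is percent identity). It assumes that a gene whose best hit meets a cutoff is called natural and that anything else, including a gene with no hit at all, is called synthetic. The cutoff value, the function names, and the toy hit lines are hypothetical placeholders, not the threshold or code from the paper.

```python
from collections import defaultdict

# Hypothetical placeholder cutoff (percent identity); the paper derives its own value.
IDENTITY_THRESHOLD = 95.0


def best_identity(blast_tab_lines):
    """Highest percent identity per query from BLAST+ -outfmt 6 lines (column 3 = pident)."""
    best = defaultdict(float)
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, pident = fields[0], float(fields[2])
        best[query] = max(best[query], pident)
    return dict(best)


def classify(query_ids, blast_tab_lines, threshold=IDENTITY_THRESHOLD):
    """Call a gene 'natural' if its best hit meets the threshold, else 'synthetic'."""
    best = best_identity(blast_tab_lines)
    # Queries with no hit default to 0.0 identity, i.e. they are called 'synthetic'.
    return {q: "natural" if best.get(q, 0.0) >= threshold else "synthetic"
            for q in query_ids}


def accuracy(predicted, truth):
    """Fraction of labelled genes whose predicted class matches the true class."""
    return sum(predicted[q] == truth[q] for q in truth) / len(truth)


# Toy, made-up hits: geneA matches a reference perfectly, geneB only weakly,
# and geneC has no hit at all.
hits = [
    "geneA\tsubject1\t100.00\t900\t0\t0\t1\t900\t1\t900\t0.0\t1663",
    "geneA\tsubject2\t85.10\t300\t45\t0\t1\t300\t1\t300\t1e-80\t300",
    "geneB\tsubject3\t78.40\t500\t108\t0\t1\t500\t1\t500\t1e-60\t250",
]
truth = {"geneA": "natural", "geneB": "synthetic", "geneC": "synthetic"}
predicted = classify(truth.keys(), hits)
print(predicted, accuracy(predicted, truth))
```

For scale, the reported 97.7% accuracy on 173 test sequences corresponds to 169 correct calls. A real pipeline would also have to decide how to treat partial-length alignments, which this sketch ignores.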

We could not have performed this study without tremendous assistance from the Addgene plasmid repository, which provided us with a database that contained over 19,000 unique sequences. Equipped with this rich dataset, Philipp examined one of my pet hypotheses: that gene synthesis was being used disproportionately to express heterologous genes in model organisms. As a metabolic engineer, I view evolutionarily distant genome and metagenome collections as rich treasure troves of biosynthetic clusters, genetic parts, and orthogonal tools. From my own experience, I knew that amplifying these natural genes for expression in the model organism Escherichia coli carries a risk of failed expression due to differences in codon usage, and that genomic DNA templates could take a long time to obtain. While in graduate school, I switched to ordering synthetic, codon-optimized DNA sequences, both to address these concerns and because affordable synthetic linear DNA fragments had become commercially available. But what about the community at large?
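
A natural gene and a codon-optimized synthetic version can encode the same protein while differing substantially at the nucleotide level. The sketch below shows this by translating a toy gene and back-translating the protein with one fixed codon per amino acid. The six-codon gene and the codon choices are illustrative assumptions, not a measured E. coli codon-usage table or a real optimization algorithm, which would also weigh factors this sketch ignores.

```python
# Standard genetic code, restricted to the codons used in this toy example.
CODON_TO_AA = {
    "ATG": "M", "AAG": "K", "AAA": "K", "TTA": "L", "CTG": "L",
    "GGA": "G", "GGC": "G", "TCT": "S", "AGC": "S", "TGA": "*", "TAA": "*",
}

# Hypothetical one-codon-per-amino-acid choices, for illustration only.
PREFERRED_CODON = {"M": "ATG", "K": "AAA", "L": "CTG", "G": "GGC", "S": "AGC", "*": "TAA"}


def translate(dna):
    """Translate an in-frame DNA sequence codon by codon ('*' marks a stop)."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def back_translate(protein):
    """Rebuild a DNA sequence using the single preferred codon for each residue."""
    return "".join(PREFERRED_CODON[aa] for aa in protein)


def percent_identity(a, b):
    """Ungapped, position-by-position identity for equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


natural = "ATGAAGTTAGGATCTTGA"          # toy 'natural' gene
protein = translate(natural)             # MKLGS*
optimized = back_translate(protein)      # ATGAAACTGGGCAGCTAA
assert translate(optimized) == protein   # both encode the same protein
print(f"{percent_identity(natural, optimized):.1f}%")  # 55.6%
```

Here the two sequences share only 10 of 18 nucleotides despite encoding an identical protein.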

It was exciting to observe that the average genetic distance between organisms that we defined as the “source” and “expression” organisms for individual gene sequences was significantly greater for synthetic genes than for natural genes. This underscores one of the effects that DNA synthesis is having on synthetic biology innovation while also highlighting why synthetic sequences are strong indicators of engineered organisms that efficiently exhibit non-native traits. We hope our classification strategy will be part of a suite of tools used to identify such organisms as DNA synthesis technology continues to be democratized.
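
One simple way to make "genetic distance" concrete is to count the edges separating two organisms in a taxonomy, passing through their lowest common ancestor. The sketch below does this on simplified seven-rank lineages and compares the mean source-to-expression distance for each gene class. The metric, the lineages, and the records are illustrative assumptions; the paper's own distance measure and the Addgene data are not reproduced here.

```python
from statistics import mean

# Simplified, rank-limited lineages (domain -> species), for illustration only.
LINEAGES = {
    "Escherichia coli": ["Bacteria", "Pseudomonadota", "Gammaproteobacteria",
                         "Enterobacterales", "Enterobacteriaceae", "Escherichia",
                         "Escherichia coli"],
    "Salmonella enterica": ["Bacteria", "Pseudomonadota", "Gammaproteobacteria",
                            "Enterobacterales", "Enterobacteriaceae", "Salmonella",
                            "Salmonella enterica"],
    "Saccharomyces cerevisiae": ["Eukaryota", "Ascomycota", "Saccharomycetes",
                                 "Saccharomycetales", "Saccharomycetaceae",
                                 "Saccharomyces", "Saccharomyces cerevisiae"],
    "Homo sapiens": ["Eukaryota", "Chordata", "Mammalia", "Primates", "Hominidae",
                     "Homo", "Homo sapiens"],
}


def taxonomic_distance(org_a, org_b, lineages=LINEAGES):
    """Edges between two organisms via their lowest common ancestor.

    Organisms in different domains meet at an implicit root above the domains.
    """
    a, b = lineages[org_a], lineages[org_b]
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return (len(a) - shared) + (len(b) - shared)


# Hypothetical (source organism, expression organism, class) records, not Addgene data.
records = [
    ("Salmonella enterica", "Escherichia coli", "natural"),
    ("Escherichia coli", "Escherichia coli", "natural"),
    ("Homo sapiens", "Escherichia coli", "synthetic"),
    ("Saccharomyces cerevisiae", "Escherichia coli", "synthetic"),
]


def mean_distance(records, label):
    """Mean source-to-expression distance over records of one class."""
    return mean(taxonomic_distance(src, host) for src, host, cls in records if cls == label)


print(mean_distance(records, "natural"), mean_distance(records, "synthetic"))
```

A real comparison would use full lineages, for example from a taxonomy database, and a statistical test before calling a difference in means significant.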
