Deep learning strategies for toehold switch design

Published in Bioengineering & Biotechnology

Oct 13, 2020

Diogo Camacho, Katie Collins & Jacqueline Valeri

Toehold switches¹ are versatile engineered riboregulators used in applications ranging from pathogen-sensing diagnostic tools² to components in synthetic gene circuits. These riboregulators switch between two stable states depending on whether a target nucleotide sequence is present. When the target is present, the toehold switch changes its secondary structure, which allows expression of a reporter gene and produces a colorimetric or fluorometric readout.
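As a rough illustration of what "depending on the presence of a target sequence" means at the sequence level, the sketch below checks whether a sensing region is the reverse complement of a target. It is a deliberate simplification: it assumes perfect Watson-Crick pairing only, ignores G-U wobble pairs and folding thermodynamics, and the function names and example sequence are hypothetical rather than taken from the paper.

```python
# Watson-Crick pairing for RNA; G-U wobble pairs are deliberately ignored.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def is_fully_complementary(sensing_region: str, target: str) -> bool:
    """True if the sensing region can base-pair with the target at every position."""
    return sensing_region.upper() == reverse_complement_rna(target)


# Hypothetical 12-nt target fragment, for illustration only.
target = "AUGGCUAACGUU"
sensing_region = reverse_complement_rna(target)

print(sensing_region)
print(is_fully_complementary(sensing_region, target))
```

A real design tool would score partial complementarity and predicted secondary structure rather than require an exact match.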

However, developing new toehold switches is often time-consuming and uncertain. Screening and fine-tuning a single switch can take weeks from ideation to final design, and researchers may have to order tens or hundreds of sequences to find one or two that give the desired output. We therefore hypothesized that we could couple big data with biological insight to automate the prediction and design of toehold switches more reliably³. The first challenge was to generate a dataset large enough to power our predictive models. Our team partnered with Angenent-Mari et al.⁴ and used simple bioinformatics techniques for sequence analysis to generate a comprehensive set of candidate toeholds, which were then tested experimentally in their lab. With this dataset in hand, the two teams worked on two independent projects that share a common thread: using deep learning to characterize toehold switches in silico.

Our team approached this challenge by designing and implementing two different machine learning procedures. On one hand, we used a deep learning architecture based on convolutional neural networks, which borrows a variety of concepts from computer vision and image analysis. On the other hand, we treated the toehold sequences as part of a common DNA/RNA language: drawing on approaches from natural language processing (NLP), we developed an architecture that uses a quasi-recurrent neural network and a tokenized input sequence in which k-mers act as ‘words’ in the toehold ‘sentence’. Each model offered distinct advantages, including different visualization techniques. To understand what both types of models were actually ‘learning’, we tested white-box approaches, in particular attention maps and in silico mutagenesis. These methods allowed us to uncover biologically relevant insights, such as the finding that the 6 to 9 nucleotides around the ribosome binding site are critical for both models’ predictions.
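Two of the ideas above, splitting a sequence into k-mer ‘words’ and probing a model with in silico mutagenesis, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation from the paper: the k-mer size and stride are arbitrary choices, and `toy_score` is a hypothetical stand-in (GC fraction) for a trained model's prediction.

```python
from typing import Callable, Dict, List

BASES = "ACGU"


def kmer_tokens(seq: str, k: int = 3, stride: int = 1) -> List[str]:
    """Split a sequence into k-mers, the 'words' of the sequence 'sentence'."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def in_silico_mutagenesis(
    seq: str, score: Callable[[str], float]
) -> List[Dict[str, float]]:
    """For each position, record the score change caused by each substitution."""
    baseline = score(seq)
    effects = []
    for i, original in enumerate(seq):
        row = {}
        for base in BASES:
            if base == original:
                continue
            mutant = seq[:i] + base + seq[i + 1:]
            row[base] = score(mutant) - baseline
        effects.append(row)
    return effects


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a trained model: the fraction of G/C bases."""
    return sum(base in "GC" for base in seq) / len(seq)


seq = "AGAGGAGA"  # short toy sequence, for illustration only
print(kmer_tokens(seq, k=3))

effects = in_silico_mutagenesis(seq, toy_score)
# Summarize each position by its largest absolute score change.
sensitivity = [max(abs(delta) for delta in row.values()) for row in effects]
print(sensitivity)
```

With a real model in place of `toy_score`, positions with large score changes would be the ones the model treats as most important.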

Following these exciting results, we performed a data ablation experiment in which we trained each of our models on reduced amounts of training data. This experiment allowed us to estimate the minimum number of toehold sequences needed for effective training, as measured by model accuracy. Encouragingly, we found that these architectures were still accurate when trained with an order of magnitude less data! Emboldened by the models’ ability to work with less data, and aiming to improve their generalizability, we used transfer learning to fine-tune model weights on a set of 168 toeholds tested in a different experimental context¹. We were ecstatic to observe that our language model classified with 100% accuracy a set of 24 manually designed sensors for Zika, as tested by Pardee et al.² Because these models showed strong predictive power for the design of novel pathogen sensors, we deployed both in an integrated design pipeline. We built two frameworks to optimize sequences, NuSpeak and STORM: NuSpeak constructs toeholds that retain complementarity to 21 nucleotides of the 30-nucleotide target, while STORM allows all 30 nucleotides to vary simultaneously. We experimentally validated our predictions and were encouraged to find strong agreement between in silico and in vitro results.
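The data ablation idea can be sketched as a loop that retrains on progressively smaller random subsets and records held-out accuracy. Everything here is an assumption made for illustration: the sequences and labels are synthetic (not measurements), the fractions are arbitrary, and `fit_gc_threshold` is a hypothetical toy classifier standing in for the deep learning models.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, int]  # (sequence, binary label)
Predictor = Callable[[str], int]


def gc_fraction(seq: str) -> float:
    return sum(base in "GC" for base in seq) / len(seq)


def fit_gc_threshold(examples: Sequence[Example]) -> Predictor:
    """Toy 'model' (hypothetical): threshold GC content between the class means."""
    pos = [gc_fraction(s) for s, y in examples if y == 1]
    neg = [gc_fraction(s) for s, y in examples if y == 0]
    if not pos or not neg:
        # Only one class in the subset: fall back to predicting that class.
        majority = 1 if pos else 0
        return lambda seq: majority
    pos_mean, neg_mean = sum(pos) / len(pos), sum(neg) / len(neg)
    threshold = (pos_mean + neg_mean) / 2
    pos_is_high = pos_mean > neg_mean
    return lambda seq: int((gc_fraction(seq) > threshold) == pos_is_high)


def ablation_curve(
    train: Sequence[Example],
    test: Sequence[Example],
    fit: Callable[[Sequence[Example]], Predictor],
    fractions: Sequence[float] = (1.0, 0.5, 0.1),
    seed: int = 0,
) -> List[Tuple[int, float]]:
    """Retrain on random subsets of the training data; return (size, accuracy)."""
    rng = random.Random(seed)
    results = []
    for frac in fractions:
        n = max(1, int(len(train) * frac))
        subset = rng.sample(list(train), n)
        predict = fit(subset)
        correct = sum(predict(seq) == label for seq, label in test)
        results.append((n, correct / len(test)))
    return results


# Synthetic 30-nt sequences with a made-up labelling rule, for illustration only.
data_rng = random.Random(1)


def random_example() -> Example:
    seq = "".join(data_rng.choice("ACGU") for _ in range(30))
    return seq, int(gc_fraction(seq) > 0.5)


train = [random_example() for _ in range(200)]
test_set = [random_example() for _ in range(50)]

curve = ablation_curve(train, test_set, fit_gc_threshold)
for n, accuracy in curve:
    print(f"{n:4d} training sequences -> accuracy {accuracy:.2f}")
```

In practice one would repeat each fraction over several random seeds and report the spread, since a single subsample can be unrepresentative.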

We hope our manuscript carries on the important work of the original 2014 paper by Green et al.¹ by automating toehold prediction and optimization, steps that would take weeks in the lab but take just a few minutes with our computational framework. We believe the white-box results, deep learning models, and transfer learning approaches presented in our work will be useful and impactful for the synthetic biology community as new and creative tools continue to be developed for toehold sensors.

References

1. Green, A. A., Silver, P. A., Collins, J. J. & Yin, P. Toehold Switches: De-Novo-Designed Regulators of Gene Expression. Cell 159, 925–939 (2014).
2. Pardee, K. et al. Rapid, Low-Cost Detection of Zika Virus Using Programmable Biomolecular Components. Cell 165, 1255–1266 (2016).
3. Valeri, J. A. et al. Sequence-to-function deep learning frameworks for engineered riboregulators. Nat. Commun. (2020). https://doi.org/10.1038/s41467-020-18676-2
4. Angenent-Mari, N., Garruss, A. & Soenksen, L. A deep learning approach to programmable RNA switches. (2020).
