The past few years have been an exciting time for computational biologists. With the artificial intelligence (AI) revolution finally catching up with problems in biology and conquering them one after another, we have seen ground-breaking work in many fields, especially protein design and engineering. In 2021, before these advances had come to fruition and when we had just started this project, the major tools for peptide and protein generation were variational autoencoders (VAEs), generative adversarial networks (GANs) and derivatives of recurrent neural networks (RNNs) such as long short-term memory (LSTM) networks and gated recurrent units (GRUs). At the time, transformers and diffusion models had yet to be proven viable through experimental validation, although this did not take long [1,2]. A paper by Das et al. [3] had set the scene for generating small antimicrobial peptides (AMPs) with deep learning models followed by experimental validation, yet there was a limiting factor: the chemical synthesis of AI-designed AMP candidates limits the number of peptides that can be tested because of the cost and time of production.
We sought to improve the throughput of AMP screening with a method we had previous experience with: cell-free protein synthesis (CFPS). We started by experimenting with different deep learning models for the generation and prioritization of AMPs. We soon noticed the abundance of VAE models in the literature, which reflects both their generative capabilities and the useful properties of the continuous latent space in which they embed protein sequences. This space can be used to run optimization algorithms and to search for peptides similar to a given sample.
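To make the idea of searching the latent space for similar peptides concrete, here is a minimal sketch. It assumes each peptide has already been mapped to a latent vector by some encoder. The function name `nearest_latent_neighbors`, the peptide labels and the two-dimensional coordinates are all made up for illustration, and Euclidean distance is only one possible choice of similarity measure.

```python
import math

def nearest_latent_neighbors(query, latent_vectors, k=3):
    """Return the names of the k peptides whose latent vectors lie closest to `query`.

    `latent_vectors` maps a peptide name to its latent coordinates.
    Closeness is measured with Euclidean distance (an assumption of this sketch).
    """
    distances = []
    for name, vector in latent_vectors.items():
        distances.append((math.dist(query, vector), name))
    distances.sort()  # smallest distance first
    return [name for _, name in distances[:k]]

# Toy 2-D "latent" coordinates (hypothetical); a trained VAE encoder
# would produce higher-dimensional vectors from real sequences.
toy_latents = {
    "pep_A": (0.0, 0.0),
    "pep_B": (0.1, 0.2),
    "pep_C": (2.0, 2.0),
    "pep_D": (-0.3, 0.1),
}

neighbors = nearest_latent_neighbors((0.0, 0.1), toy_latents, k=2)
print(neighbors)
```

Because the latent space is continuous, the same distance measure could also guide an optimization routine that moves a query point step by step toward regions with better predicted properties.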
We first trained a VAE model on known AMPs and had the model generate sequences; we also trained regressor models to predict antimicrobial activity as the minimum inhibitory concentration (MIC). We selected the top 100 peptides from the generated dataset by their predicted MIC and ordered DNA fragments encoding them for expression in the CFPS system. In the first round of experiments, we observed two peptides that inhibited bacterial growth. On further analysis, the simulations (Manish Kushwaha, INRAe France) and SDS-PAGE results of the cell-free reactions showed that the peptides were successfully produced. This early result convinced us of the viability of the approach. We then decided to improve the models in the hope of finding more active peptides. For instance, we observed an improvement in the computational metrics when the regressor neural network models were also provided with a dataset of peptides shown not to be active against bacteria (nonAMPs). Another computational method that proved important was pre-training the generator models on a large corpus of protein sequences regardless of function or family. During pre-training, the model can learn general properties of the protein language, and during the subsequent fine-tuning on AMPs it can learn properties specific to AMPs.
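The prioritization step described above can be sketched as a simple ranking. This is an illustrative sketch only: it assumes that "top" means the lowest predicted MIC (a lower MIC indicates higher potency), and the function name `select_top_candidates`, the sequences and the MIC values are all hypothetical placeholders.

```python
def select_top_candidates(predicted_mic, n=100):
    """Return the n sequences with the lowest predicted MIC.

    `predicted_mic` maps a peptide sequence to the MIC predicted by a
    regressor. Lower MIC means the peptide is predicted to be more potent.
    """
    ranked = sorted(predicted_mic.items(), key=lambda item: item[1])
    return [sequence for sequence, _ in ranked[:n]]

# Made-up sequences and made-up predicted MIC values, for illustration only.
toy_predictions = {
    "KKLLKKLLKK": 8.0,
    "GIGAVLKVLT": 32.0,
    "AAAAGGGGSS": 256.0,
    "RRWWRRWWRR": 4.0,
}

shortlist = select_top_candidates(toy_predictions, n=2)
print(shortlist)
```

In practice the shortlist would then be converted into DNA fragments for cell-free expression, which is the step that removes the chemical synthesis bottleneck.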
In total, we experimentally tested 500 AMPs and found 30 functional candidates, six of which showed broad-spectrum activity against drug-resistant bacteria and did not lead to the evolution of bacterial resistance. We continued to characterize these AMPs collaboratively, as described in the paper, with molecular dynamics simulations by the Hummer lab (Max Planck Institute for Biophysics), peptide synthesis by the Vázquez lab (University of Marburg) and bioactivity characterization by D. Adam, P. Braun and H. von Buttlar (Bundeswehr Institute of Microbiology), the Schmeck lab (iLung Institute), the Vázquez and Pogge von Strandmann labs (University of Marburg), and the Bode lab (Max Planck Institute for Terrestrial Microbiology). The fruitful collaboration with students and researchers in these labs helped us prepare this paper together.
Illustration credit: Elizaveta Bobkova (Max Planck Institute for Terrestrial Microbiology).
1. Madani, A. et al. Large language models generate functional protein sequences across diverse families. Nat. Biotechnol. 41, 1099–1106 (2023).
2. Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023).
3. Das, P. et al. Accelerated antimicrobial discovery via deep generative models and molecular dynamics simulations. Nat. Biomed. Eng. 5, 613–623 (2021).