Unlocking Nature's Secrets: How AI is Revolutionizing Natural Product Discovery

Natural products are a rich source of bioactive compounds with valuable applications in fields such as food, agriculture, and medicine. AI now gives us new ways to design them.

As I sit here, reflecting on the journey that led to the automated generation of over 67 million natural product-like molecules, I can't help but feel a sense of awe and excitement. This endeavour, driven by the power of artificial intelligence (AI) and deep generative models, has the potential to transform the way we explore and harness the wonders of nature for the betterment of society.

Nature is an extraordinary source of diverse, bioactive compounds with the power to revolutionize fields such as medicine, agriculture, and food production. Since ancient times, humans have known of the healing properties of plants and their unique chemistry. In fact, many of our most effective antibiotics can be traced back to natural products. However, discovering and harnessing these compounds has been slow and resource-intensive, often with limited success.

How can AI help us explore Nature?

That is where our work comes in. We set out to use AI and deep generative models to explore the vast chemical space of natural products in a high-throughput and cost-effective manner. By training a recurrent neural network on known natural products, we generated over 67 million natural product-like compounds, a 165-fold expansion of the known natural product space.

Workflow to generate natural products using AI

The motivation behind this work stemmed from the realization that traditional methods of natural product discovery were reaching their limits. The laborious and expensive process of manually curating and characterizing natural product libraries was a significant barrier to progress. The scientific community needed a breakthrough, a way to explore the uncharted territories of natural product chemical space efficiently and comprehensively.

How can deep generative models help?

Inspiration struck when we saw the potential of deep generative models. These AI-driven architectures have the unique ability to transcend human-dependent design and significantly expand the chemical search space. Variational autoencoders, recurrent neural networks, and generative adversarial networks became our tools of choice. Among them, the SMILES-based recurrent neural network with long short-term memory (LSTM) units emerged as the most suitable for our purposes. It demonstrated an impressive capability to generate novel and diverse molecules, even with limited training data.
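A SMILES-based model of this kind reads each molecule as a sequence of tokens rather than as raw characters. The sketch below shows one plausible way to split SMILES strings into tokens and build a vocabulary. The regular expression and the helper names `tokenize` and `build_vocab` are illustrative assumptions, not the tokenizer used in our work.

```python
import re

# Assumed token pattern (illustrative only): bracket atoms such as [C@H],
# the two-letter halogens Br and Cl, two-digit ring closures (%NN),
# and otherwise any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(corpus):
    """Map every token seen in the corpus to an integer index."""
    tokens = sorted({tok for smi in corpus for tok in tokenize(smi)})
    return {tok: i for i, tok in enumerate(tokens)}


# Caffeine, then a toy molecule with a bracket atom and two halogens.
print(tokenize("Cn1cnc2c1c(=O)n(C)c(=O)n2C"))
print(tokenize("ClC[C@H](Br)O"))
print(build_vocab(["CCO", "CCl"]))
```

Treating `Cl`, `Br`, and bracket atoms as single tokens keeps the model from having to learn that, for example, `C` followed by `l` is one atom rather than two.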

Our approach was straightforward yet powerful. We trained the LSTM model on a vast collection of known natural products, enabling it to understand the molecular language of nature and learn how to assemble SMILES-based tokens into unique and natural product-like SMILES. We first generated a massive database of 100 million compounds, before eliminating invalid and duplicate compounds. The subsequent steps of curation, standardization, and analysis using cheminformatics toolkits refined the database to a robust collection of 67 million validated, unique, and natural product-like molecules.
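The filtering step described above (dropping invalid and duplicate SMILES) can be sketched roughly as follows. This is a simplified illustration under stated assumptions: `looks_valid` is a crude syntax check standing in for a real SMILES parser, and whitespace stripping stands in for proper canonicalization. In practice a cheminformatics toolkit would do both jobs, and the function names here are hypothetical.

```python
import re
from collections import Counter


def looks_valid(smiles):
    """Crude syntax check (assumption: a stand-in for a real SMILES parser)."""
    if not smiles:
        return False
    # Drop well-formed bracket atoms so digits inside them are not counted.
    body = re.sub(r"\[[^\[\]]+\]", "", smiles)
    if "[" in body or "]" in body:
        return False
    # Branch parentheses must be balanced and never close before opening.
    depth = 0
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    if depth != 0:
        return False
    # Every ring-closure digit must be opened and closed (even count).
    ring_labels = Counter(ch for ch in body if ch.isdigit())
    return all(n % 2 == 0 for n in ring_labels.values())


def curate(generated, is_valid=looks_valid, canonical=str.strip):
    """Keep the first occurrence of each valid structure, preserving order."""
    seen = set()
    kept = []
    for smi in generated:
        key = canonical(smi)  # a real pipeline would canonicalize the molecule
        if not is_valid(key) or key in seen:
            continue
        seen.add(key)
        kept.append(key)
    return kept


samples = ["CCO", "c1ccccc1", "CCO", "C1CC", "CC(=O", "c1ccccc1 "]
unique_valid = curate(samples)
print(unique_valid)
print(f"kept {len(unique_valid)} of {len(samples)} sampled strings")
```

Passing the validity check and the canonicalizer in as arguments means the toy versions can be swapped for toolkit-backed ones without changing the deduplication logic.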

Expansion of natural chemical space using generative AI

What does this mean for society?

The impact of this innovation is multi-faceted and far-reaching. Firstly, the sheer expansion of the natural product library by 165-fold opens up uncharted territories of chemical space. The vast number of molecules generated provides a wealth of potential candidates for exploration, offering researchers a goldmine of bioactive compounds waiting to be discovered.

Moreover, our approach is a game-changer in terms of cost and efficiency. The time and resources required for traditional natural product discovery are significantly reduced. Our entire training and sampling process took less than 24 hours, using readily available computational resources. In contrast, commercially available natural product libraries can cost tens of thousands of dollars, making them inaccessible to many researchers. Our innovation democratizes access to a wealth of natural product-like molecules and empowers scientists across the globe to embark on transformative research.
