Advancing Species Identification in Metagenomic Profiling with MAP2B

MAP2B is a metagenomic profiler for whole metagenome sequencing data that takes a different approach from methods based on universal markers or whole genomes. This approach can substantially reduce false identifications and produce highly accurate taxonomic profiles.
Published in Microbiology

In recent years, microbiome research has emerged as a groundbreaking field with the potential to revolutionize areas such as agriculture, environmental restoration, drug discovery, and human health. 'Who are they?' is one of the most important questions to start with in microbiome research. Fortunately, the advent of high-throughput sequencing has brought about a seismic shift in how we answer it. Gone are the days when we had to rely on laborious and time-consuming culturing methods to investigate microbial compositions. Nowadays, we can swiftly answer the 'who are they' question by analyzing massive amounts of sequencing data.

After obtaining those massive reads, the next critical step is to decode them and quantify the microbial composition. We have two options at this stage: assemble the reads first or take an assembly-free approach. We believe the assembly-free approach, implemented by tools known as metagenomic profilers, will eventually be more beneficial for clinical and industrial applications. This belief stems from the explosion of microbial genome information in publicly available databases, which is the foundation on which metagenomic profilers operate.

Despite their promise, current metagenomic profilers have limitations. A recent benchmarking study, CAMI2, revealed that no metagenomic profiler excelled in taxon identification and abundance estimation at the species level. This bottleneck is largely due to the profilers' reliance on universal single-copy markers or whole microbial genomes as references. That reliance often causes problems such as missing or indistinguishable markers during reference database construction, or multi-alignment of short reads against conserved regions during read alignment.

To address these challenges, we focused on the Type IIB restriction system. It is well known that the endonucleases from Type IIB restriction-modification systems differ from all other restriction enzymes. In particular, Type IIB enzymes cleave DNA on both sides of their recognition site at fixed positions, cutting out the recognition site within DNA fragments of equal length (iso-length fragments). In a previous study, we demonstrated that Type IIB restriction sites are widely and randomly distributed along microbial genomes (Sun et al., Genome Biology, 2022). We discovered that species-specific Type IIB restriction endonuclease digestion sites (or IIB fragments/tags) far outnumber universal single-copy markers and naturally avoid the multi-alignment problem. As a result, we developed MAP2B (MetAgenomic Profiler based on type IIB restriction site). This novel metagenomic profiler can effectively eliminate false positives and generate more precise and accurate taxonomic profiles from Whole Metagenome Sequencing (WMS) data.
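To make the idea of iso-length IIB tags concrete, here is a minimal Python sketch of an in-silico digestion. It is an illustration only, not the MAP2B implementation. It assumes a BcgI-like enzyme whose recognition site is CGA, six arbitrary bases, then TGC, with 10 bases kept on each side, so that every tag is 32 bp long. The function names (`extract_tags`, `species_specific_tags`) and the toy sequences are hypothetical.

```python
import re

# Assumed BcgI-like layout: 10 flanking bases, CGA, 6 arbitrary bases, TGC,
# 10 flanking bases. The lookahead lets overlapping candidate tags be found.
TAG_PATTERN = re.compile(r"(?=([ACGT]{10}CGA[ACGT]{6}TGC[ACGT]{10}))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement; non-ACGT characters are left unchanged."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_tags(seq):
    """Collect the distinct 32 bp tags found on either strand of a sequence."""
    seq = seq.upper()
    tags = set()
    for strand in (seq, reverse_complement(seq)):
        for match in TAG_PATTERN.finditer(strand):
            tags.add(match.group(1))
    return tags


def species_specific_tags(genomes):
    """For each genome, keep only the tags that occur in no other genome.

    `genomes` maps a (hypothetical) species name to its sequence string.
    """
    tags_by_species = {name: extract_tags(seq) for name, seq in genomes.items()}
    specific = {}
    for name, tags in tags_by_species.items():
        other_tags = set().union(
            *(t for other, t in tags_by_species.items() if other != name)
        )
        specific[name] = tags - other_tags
    return specific


if __name__ == "__main__":
    shared = "A" * 10 + "CGA" + "AAAAAA" + "TGC" + "A" * 10
    only_x = "C" * 10 + "CGA" + "CCCCCC" + "TGC" + "C" * 10
    only_y = "G" * 10 + "CGA" + "GGGGGG" + "TGC" + "G" * 10
    toy_genomes = {
        "species_x": shared + "NNNNN" + only_x,
        "species_y": shared + "NNNNN" + only_y,
    }
    for name, tags in species_specific_tags(toy_genomes).items():
        print(name, sorted(tags))
```

In this toy example the tag shared by both genomes is discarded, and each species keeps the single tag that only it carries. A real reference database would apply the same set logic across many thousands of genomes, possibly with several enzymes.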

Our benchmarking exercises using simulated datasets with varying sequencing depths and species richness showcased MAP2B's superior performance over existing metagenomic profilers in species identification. Further tests using real WMS data from an ATCC mock community confirmed its superior precision across sequencing depths. Additionally, by leveraging WMS data from an Inflammatory Bowel Disease (IBD) cohort, we demonstrated that the taxonomic features identified by MAP2B could better discriminate IBD from healthy controls and predict metabolomic profiles.
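For readers less familiar with how species identification is scored, the sketch below computes precision, recall, and F1 from a set of reported species and a known ground truth, as one would have for a simulated or mock community. It uses the standard textbook definitions, which are assumed here rather than taken from the post, and the species labels are placeholders.

```python
def identification_metrics(predicted, truth):
    """Score a set of reported species against the known community members."""
    predicted, truth = set(predicted), set(truth)
    true_pos = len(predicted & truth)    # species correctly reported
    false_pos = len(predicted - truth)   # reported but not actually present
    false_neg = len(truth - predicted)   # present but missed
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Placeholder labels: four true members, five reported species.
    truth = {"sp_a", "sp_b", "sp_c", "sp_d"}
    predicted = {"sp_a", "sp_b", "sp_c", "sp_x", "sp_y"}
    precision, recall, f1 = identification_metrics(predicted, truth)
    print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Each false positive lowers precision without affecting recall, which is why a profiler that removes false identifications is rewarded mainly on the precision side.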

Our previous study demonstrated that the decoding of whole metagenomic sequencing data is also hindered by confusion between sequence abundance and taxonomic abundance (Sun et al., Nature Methods, 2021). Here, sequence abundance refers to the proportion of DNA, whereas taxonomic abundance refers to the proportion of individuals/cells. We presented compelling evidence that interchanging sequence abundance and taxonomic abundance influences both per-sample summary statistics and cross-sample comparisons. While most metagenomic profilers, such as Bracken and Kraken, report sequence abundance, taxonomic abundance may be the more clinically and ecologically relevant parameter. Hence, there is a pressing need for metagenomic profilers that can generate taxonomic abundance. Notably, MAP2B is one of the few profilers able to produce taxonomic abundance, setting it apart from existing tools. MAP2B can be accessed at: https://github.com/sunzhengCDNM/MAP2B.
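The difference between the two abundance types can be shown with a small worked example. The sketch below converts sequence abundance to taxonomic abundance by dividing each species' share of DNA by its genome length and renormalizing. This is a deliberate simplification that ignores factors such as ploidy and uneven coverage, and it is not a description of how MAP2B computes abundance. The function name, species labels, and genome sizes are hypothetical.

```python
def sequence_to_taxonomic_abundance(sequence_abundance, genome_length_bp):
    """Convert DNA proportions to cell proportions under a genome-length model.

    Assumes one genome copy per cell, so a species' share of DNA divided by
    its genome length is proportional to its number of cells.
    """
    per_genome = {
        species: fraction / genome_length_bp[species]
        for species, fraction in sequence_abundance.items()
    }
    total = sum(per_genome.values())
    if total == 0:
        raise ValueError("sequence abundances sum to zero")
    return {species: value / total for species, value in per_genome.items()}


if __name__ == "__main__":
    # Hypothetical community: equal cell counts, but one genome is 3x larger.
    sequence_abundance = {"small_genome_sp": 0.25, "large_genome_sp": 0.75}
    genome_length_bp = {"small_genome_sp": 2_000_000, "large_genome_sp": 6_000_000}
    print(sequence_to_taxonomic_abundance(sequence_abundance, genome_length_bp))
    # → {'small_genome_sp': 0.5, 'large_genome_sp': 0.5}
```

Here the species with the larger genome contributes three quarters of the DNA even though both species are present in equal cell numbers, which is exactly the kind of discrepancy that arises when the two abundance types are used interchangeably.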
