Advancing Species Identification in Metagenomic Profiling with MAP2B

MAP2B is a metagenomic profiler for whole metagenome sequencing data that works differently from universal-marker or whole-genome-based methods. It substantially reduces false identifications and generates highly accurate taxonomic profiles.

In recent years, microbiome research has emerged as a groundbreaking field with the potential to revolutionize areas including agriculture, environmental restoration, drug discovery, and human health. 'Who are they?' is one of the most important questions to start with in microbiome research. Fortunately, the advent of high-throughput sequencing has brought about a seismic shift in the field. Gone are the days when we had to rely on laborious and time-consuming culturing methods to investigate microbial compositions. Nowadays, we can swiftly answer the 'who are they' question by analyzing massive amounts of sequencing data.

After obtaining those reads, the next critical step is to decode them and quantify the microbial composition. We have two options at this stage: assemble the reads first or take an assembly-free approach. We believe the assembly-free approach, embodied by metagenomic profilers, will eventually be more beneficial for clinical and industrial applications. This belief stems from the explosion of microbial genome information in publicly available databases, which is the foundation on which metagenomic profilers operate.

Despite their promise, current metagenomic profilers have limitations. A recent benchmarking study, CAMI2, revealed that no metagenomic profiler excelled in taxon identification and abundance estimation at the species level. This bottleneck is largely due to the profilers' reliance on universal single-copy markers or whole microbial genomes as references. That reliance often leads to missing or indistinguishable markers during reference database construction, or to multi-alignment of short reads against conserved regions during read alignment.

To address these challenges, we focused on the Type IIB restriction system. The endonucleases of Type IIB restriction-modification systems differ from all other restriction enzymes: they cleave DNA at fixed positions on both sides of their recognition site, excising the site within an iso-length DNA fragment. In a previous study, we demonstrated that Type IIB restriction sites are widely and randomly distributed along microbial genomes (Sun et al., Genome Biology, 2022). We discovered that species-specific Type IIB restriction endonuclease digestion sites (or IIB fragments/tags) far outnumber universal single-copy markers and naturally avoid the multi-alignment problem. As a result, we developed MAP2B (MetAgenomic Profiler based on type IIB restriction site). This novel metagenomic profiler can effectively eliminate false positives and generate more precise and accurate taxonomic profiles from Whole Metagenome Sequencing (WMS) data.
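
To make the idea of species-specific IIB tags concrete, here is a minimal Python sketch of an in silico digestion. It is an illustration under stated assumptions, not MAP2B's actual implementation: the recognition motif, the flank length, the toy genomes, and the function names (`extract_tags`, `species_specific_tags`) are all placeholders chosen for this example. The sketch pulls out fixed-length fragments centered on each recognition site, on both strands, and keeps only the tags that occur in a single genome.

```python
import re
from collections import defaultdict

# Placeholder values assumed for illustration; substitute the recognition
# sequence and cut offsets of the enzyme actually used.
MOTIF = "CGANNNNNNTGC"  # N stands for any base
FLANK = 10              # bases kept on each side of the recognition site

_site_body = MOTIF.replace("N", "[ACGT]")
# A lookahead is used so that overlapping sites are all reported.
SITE = re.compile(f"(?=([ACGT]{{{FLANK}}}{_site_body}[ACGT]{{{FLANK}}}))")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(tag):
    """Strand-independent form of a tag: the smaller of tag and its reverse complement."""
    return min(tag, revcomp(tag))


def extract_tags(genome):
    """Return the set of canonical iso-length tags found on either strand."""
    tags = set()
    for strand in (genome, revcomp(genome)):
        for match in SITE.finditer(strand):
            tags.add(canonical(match.group(1)))
    return tags


def species_specific_tags(genomes):
    """Map each species to the tags that occur in its genome and no other."""
    owners = defaultdict(set)
    for species, genome in genomes.items():
        for tag in extract_tags(genome):
            owners[tag].add(species)
    specific = defaultdict(set)
    for tag, species_set in owners.items():
        if len(species_set) == 1:
            specific[next(iter(species_set))].add(tag)
    return dict(specific)


# Toy genomes: one tag shared by both species, one tag unique to each.
shared = "ACGTACGTAC" + "CGAGGGGGGTGC" + "TTGACCATGA"
a_only = "AAAAAAAAAA" + "CGAACACACTGC" + "CCCCCCCCCC"
b_only = "GGGGGGGGGG" + "CGATTTTTTTGC" + "AAAAAAAAAA"
genomes = {"species_A": shared + a_only, "species_B": shared + b_only}

specific = species_specific_tags(genomes)
for species in sorted(specific):
    print(species, len(specific[species]), "species-specific tag(s)")
```

Because every tag has the same length and a tag shared between species is simply discarded, a read matching a retained tag points to exactly one species, which is the property that sidesteps multi-alignment.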

Our benchmarking exercises using simulated datasets with varying sequencing depths and species richness showcased MAP2B's superior performance over existing metagenomic profilers in species identification. Further tests using real WMS data from an ATCC mock community confirmed its superior precision across sequencing depths. Additionally, by leveraging WMS data from an Inflammatory Bowel Disease (IBD) cohort, we demonstrated that the taxonomic features identified by MAP2B could better discriminate IBD patients from healthy controls and predict metabolomic profiles.
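
For readers less familiar with how species identification is scored in such benchmarks, the following Python sketch shows the usual precision, recall, and F1 calculation against a known community. The function name `identification_metrics` and the species labels are hypothetical, and the numbers are a toy example rather than results from our study.

```python
def identification_metrics(predicted, truth):
    """Score a set of predicted species against the known community members."""
    predicted, truth = set(predicted), set(truth)
    true_pos = len(predicted & truth)    # correctly identified species
    false_pos = len(predicted - truth)   # reported but not actually present
    false_neg = len(truth - predicted)   # present but missed
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy mock community with four members; the profiler reports one false positive
# (sp_X) and misses one true member (sp_D).
truth = {"sp_A", "sp_B", "sp_C", "sp_D"}
predicted = {"sp_A", "sp_B", "sp_C", "sp_X"}
metrics = identification_metrics(predicted, truth)
print(metrics)
```

A profiler that reports many spurious species at greater sequencing depth would see its precision fall in this calculation even if recall stays high.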

Our previous study demonstrated that the decoding of whole metagenomic sequencing data is also hindered by confusion between sequence abundance and taxonomic abundance (Sun et al., Nature Methods, 2021). Here, sequence abundance refers to the proportion of DNA, whereas taxonomic abundance refers to the proportion of individuals/cells. We presented compelling evidence that treating the two as interchangeable affects both per-sample summary statistics and cross-sample comparisons. While most metagenomic profilers, such as Bracken and Kraken, report sequence abundance, taxonomic abundance may be the more clinically and ecologically relevant parameter. Hence, there is a pressing need for metagenomic profilers that can generate taxonomic abundance. Notably, MAP2B is one of the few profilers that produce taxonomic abundance, setting it apart from existing tools. MAP2B can be accessed at: https://github.com/sunzhengCDNM/MAP2B.
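
The difference between the two abundance types can be seen in a small worked example. The Python sketch below uses a generic genome-length correction to turn DNA proportions into cell proportions; it is meant only to illustrate the concept and is not a description of how MAP2B computes abundance. The function name `sequence_to_taxonomic`, the species labels, and the genome sizes are assumptions made for this example.

```python
def sequence_to_taxonomic(seq_abundance, genome_length):
    """Convert DNA proportions to cell proportions by dividing out genome size.

    A species with a larger genome contributes more DNA per cell, so its
    sequence abundance overstates its share of cells.
    """
    per_cell = {sp: seq_abundance[sp] / genome_length[sp] for sp in seq_abundance}
    total = sum(per_cell.values())
    return {sp: value / total for sp, value in per_cell.items()}


# Two hypothetical species present in equal cell numbers. The genome of
# "large" is three times the size of "small", so it yields three times the DNA.
genome_length = {"small": 2_000_000, "large": 6_000_000}
seq_abundance = {"small": 0.25, "large": 0.75}

tax_abundance = sequence_to_taxonomic(seq_abundance, genome_length)
print(tax_abundance)
```

In this toy case the sequence abundances suggest a 1:3 community while the cells are actually present 1:1, which is why the two quantities should not be swapped in downstream statistics.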
