Unlocking Reliable 16S rRNA Analysis: A Gold-Standard Ground Truth for Benchmarking!

A validated mock community (235 strains, 197 species) offers a gold-standard ground truth for testing OTU/ASV methods. Unlike real data, whose composition is unknown, this resource enables accurate pipeline evaluation. Dive into our open dataset (PRJNA975486) and analysis: https://rdcu.be/elVJj

Explore the Research

BioMed Central

The unresolved struggle of 16S rRNA amplicon sequencing: a benchmarking analysis of clustering and denoising methods - Environmental Microbiome

Background: Although 16S rRNA gene amplicon sequencing has become an indispensable method for microbiome studies, the analysis is not error-free and remains prone to several biases and errors. Numerous algorithms have been developed to eliminate these errors and consolidate the output into distance-based Operational Taxonomic Units (OTUs) or denoising-based Amplicon Sequence Variants (ASVs). An objective comparison between them has been obscured by differing experimental setups and parameters. In the present study, we conducted a comprehensive benchmarking analysis of error rates, microbial composition, over-merging/over-splitting of reference sequences, and diversity, using the most complex mock community, comprising 227 bacterial strains, together with the Mockrobiota database. Using unified preprocessing steps, we were able to objectively compare DADA2, Deblur, MED, UNOISE3, UPARSE, DGC (Distance-based Greedy Clustering), AN (Average Neighborhood), and Opticlust.

Results: ASV algorithms, led by DADA2, produced consistent output but suffered from over-splitting, while OTU algorithms, led by UPARSE, achieved clusters with lower errors but more over-merging. Notably, UPARSE and DADA2 showed the closest resemblance to the intended microbial community, especially in alpha and beta diversity measures.

Conclusion: Our unbiased comparative evaluation examined the performance of eight algorithms dedicated to the analysis of 16S rRNA amplicon sequences across a wide range of mock datasets. Our analysis shed light on the pros and cons of each algorithm and the accuracy of the OTUs or ASVs they produce. The use of the most complex mock community and the benchmarking comparison presented here offer a framework for comparing OTU/ASV algorithms and an objective method for assessing new tools and algorithms.
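To make the over-splitting/over-merging terminology concrete, here is a minimal Python sketch; it is not the authors' code. It assumes each inferred OTU or ASV has already been matched to the set of ground-truth reference strains it captures (that matching step is omitted), and it uses simple working definitions that may differ from the paper's: a reference is over-split when more than one unit covers it, and a unit is over-merged when it absorbs more than one reference. The function name split_merge_counts and all IDs are hypothetical.

```python
from collections import defaultdict


# Hypothetical helper for illustration only; not taken from the paper.
def split_merge_counts(unit_to_refs: dict[str, set[str]]) -> tuple[int, int]:
    """Return (over-split references, over-merged units).

    unit_to_refs maps each inferred OTU/ASV ID to the set of ground-truth
    reference IDs it was matched to (the matching itself is assumed done).
    """
    # Invert the mapping so we can see how many units cover each reference.
    ref_to_units: dict[str, set[str]] = defaultdict(set)
    for unit, refs in unit_to_refs.items():
        for ref in refs:
            ref_to_units[ref].add(unit)

    # A reference is over-split when more than one inferred unit covers it.
    over_split = sum(1 for units in ref_to_units.values() if len(units) > 1)
    # A unit is over-merged when it absorbs more than one distinct reference.
    over_merged = sum(1 for refs in unit_to_refs.values() if len(refs) > 1)
    return over_split, over_merged
```

For example, split_merge_counts({"U1": {"refA"}, "U2": {"refA"}, "U3": {"refB", "refC"}}) returns (1, 1): refA is split across two units, and U3 merges two references. Under these definitions, the ASV tendency reported above would appear as a high first count and the OTU tendency as a high second count.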

The analysis of 16S rRNA gene sequencing data involves several critical steps, including preprocessing, dereplication, chimera removal, and, ultimately, clustering or denoising to infer biological sequences. To accurately assess the performance of each step and ensure reliable results, proper benchmarking requires a complex mock community with a validated ground truth. Large volumes of publicly available data exist, and unlike simulated data, which rely on prior assumptions, they have the advantage of being derived from real samples. However, these datasets have a significant limitation: the true composition of their microbial communities is often unknown. Without a definitive ground truth, comparative analyses become difficult, because the accuracy and effectiveness of clustering and denoising algorithms cannot be rigorously evaluated.
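The dereplication and clustering steps can be illustrated with a deliberately simplified distance-based greedy clustering in Python. This is not UPARSE, DGC, or any published implementation. It assumes equal-length, pre-aligned reads, measures identity as the fraction of matching positions (no alignment), skips quality filtering and chimera removal, and uses the conventional 97% identity threshold. The names greedy_otus and hamming_identity are hypothetical.

```python
from collections import Counter


# Simplified sketch; helper names are hypothetical and reads are assumed
# to be equal-length and already aligned (no real alignment is performed).
def hamming_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sketch assumes equal-length, pre-aligned sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads: list[str], threshold: float = 0.97) -> dict[str, list[str]]:
    """Dereplicate reads, then cluster unique sequences greedily by abundance.

    Returns a mapping from each centroid sequence to its member sequences.
    """
    # Dereplication: collapse identical reads and keep their counts.
    uniques = Counter(reads)
    centroids: dict[str, list[str]] = {}
    # Most abundant unique sequences are visited first, so they become centroids.
    for seq, _count in uniques.most_common():
        for centroid in centroids:
            if hamming_identity(seq, centroid) >= threshold:
                # Join the first existing centroid within the identity threshold.
                centroids[centroid].append(seq)
                break
        else:
            # No centroid is close enough: this sequence seeds a new OTU.
            centroids[seq] = [seq]
    return centroids
```

Visiting unique sequences in decreasing abundance means abundant sequences become centroids, while rarer variants, which are more likely to carry errors, are absorbed into them. Denoisers such as DADA2 take a different route, modeling sequencing errors explicitly rather than applying a fixed identity threshold.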

The mock community presented in this study comprises 235 bacterial strains representing 197 distinct species, providing a valuable and rigorous resource for the bioinformatics community. It offers an ideal framework for developers aiming to optimize their algorithms and for analysts seeking to critically assess and benchmark existing 16S rRNA analysis pipelines. Notably, the same mock community was previously characterized at the shotgun metagenomic level by Gleb Goussarov, which facilitated accurate metagenomic binning (see publication: https://link.springer.com/article/10.1186/s40793-022-00403-7). Its availability at both the amplicon and shotgun levels further enhances its utility as a comprehensive benchmarking standard for diverse microbial analysis workflows.
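Because the true sequences of a mock community are known, error rates can be computed directly against them. Here is a minimal Python sketch that assumes equal-length, pre-aligned sequences and defines the error rate as the abundance-weighted fraction of mismatched bases relative to the nearest reference. This working definition is an assumption and may differ from the one used in the paper. The names error_rate and mismatches are hypothetical.

```python
# Simplified sketch; the error-rate definition and helper names are assumptions.
def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sketch assumes equal-length, pre-aligned sequences")
    return sum(x != y for x, y in zip(a, b))


def error_rate(inferred: dict[str, int], references: list[str]) -> float:
    """Abundance-weighted fraction of mismatched bases versus the nearest reference.

    inferred maps each output sequence (OTU centroid or ASV) to its read count;
    references holds the known ground-truth sequences of the mock community.
    """
    total_bases = 0
    total_mismatches = 0
    for seq, count in inferred.items():
        # Mismatches to the closest ground-truth reference sequence.
        nearest = min(mismatches(seq, ref) for ref in references)
        total_mismatches += nearest * count
        total_bases += len(seq) * count
    # Guard against an empty input rather than dividing by zero.
    return total_mismatches / total_bases if total_bases else 0.0
```

With references ["ACGTACGTAC", "TTTTTTTTTT"], an output of nine reads of the first reference plus one read carrying a single substitution yields 1 mismatch over 100 bases, an error rate of 0.01.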

In this study, we leveraged the complex mock community to conduct a head-to-head comparison of clustering approaches, which produce Operational Taxonomic Units (OTUs), and denoising approaches, which produce Amplicon Sequence Variants (ASVs). This direct comparison allowed us to systematically highlight the strengths and limitations of each method. We believe that the robust design of our benchmarking framework, combined with the use of this complex mock community, provides a solid foundation for evaluating 16S rRNA analysis algorithms. Moreover, the framework offers a scalable model that could be extended to compare entire pipelines in future studies.
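On the diversity side of such a comparison, one common approach is to compute the same alpha and beta metrics on the expected mock composition and on each method's output. The Python sketch below uses the Shannon index (alpha) and Bray-Curtis dissimilarity (beta) as examples. It is an illustration under that assumption, not the study's exact set of diversity measures or its code. The function names are hypothetical, and both profiles are assumed to use the same taxon labels.

```python
import math


# Illustrative sketch; function names are hypothetical, and both profiles
# are assumed to label taxa consistently.
def shannon(counts: dict[str, float]) -> float:
    """Shannon index H' = -sum(p_i * ln(p_i)), skipping zero-abundance taxa."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def bray_curtis(a: dict[str, float], b: dict[str, float]) -> float:
    """Bray-Curtis dissimilarity between two profiles after converting to proportions."""
    total_a, total_b = sum(a.values()), sum(b.values())
    taxa = set(a) | set(b)
    # Taxa missing from one profile count as zero abundance there.
    # On proportions, the denominator sum(p_a + p_b) equals 2.
    return sum(abs(a.get(t, 0) / total_a - b.get(t, 0) / total_b) for t in taxa) / 2
```

A Bray-Curtis value near 0 between a method's output and the expected composition indicates close agreement. For example, an even two-species expectation compared with an observed 3:1 split gives 0.25.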

Our comprehensive benchmarking framework, along with all datasets, detailed analyses, and key insights, is freely accessible at https://environmentalmicrobiome.biomedcentral.com/articles/10.1186/s40793-025-00705-6. Additionally, the mock community dataset, available under accession number PRJNA975486, serves as a valuable resource for both bioinformatics algorithm development and rigorous performance evaluation.
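For readers who want the raw reads, the runs under PRJNA975486 can usually be listed through the ENA Portal API filereport endpoint. This assumes the project's runs are mirrored from SRA to ENA, as INSDC projects normally are. The sketch below builds the query and parses the tab-separated response. list_runs performs a live network request, and all helper names are hypothetical.

```python
import csv
import io
import urllib.parse
import urllib.request

# Assumption: the BioProject's runs are mirrored at ENA and exposed via the
# Portal API filereport endpoint. Helper names below are hypothetical.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession: str) -> str:
    """Build a filereport query listing each run and its FASTQ paths."""
    query = urllib.parse.urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def parse_fastq_links(tsv_text: str) -> dict[str, list[str]]:
    """Map run accession -> FASTQ paths (paired-end runs are ';'-separated)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {
        row["run_accession"]: [p for p in (row.get("fastq_ftp") or "").split(";") if p]
        for row in reader
    }


def list_runs(accession: str = "PRJNA975486") -> dict[str, list[str]]:
    """Fetch and parse the run list (requires network access)."""
    with urllib.request.urlopen(filereport_url(accession)) as resp:
        return parse_fastq_links(resp.read().decode("utf-8"))
```

The returned paths have no URL scheme, so a scheme must be added before downloading them with an FTP or HTTPS client.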

