Behind the Paper

Microbiome analysis of viruses is more accessible than ever

Published in Microbiology

Feb 21, 2022

Joachim Johansen

PhD fellow, NNFCPR


Where we started

The “viral” interest in virus genome binning started with a question: how well would our deep-learning-based binner perform on other ecological domains in the human gut? I asked this question together with Jakob Nissen and Simon Rasmussen, the first and last authors of the Vamb paper (Nissen et al. 2021).

Back in 2019, we were curious whether Vamb performed well enough that a researcher could process a huge metagenomic dataset (≥1,000 samples) and retrieve not only high-quality bacterial genomes (checkmark) but also other entities such as bacteriophages. If this were possible, additional domains of the gut microbiome could be unlocked for downstream microbiome analysis. Initially, we probed how well our Vamb bins (each bin being a set of contigs) resembled virus genomes by aligning them with BLAST against the NCBI virus database, which comprised roughly 3,000-8,000 reference genomes, mostly of eukaryotic viruses. These initial efforts revealed very few viral bins but very high taxonomic consistency within them. Whenever one contig of a Vamb bin mapped to, e.g., a Bacillus phage genome or a bell pepper phage genome (yes, this is a real example), the remaining contigs did too, which gave us the incentive to benchmark and explore this in depth.
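To make the idea of taxonomic consistency concrete, here is a minimal Python sketch. It is not the code used in the paper: the function name, the input dictionaries and the example labels are all assumptions made for illustration. Given a mapping from contigs to bins and a mapping from contigs to their best reference hit, it reports the most common hit in each bin and the fraction of hit-bearing contigs that agree with it.

```python
from collections import Counter, defaultdict


def bin_consistency(contig_to_bin, contig_to_hit):
    """Per bin, return (most common reference hit, fraction of contigs agreeing).

    contig_to_bin: dict of contig name -> bin name (hypothetical input format).
    contig_to_hit: dict of contig name -> best reference hit; contigs without
    a hit are simply absent and are ignored in the fraction.
    """
    hits_per_bin = defaultdict(list)
    for contig, bin_id in contig_to_bin.items():
        hit = contig_to_hit.get(contig)
        if hit is not None:
            hits_per_bin[bin_id].append(hit)

    result = {}
    for bin_id, hits in hits_per_bin.items():
        top_hit, count = Counter(hits).most_common(1)[0]
        result[bin_id] = (top_hit, count / len(hits))
    return result


# Toy example with made-up contig, bin and reference names.
example_bins = {"c1": "bin1", "c2": "bin1", "c3": "bin1", "c4": "bin2"}
example_hits = {"c1": "phage_A", "c2": "phage_A", "c3": "phage_A", "c4": "phage_B"}
consistency = bin_consistency(example_bins, example_hits)
```

A bin in which every contig points to the same reference gets a fraction of 1.0, which is the pattern we kept seeing in those first viral bins.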

When the pieces came together

Together with our colleagues at the Copenhagen Prospective Studies on Asthma in Childhood (COPSAC), we started to benchmark Vamb’s performance on viral genomes using 662 paired bulk metagenomic and virus-like particle (VLP) samples, which was the biggest dataset of its kind at the time. The COPSAC team established a gold-standard (truth) set of viral contigs discovered in the VLP dataset. With this gold standard available, we benchmarked how many of these viruses could be recovered in Vamb bins from the bulk metagenomic samples. We were surprised to find that thousands of bins resembled gold-standard viruses and that a large proportion of them could be retrieved from the bulk metagenomic data. Furthermore, we found that the contigs of each bin mapped consistently to the same virus genome and that bins typically contained few unrelated contigs. To make the identification of viral Vamb bins more accessible and less time-consuming on huge bulk metagenomic datasets, we trained a random forest (RF) model based on viral protein families and single-contig prediction scores to identify putative viral Vamb bins. The great thing about dealing with bins of multiple contigs is that a majority vote or consensus score can be derived to gain higher confidence in a given bin’s viral “likeness”. If a single viral contig did not achieve a high prediction score, the whole bin was not thrown in the trash as a result.
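The consensus idea can be illustrated with a short Python sketch. This is not our RF model, only a toy summary of per-contig scores for one bin; the function name, the 0.5 threshold and the example scores are assumptions chosen for illustration.

```python
from statistics import median


def bin_consensus(scores, threshold=0.5):
    """Summarise per-contig viral prediction scores for a single bin.

    scores: list of floats in [0, 1], one per contig (hypothetical input).
    threshold: score at or above which a contig counts as a "viral" vote;
    0.5 is a placeholder, not a value taken from the paper.
    """
    if not scores:
        raise ValueError("bin has no contig scores")
    votes = sum(score >= threshold for score in scores)
    return {
        "median_score": median(scores),
        "viral_fraction": votes / len(scores),
        "majority_viral": votes * 2 > len(scores),
    }


# One low-scoring contig does not sink a bin whose other contigs look viral.
summary = bin_consensus([0.9, 0.8, 0.2])
```

In practice, bin-level summaries of this kind would be features for a classifier rather than a final verdict, but the sketch shows why a single poorly scoring contig need not condemn the whole bin.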

At the time, we lacked an external tool for mass validation of viral Vamb bins. Fortunately, CheckV was posted on bioRxiv not long after by Nayfach et al., which added whole new facets to the benchmark and quality control. We could then group both the Vamb virus bins and our gold-standard viruses into different tiers of genome quality and completeness. Most importantly, CheckV allowed us to evaluate the RF-predicted bins at scale and narrow them down to a final subset of bona fide viruses. In essence, we could now establish the metavirome directly from bulk metagenomics.

Large scale binning of viruses and MAGs

To evaluate the method’s utility, we applied it to a massive public metagenomic dataset, the Human Microbiome Project 2 (HMP2), for which no virome characterisation had been described before. From HMP2 we mined thousands of high-quality (HQ) viruses and bacterial MAGs via binning, which could be used for further analysis of the interplay between bacteria and viruses during an agitated state such as inflammation and severe dysbiosis. Here we identified 250 temperate viruses that expanded with increasing dysbiosis, suggesting an inflammation-driven prophage induction that could be aggravating the inflammatory state even further.

Furthermore, in all the benchmarks of the original Vamb paper, Vamb was superior not only for bacterial genome binning but also for separating highly similar strains from each other, even at 98–99.5% average nucleotide identity, thus elegantly dealing with complex biological diversity. This was evidently also the case for viral genomes such as crAssphage, a prevalent and abundant virus in the human gut. By overlaying our Vamb cluster labels onto a crAssphage phylogenetic tree from the HMP2 dataset, we observed clear monophyletic clades of genomes corresponding to real diversity.

Concluding remarks

Viral binning is not a perfect computational process: it produces many bins of fragmented or incomplete viruses, as well as other mobile genetic elements such as plasmids that might be mistaken for viruses. Even though Vamb-driven binning is not 100% accurate, it is pretty darn good at binning viruses simultaneously with other entities such as bacteria. We think that the ultimate binner should be judged on its capacity to handle thousands of contigs from unrelated organisms at the same time, and here Vamb does a really great job.

To prevent the inclusion of false-positive viruses in downstream analysis, we have gone to great lengths to describe ways to handle the output of tools such as CheckV, as well as cutoffs to filter away contaminated viral bins. Post-processing is a vital element of any binning approach in metagenomics. We believe that careful validation cannot be skipped, and we do not recommend downstream analysis of metagenomic datasets without first evaluating and classifying Vamb- and PHAMB-derived bins into confident biological units with dedicated tools. We hope that future evaluation tools can help categorise the many viral-like bins that we also outlined in the manuscript and extend downstream analysis beyond known viral diversity.
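As a rough illustration of this kind of post-processing, the Python sketch below keeps only bins that pass a completeness and a contamination cutoff. The record fields, the function name and the cutoff values are assumptions for illustration; they are not the cutoffs from the manuscript, and the field names would need to be mapped to the actual output of whichever validation tool is used.

```python
def filter_bins(records, min_completeness=90.0, max_contamination=5.0):
    """Return the names of bins that pass both quality cutoffs.

    records: list of dicts with hypothetical keys "bin", "completeness" and
    "contamination" (both in percent). The default cutoffs are placeholders.
    Bins without an estimate are skipped rather than assumed to be good.
    """
    kept = []
    for record in records:
        completeness = record.get("completeness")
        contamination = record.get("contamination")
        if completeness is None or contamination is None:
            continue  # no estimate available: leave for manual review
        if completeness >= min_completeness and contamination <= max_contamination:
            kept.append(record["bin"])
    return kept


# Made-up records for three bins.
example_records = [
    {"bin": "bin1", "completeness": 98.2, "contamination": 0.0},
    {"bin": "bin2", "completeness": 95.0, "contamination": 12.5},
    {"bin": "bin3", "completeness": None, "contamination": 0.0},
]
confident_bins = filter_bins(example_records)
```

Skipping bins without an estimate, instead of silently keeping them, reflects the point above: a bin should earn its place in downstream analysis through validation.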

With these considerations in mind, we believe that our manuscript outlines the immense value of viral binning and shows how it provides a stronger foundation for future metagenomic analysis with a focus on bacterial and viral ecology.

Additional information

Check out the paper here: https://www.nature.com/articles/s41467-022-28581-5

The HQ virus genomes uncovered from bulk metagenomic data such as HMP2 can be downloaded here: https://zenodo.org/record/6200656#.YhN5XJPMIeY

AUTHOR

Joachim Johansen, PhD fellow, The Novo Nordisk Foundation Center for Protein Research (NNFCPR), Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
