Microbiome analysis of viruses is more accessible than ever

Published in Microbiology

Where we started

The “viral” interest in virus genome binning started with a question: how well would our deep-learning-based binner, Vamb, perform on other ecological domains in the human gut? I asked this question together with Jakob Nissen and Simon Rasmussen, the first and last authors of the Vamb paper (Nissen et al. 2021).

Back in 2019, we were curious whether Vamb performed so well that a researcher could process a huge metagenomic dataset (at least 1000 samples) and retrieve not only high-quality bacterial genomes (check) but also other entities such as bacteriophages. If this was possible, additional domains of the gut microbiome could be unlocked for downstream microbiome analysis. Initially, we probed how well our Vamb bins (each bin is a set of contigs) resembled virus genomes by aligning them with BLAST against the NCBI virus database, which was composed of roughly 3000-8000 reference genomes, mostly from eukaryotic viruses. These initial efforts revealed very few viral bins but very high taxonomic consistency. Whenever one contig of a Vamb bin mapped to, e.g., a Bacillus phage genome or a bell pepper phage genome (yes, this is a real example), the remaining contigs did too, which gave us further incentive to benchmark and explore this in depth.

When the pieces came together

Together with our colleagues at the Copenhagen Prospective Studies on Asthma in Childhood (COPSAC), we started to benchmark Vamb’s performance on viral genomes using 662 paired bulk metagenomic and virus-like particle (VLP) samples, which was the biggest dataset of its kind at the time. The COPSAC team established a gold-standard (truth) set of viral contigs discovered in the VLP dataset. With this gold standard available, we benchmarked how many of these viruses could be recovered in Vamb bins from the bulk metagenomic samples. We were surprised to find that thousands of bins resembled gold-standard viruses and that a large proportion of the gold-standard viruses could be retrieved from the bulk metagenomic data. Furthermore, we found that the contigs of each bin mapped consistently to the same virus genome and that bins typically contained few unrelated contigs. To make the identification of viral Vamb bins more accessible and less time-consuming on huge bulk metagenomic datasets, we trained a random forest (RF) model based on viral protein families and single-contig prediction scores to identify putative viral Vamb bins. The great thing about dealing with bins of multiple contigs is that a majority vote or consensus score can be derived to gain higher confidence in a given bin's viral "likeness". If a single virus contig did not achieve a high prediction score, the whole bin was not thrown in the trash as a result.
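To illustrate the consensus idea, here is a minimal sketch of how per-contig prediction scores could be summarised at the bin level. It is a plain majority vote, not the RF model described above, and the function name `bin_consensus`, the 0.5 threshold, and the toy scores are all hypothetical choices made for this example.

```python
from collections import defaultdict
from statistics import mean


def bin_consensus(contig_scores, contig_to_bin, threshold=0.5):
    """Summarise per-contig viral scores at the bin level.

    contig_scores: dict mapping contig name -> viral prediction score in [0, 1]
    contig_to_bin: dict mapping contig name -> bin name
    threshold:     hypothetical cutoff above which a contig counts as a viral "vote"
    """
    scores_by_bin = defaultdict(list)
    for contig, score in contig_scores.items():
        scores_by_bin[contig_to_bin[contig]].append(score)

    summary = {}
    for bin_name, scores in scores_by_bin.items():
        votes = sum(score >= threshold for score in scores)
        summary[bin_name] = {
            "n_contigs": len(scores),
            "mean_score": mean(scores),
            "viral_fraction": votes / len(scores),
            # Majority vote: more than half of the contigs look viral.
            "majority_viral": votes > len(scores) / 2,
        }
    return summary


# Toy, made-up scores for illustration only.
toy_scores = {"c1": 0.9, "c2": 0.8, "c3": 0.3, "c4": 0.2, "c5": 0.1}
toy_bins = {"c1": "binA", "c2": "binA", "c3": "binA", "c4": "binB", "c5": "binB"}
result = bin_consensus(toy_scores, toy_bins)
```

In this toy input, the low-scoring contig `c3` does not sink `binA`, because the other two contigs in the bin carry the vote.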

At the time, we lacked an external tool for validating viral Vamb bins at scale. Fortunately, the tool CheckV was posted on bioRxiv not long after by Nayfach et al., which added whole new facets to the benchmark and quality control. We could then group the Vamb virus bins, and the viruses in our gold standard, into different tiers of genome quality and completeness. Most importantly, CheckV allowed us to evaluate the RF-predicted bins at large scale and narrow them down to a final subset of bona fide viruses. In essence, we could now establish the metavirome directly from bulk metagenomics.

Large scale binning of viruses and MAGs

To evaluate the method's utility, we applied it to a massive public metagenomic dataset, the Human Microbiome Project 2 (HMP2), for which no virome characterisation had been described before. From HMP2 we mined thousands of high-quality (HQ) viruses and bacterial MAGs via binning, which could be used for further analysis of the bacterial and viral interplay during an agitated state such as inflammation and severe dysbiosis. Here we identified 250 temperate viruses that expanded with increasing dysbiosis, suggesting an inflammation-driven prophage induction that could be aggravating the inflammatory state even further.

Furthermore, in all benchmarks in the original Vamb paper, Vamb was superior not only at bacterial genome binning but also at separating highly similar strains from each other, even at 98–99.5% average nucleotide identity, thus elegantly dealing with complex biological diversity. This was also the case for viral genomes such as crAssphage, a prevalent and abundant virus in the human gut. By overlaying our Vamb cluster labels onto a crAssphage phylogenetic tree from the HMP2 dataset, we observed clear monophyletic clades of genomes corresponding to real diversity.

Concluding remarks

Viral binning is not a perfect computational process. It produces many bins of fragmented or incomplete viruses, as well as bins of other mobile genetic elements, such as plasmids, that might be mistaken for a virus. Even though Vamb-driven binning is not 100% accurate, it is pretty darn good at binning viruses simultaneously with other entities like bacteria. We think that the ultimate binner should be judged on its capacity to handle thousands of contigs from unrelated organisms at the same time, and here Vamb does a really great job.

To prevent the inclusion of false-positive viruses in downstream analysis, we have gone to great lengths to describe ways to handle the output of tools such as CheckV, as well as cutoffs for filtering away contaminated viral bins. Post-processing is a vital element of binning approaches in metagenomics. We believe that careful validation cannot be ignored, and we do not recommend downstream analysis of metagenomic datasets without first evaluating and classifying Vamb- and PHAMB-derived bins into confident biological units with dedicated tools. We hope that future evaluation tools can help categorise the many viral-like bins that we also outlined in the manuscript and extend the downstream analysis beyond known viral diversity.
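As a rough illustration of this kind of post-processing, the sketch below keeps only bins that fall into an accepted quality tier and stay under a contamination cutoff. It is not the filtering procedure from the manuscript. The column names and tier labels are assumed to match CheckV's quality summary table and should be checked against your CheckV version; each row is assumed to describe one bin, and the 10% cutoff, the `filter_bins` name, and the toy table are hypothetical.

```python
import csv
import io

# Toy, made-up table for illustration. The column names are assumed to match
# CheckV's quality summary output; check them against your CheckV version.
TOY_QUALITY_SUMMARY = (
    "contig_id\tcheckv_quality\tcompleteness\tcontamination\n"
    "bin_1\tHigh-quality\t95.2\t0.0\n"
    "bin_2\tMedium-quality\t61.0\t0.0\n"
    "bin_3\tHigh-quality\t92.4\t35.5\n"
    "bin_4\tNot-determined\tNA\t0.0\n"
)

KEEP_TIERS = {"Complete", "High-quality"}  # assumed tier labels
MAX_CONTAMINATION = 10.0  # hypothetical cutoff, in percent


def filter_bins(handle, keep_tiers=KEEP_TIERS, max_contamination=MAX_CONTAMINATION):
    """Return IDs of bins in an accepted tier with contamination at or below the cutoff."""
    kept = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["checkv_quality"] not in keep_tiers:
            continue
        try:
            contamination = float(row["contamination"])
        except ValueError:
            # Missing or non-numeric contamination: be conservative and skip.
            continue
        if contamination <= max_contamination:
            kept.append(row["contig_id"])
    return kept


confident_bins = filter_bins(io.StringIO(TOY_QUALITY_SUMMARY))
```

With a real table, the same function could be called on an open file handle instead of the in-memory toy string. The tiers and cutoff are worth tuning to the question at hand, since stricter settings trade recall for confidence.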

With these considerations in mind, we believe that our manuscript outlines the immense value of viral binning and how it provides a stronger foundation for future metagenomic analysis with a focus on bacterial and viral ecology.

Additional information

Check out the paper here: https://www.nature.com/articles/s41467-022-28581-5 

If you want to check out the HQ virus genomes uncovered from bulk metagenomic data such as HMP2, they can be downloaded here: https://zenodo.org/record/6200656#.YhN5XJPMIeY

AUTHOR

Joachim Johansen, PhD fellow, The Novo Nordisk Foundation Center for Protein Research (NNFCPR), Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
