Making sense of the virome with functional viromics

Although deep sequencing has uncovered thousands of animal virus sequences in nature, we still do not have an effective way to study all of these viruses in the laboratory. Our latest research outlines one approach we have taken to functionally studying the ever-growing virome.
In 2002, an epidemic of respiratory illness began that would go on to result in more than 8000 cases in 26 countries, with a case fatality rate of nearly 10%. Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) was quickly identified as the causative agent, and genetic analyses revealed that the virus had likely emerged from bats into palm civets and then into people. The SARS epidemic occurred around the same time that next-generation sequencing technologies became available, allowing for rapid, deep sequencing of genetic material. Suddenly, researchers were able to investigate the genetic sequences of every virus in an animal-derived sample without first isolating live virus. These developments spurred the rapid identification of a multitude of previously uncharacterized viruses circulating in the life around us - the "virome". In the two decades since, researchers have amassed an impressive collection of animal-associated virus genome sequences in publicly accessible databases including GenBank and GISAID. Some of these viruses have no relation to viruses that infect people, while others are quite similar at the genetic level to known human pathogens. While these sequencing efforts have revealed the incredible diversity of viruses on Earth, without live isolates we still know very little about the biology of these viruses or which of them may have the capacity to cause outbreaks in humans.

A striking number of novel animal viruses discovered in recent years are closely related to SARS-CoV. Currently, there are over 40,000 results for the search term "coronavirus" on GenBank. A diverse array of coronaviruses has been found throughout the world and in most animals sampled. While many coronaviruses have been discovered in bats, coronavirus sequences have now been reported in hedgehogs, whales, chickens, pigs, and seals; pick an animal and it probably has a coronavirus.

With rapid land-use change and increasing trade in livestock and wildlife, we have seen a rise in coronavirus spillover events that have led to at least four separate and severe outbreaks: SARS-CoV in 2003, MERS-CoV in 2012, SADS-CoV in 2018 (a porcine virus that had a profound effect on the global pig industry), and now SARS-CoV-2 (aka 2019-nCoV, aka COVID-19 CoV) in 2020. With each new coronavirus outbreak, a number of related coronaviruses are identified, post hoc, in the genetic databases - it turns out we have seen this genome or something similar before.

So why is every new coronavirus a surprise? 

Because we actually have not studied 99.9% of these viruses in the laboratory. Therefore, we do not know if these viruses that we are constantly discovering can cross the species barrier to humans, if they can transmit to our livestock, or if they can cause disease. 

Over the past two years, our group has been developing an approach to gain traction on the lack of functional data for the virome. We started with the coronaviruses because we know they are widely distributed and several have crossed species barriers. Within the coronaviruses, we selected a group for which we already have years of critical foundational data on protein structure and function: lineage B betacoronaviruses - the group that includes SARS-CoV and now, SARS-CoV-2. There are sequences for hundreds of lineage B betacoronaviruses in the databases. Up until now, fewer than a dozen of them have been studied in the laboratory.

We devised a platform to test the smallest region of the coronavirus genome known to play a major role in zoonosis: the viral spike protein. Spike is found on the surface of virus particles and facilitates viral invasion of the host cell - one of the first steps to infecting a host. Within the spike protein, only a very small stretch of amino acids actually comes into contact with the receptor on the host cell. Our platform is based on this portion of spike: the receptor binding domain (RBD). Using standard molecular cloning techniques, we synthesized DNA encoding the RBDs from different but related viruses and then swapped them in place of the RBD of SARS-CoV spike. We then tested the ability of these chimeric spike proteins to mediate cell invasion using a safe, non-replicating virus that produces green fluorescent protein (GFP) and luciferase upon infecting a cell.
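To make the RBD swap concrete, here is a minimal in-silico sketch of the sequence-design step only. It is not our actual cloning workflow, which was done at the DNA level with standard molecular cloning. The function names, the toy sequences, and the RBD coordinates are all hypothetical placeholders chosen for illustration; real coordinates would come from an alignment against SARS-CoV spike.

```python
def swap_rbd(backbone: str, rbd_start: int, rbd_end: int, donor_rbd: str) -> str:
    """Replace backbone[rbd_start:rbd_end] with donor_rbd (0-based, end-exclusive)."""
    if not (0 <= rbd_start < rbd_end <= len(backbone)):
        raise ValueError("RBD coordinates fall outside the backbone sequence")
    # Keep everything before and after the RBD from the backbone spike.
    return backbone[:rbd_start] + donor_rbd + backbone[rbd_end:]


def build_chimeras(backbone: str, rbd_start: int, rbd_end: int,
                   donor_rbds: dict[str, str]) -> dict[str, str]:
    """Build one chimeric spike sequence per donor RBD."""
    return {
        name: swap_rbd(backbone, rbd_start, rbd_end, rbd)
        for name, rbd in donor_rbds.items()
    }


# Toy placeholder sequences (NOT real spike or RBD sequences).
N_TERM = "MKLVT"        # hypothetical region upstream of the RBD
BACKBONE_RBD = "AAAA"   # hypothetical backbone RBD to be replaced
C_TERM = "GGSGG"        # hypothetical region downstream of the RBD

backbone = N_TERM + BACKBONE_RBD + C_TERM
rbd_start = len(N_TERM)
rbd_end = rbd_start + len(BACKBONE_RBD)

donor_rbds = {
    "toy_virus_1": "WWWWW",  # donor RBDs may differ in length from the backbone RBD
    "toy_virus_2": "YYY",
}

chimeras = build_chimeras(backbone, rbd_start, rbd_end, donor_rbds)
for name, seq in chimeras.items():
    print(name, seq)
```

Because only the short RBD segment differs between constructs, each additional virus adds one small donor sequence rather than a whole new spike gene.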

By replacing just the receptor binding domain (RBD) of the coronavirus spike protein, we were able to reduce the cost and time needed to test multiple related viruses for their ability to infect human cells and use known human receptors. 

While none of the technologies that make up our platform are necessarily "cutting-edge," our approach as a whole is unique, and it allowed us to test which lineage B betacoronaviruses could infect human cells and use known human receptors. Critically, because we only test a small region of the virus, the cost and speed of the assay are conducive to scaling up and testing large numbers of viruses. Indeed, we tested every published variant of the RBD for lineage B betacoronaviruses for under the cost of synthesizing two whole spike genes.

Just as we were writing the manuscript earlier this year, a new lineage B coronavirus emerged in China. Our platform allowed us to functionally test this new virus with unprecedented speed - moving from the published sequence to having laboratory data confirming the host receptor in just 12 days. When we released our manuscript on a public pre-print server, there were only two other manuscripts on the new virus, and ours provided the first laboratory-based evidence about the biology of the new pathogen. Time is critical in virus outbreaks, especially in the early days when data is scarce.

Our approach is just the beginning of this type of work: functional viromics. Viruses form hundreds of protein interactions with the host during the course of infection and presumably, many of these interactions can also form species barriers. Thus, we are expanding this methodology to include different types of viruses and interactions. 

The ultimate goal of this work is to move away from the rush to learn basic virus biology after virus spillover and emergence: with enough functional data for the virome, we will already know which viruses pose a zoonotic risk and will have a stronger understanding of how these viruses infect their hosts.


Check out our full paper at:
