Discovering dominant viral variants and their implications for host-protein binding.

In order for viruses to persist, flexibility is key. Flexible regions significantly influence the binding behavior of proteins. We investigated a highly flexible region in the SARS-CoV-2 receptor binding domain (RBD) to determine its effect on human angiotensin-converting enzyme 2 (hACE2) binding.

Protein flexibility is crucial for the virus to “negotiate” with the host

Proteins are known to be highly flexible yet efficient workers. For example, a protein can be a strict recruiter of one specific molecule, such as water (aquaporins), or a highly adaptable one, as in the immune response (antibodies).

Mechanically, proteins are specialised in adopting different degrees of flexibility across their 3D structures. Often, the local flexibility of one part of the structure governs a critical biological function, as in the case of virus invasion. Here, structural proteins on the virus’s outer surface try to interact with the host cell. At its core, this process is a protein-protein interaction in which a viral protein tries to negotiate safe passage into the host cell. The viral protein and its host counterpart are new to each other, so they rely heavily on their flexible arms to interact. In the case of SARS-CoV-2, the virus deploys its spike protein to negotiate with the human angiotensin-converting enzyme 2 (hACE2) receptor, which is primarily present on alveolar cells of the lungs.



Digging out the most flexible spot in SARS-CoV-2’s “arms” 

During the onset of the SARS-CoV-2 pandemic, we continuously monitored the amino-acid changes emerging in spike protein sequences. We aimed to find the most frequent amino-acid exchanges in the spike protein, especially within the receptor binding domain (RBD), the part of the spike protein most exposed to the hACE2 receptor. Analyzing the spike sequences available in the GISAID database, we identified S477 as the most variable position in the RBD at that time. Driven by curiosity, we then filtered for the SARS-CoV-2 variants spreading in Austria. To our surprise, S477G (shorthand for the substitution of the serine residue at position 477 with a much smaller glycine residue) was among the leading SARS-CoV-2 sequences collected from infected individuals in Austria. By contrast, S477N was the most widely circulating variant of native S477 worldwide. To understand these two S477 variants, we first performed a flexibility analysis, which showed that S477 sits in the most flexible part of the RBD and could therefore directly influence hACE2:RBD crosstalk during viral membrane fusion with the human cell membrane.

Structure of the RBD in a ‘sausage-style’ representation with the thickness of the tube indicating the magnitude of the fluctuations.
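
A per-residue fluctuation profile of this kind can be approximated with a very simple elastic-network form of normal mode analysis. The sketch below uses a Gaussian network model on Cα coordinates. The function name, the 7.3 Å contact cutoff, and the toy five-bead chain are illustrative assumptions, not the setup used in the study; a real analysis would read Cα coordinates from an RBD structure.

```python
import numpy as np

def gnm_fluctuations(ca_coords, cutoff=7.3):
    """Relative mean-square fluctuation per residue from a Gaussian network model.

    ca_coords: (N, 3) array of C-alpha coordinates in Angstrom.
    cutoff:    contact distance in Angstrom (assumed value, tune as needed).
    """
    coords = np.asarray(ca_coords, dtype=float)
    # Pairwise distances between all residues.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Kirchhoff matrix: -1 for residue pairs in contact, contact count on the diagonal.
    kirchhoff = -(dist < cutoff).astype(float)
    np.fill_diagonal(kirchhoff, 0.0)
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))
    # The pseudo-inverse discards the zero mode (rigid-body motion);
    # its diagonal is proportional to the mean-square fluctuations.
    inverse = np.linalg.pinv(kirchhoff, rcond=1e-8, hermitian=True)
    return np.diag(inverse).copy()

# Toy input: five beads on a line, 3.8 Angstrom apart (not real protein data).
chain = np.array([[3.8 * i, 0.0, 0.0] for i in range(5)])
fluct = gnm_fluctuations(chain)
print(np.round(fluct, 3))
```

In this toy chain the two ends fluctuate most and the middle bead least, which is the same qualitative pattern that makes exposed loops stand out as thick tubes in a sausage-style plot.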

 

Molecular dynamics simulations are indispensable

The early course of the SARS-CoV-2 pandemic was a novel and highly demanding situation for structural biologists. In such challenging times, computational techniques became indispensable. We therefore opted for molecular dynamics (MD) simulations to understand the significance of the S477 variants in the context of hACE2:RBD interactions. Several other factors favor MD simulations, including their ability to run entirely on computer hardware, their reproducibility and shareability, their minimal labor requirements, their capacity to offer simple, modular solutions to complex biomolecular problems, and, most importantly, their minimal safety and ethical constraints. By nature, MD simulations are physics-based techniques often employed to understand biomolecular dynamics at the atomic level. For example, the intricate crosstalk between atoms at protein-protein or protein-drug interaction interfaces can be characterized by MD simulations. It is worth mentioning that MD simulations revealed a hidden druggable site [1] on HIV integrase, which Merck & Co. then exploited to develop the antiretroviral drug Raltegravir [2], the first US FDA-approved HIV integrase inhibitor. This is one of the examples that underline the importance of protein flexibility as characterized by MD simulations.



Leveraging the power of MD simulations to reveal the significance of S477 and its “pro” variants for host protein interactions

With limited access to laboratories during the pandemic, we had unrestricted access to the high-performance computing facilities at Innophore and the Vienna Scientific Cluster (VSC) to gain more insight into the S477 variants using MD simulations. We analyzed the impact of the S477G and S477N substitutions on hACE2:RBD interactions using a “footprint” analysis of the MD simulations; both variants clearly stand out, as depicted in the figure below. S477N showed the most contact with hACE2, followed by S477G and the native S477.

Volumetric map analysis of 100 ns trajectory showing the hACE2 residues within 5 Å of RBD. Native RBD and the variants S477G and S477N are shown as grey cartoon representations with residue 477 highlighted in yellow.
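
A contact footprint of this kind can be illustrated with a much simpler calculation than a volumetric map: for each receptor residue, count the fraction of trajectory frames in which any of its atoms lies within 5 Å of any atom of the binding partner. The sketch below is a simplified stand-in under stated assumptions, not the analysis pipeline used in the study; the function name, the input layout, and the residue labels `res_A` and `res_B` are hypothetical, and the coordinates are toy values.

```python
import math
from collections import Counter

def contact_footprint(frames, cutoff=5.0):
    """Fraction of frames in which each receptor residue contacts the partner.

    frames: iterable of (receptor_atoms, partner_atoms) per frame, where
            receptor_atoms is a list of (residue_label, (x, y, z)) and
            partner_atoms is a list of (x, y, z), all in Angstrom.
    """
    counts = Counter()
    n_frames = 0
    for receptor_atoms, partner_atoms in frames:
        n_frames += 1
        in_contact = set()
        for label, position in receptor_atoms:
            if label in in_contact:
                continue  # this residue is already counted for this frame
            if any(math.dist(position, p) <= cutoff for p in partner_atoms):
                in_contact.add(label)
        counts.update(in_contact)
    return {label: n / n_frames for label, n in counts.items()}

# Toy trajectory: two receptor residues, one partner atom moving between them.
receptor = [("res_A", (0.0, 0.0, 0.0)), ("res_B", (10.0, 0.0, 0.0))]
toy_frames = [
    (receptor, [(3.0, 0.0, 0.0)]),  # near res_A only
    (receptor, [(4.0, 0.0, 0.0)]),  # near res_A only
    (receptor, [(6.0, 0.0, 0.0)]),  # near res_B only
]
footprint = contact_footprint(toy_frames)
print(footprint)
```

Comparing such per-residue frequencies between the native RBD and each variant gives a rough picture of which variant keeps more receptor residues in contact over the simulation.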

Furthermore, we employed umbrella sampling, an advanced MD simulation technique often used to estimate the strength of a particular protein-protein interaction. Our detailed structural analysis indicated that, compared to native S477, the S477G and S477N variants would strengthen the binding of the SARS-CoV-2 spike protein to hACE2. These findings were subsequently supported by experimental studies [3,4].


Our structural advice became highly “transmissible” 

After that, S477N emerged as an integral part of various SARS-CoV-2 lineages. Among them, the B.1.526 lineage is about 35% more transmissible than the non-variant virus; it drove a sharp increase in infections from late 2020 and remained dominant from March to May 2021 [5]. In New York in particular, the S477N variant spread at an alarming rate [6], and it subsequently became part of the “super-spreader” Omicron variant [7]. Both of our simulated variants are also reported to be highly resistant to antibodies [4]. As of 08.03.2023, out of 15,138,488 sequences deposited on GISAID, S477N (7,234,065) and S477G (1,337) together appear in around half.

 

Our suggested first line of approach for tackling the emergence of virus variants

  1. Sequence Collection: Collect deposited sequences from authorized and legitimate viral databases like GISAID.
  2. Filtering the dominating variants: Perform frequency-based sequence analysis, such as Shannon entropy or the surprisal index, to monitor position-specific changes across viral proteins.
  3. Flexibility Mapping: Perform Normal Mode Analysis (NMA) of the target viral protein, followed by the structural mapping of dominating variants. This step would reveal the most potent and flexible variants for host protein interactions.
  4. Binding Energy Prediction: Using advanced MD simulations like Umbrella Sampling, assess the effect of dominating variants on host protein binding. 
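
The frequency-based filtering in step 2 can be sketched with a per-position Shannon entropy over an alignment. The sketch assumes the sequences are already aligned to equal length; the function names and the four three-residue toy sequences are illustrative and are not GISAID data.

```python
import math
from collections import Counter

def column_entropy(column, ignore="-X"):
    """Shannon entropy (in bits) of one alignment column, skipping gaps and unknowns."""
    residues = [aa for aa in column if aa not in ignore]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def most_variable_positions(aligned_seqs, top=3):
    """Return the `top` most variable positions as (1-based position, entropy) pairs."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    scores = [
        (i + 1, column_entropy([seq[i] for seq in aligned_seqs]))
        for i in range(length)
    ]
    # Highest entropy first; ties are broken by position.
    return sorted(scores, key=lambda item: (-item[1], item[0]))[:top]

# Toy alignment: only the middle position varies (S, N, G, S).
toy_alignment = ["ASK", "ANK", "AGK", "ASK"]
ranking = most_variable_positions(toy_alignment, top=1)
print(ranking)  # [(2, 1.5)]
```

A fully conserved position scores 0 bits, and the score grows as more residue types share the position more evenly, so ranking positions by entropy surfaces candidates for the later flexibility and binding steps.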




About the companies behind the paper

Based in Austria and San Francisco, Innophore is a high-tech spin-off specializing in digital drug discovery and enzyme search [8] using 3D point clouds (Catalophores), AI, and deep learning. Innophore’s vision is to identify and develop high-value industrial and therapeutic enzymes, more efficient and environmentally friendly ‘green’ chemical production processes, and novel biosimilars for medical treatments. This includes contributions to drug repurposing [9], analyses of virus mutational dynamics [10], the search for new inhibitors [11], and side-effect prediction using our 3D point-cloud technology.


References:

[1] Schames, J. R. et al. Discovery of a Novel Binding Trench in HIV Integrase. J. Med. Chem. 47, 1879–1881 (2004).

[2] Hazuda, D. J. et al. A naphthyridine carboxamide provides evidence for discordant resistance between mechanistically identical inhibitors of HIV-1 integrase. Proc. Natl. Acad. Sci. 101, 11233–11238 (2004).

[3] Wang, R. et al. Analysis of SARS-CoV-2 variant mutations reveals neutralization escape mechanisms and the ability to use ACE2 receptors from additional species. Immunity 54, 1611-1621.e5 (2021).

[4] Liu, Z. et al. Identification of SARS-CoV-2 spike mutations that attenuate monoclonal and serum antibody neutralization. Cell Host Microbe 29, 477-488.e4 (2021).

[5] Annavajhala, M. K. et al. Emergence and expansion of SARS-CoV-2 B.1.526 after identification in New York. Nature 597, 703–708 (2021).

[6] West, A. P. et al. Detection and characterization of the SARS-CoV-2 lineage B.1.526 in New York. Nat. Commun. 12, 4886 (2021).

[7] Callaway, E. & Ledford, H. How bad is Omicron? What scientists know so far. Nature 600, 197–199 (2021).

[8] Steinkellner, G. et al. Identification of promiscuous ene-reductase activity by mining structural databases using active site constellations. Nat. Commun. 5, 4150 (2014).

[9] Gorgulla, C. et al. A multi-pronged approach targeting SARS-CoV-2 proteins using ultra-large virtual screening. iScience 24, 102021 (2021).

[10] Parigger, L. et al. Recent changes in the mutational dynamics of the SARS-CoV-2 main protease substantiate the danger of emerging resistance to antiviral drugs. Front. Med. 9, 1061142 (2022).

[11] Prattes, M. et al. Structural basis for inhibition of the AAA-ATPase Drg1 by diazaborine. Nat. Commun. 12, 3483 (2021).

 
