After the Paper

Real-time Evolution: Monitoring SARS-CoV-2 Mutations via the PED Algorithm

Published in Microbiology, Genetics & Genomics, and Biomedical Research

Apr 09, 2026

Akio Miyao and Ilkka Havukkala

2 contributors

Real-time Evolution: Monitoring SARS-CoV-2 Mutations via the PED Algorithm

Liked by Alice Taylor

Explore the Research

The novel coronavirus (Severe Acute Respiratory Syndrome Coronavirus 2, SARS‑CoV‑2), first identified in Wuhan at the end of 2019, spread rapidly worldwide by January 2020. In Japan, infections were initially reported among people who had dined together on traditional houseboats and among cruise ship passengers, eventually developing into a full-scale pandemic.

SARS‑CoV‑2 shares similarities with the earlier SARS‑CoV outbreak; however, the amino acid sequence of the spike (S) protein responsible for receptor binding differs substantially. Although both viruses use ACE2 as the cellular entry receptor, the mode of interaction with ACE2 is different in SARS-CoV-2. This resulted in markedly increased binding affinity and infection efficiency for SARS‑CoV‑2.

The more stable binding to the ACE2 protein enabled efficient early replication in airway epithelial cells, leading to extremely high transmissibility. In addition, the high frequency of asymptomatic infections meant that infected individuals often continued normal activities, facilitating explosive global spread.

Although the case fatality rate of SARS‑CoV‑2 is considered lower than that of SARS‑CoV, its high transmissibility led to a massive number of infections. As a result, many people—particularly the elderly and those with compromised immune systems—lost their lives.

The nucleotide sequence of SARS‑CoV‑2 was released at a very early stage by Chinese researchers. A distinctive feature for research on this pandemic was the rapid and wide public availability of next‑generation sequencing (NGS) data derived directly from patient samples. In particular, the United Kingdom analysed sequencing data from a very large number of patients, which became available for download from the NCBI Sequence Read Archive (SRA).

We analysed these downloaded sequences using our virus-tailored modification of PED program to detect genetic variants. By applying PED’s bidirectional alignment method, we were able to efficiently detect both single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) in various sequenced virus genome samples.

PED includes a function that determines homozygous and heterozygous states based on the frequency of detected variant candidates. However, this framework is not appropriate for SARS‑CoV‑2, which is not a diploid organism. In practice, SARS‑CoV‑2 samples often contained mixed infections of multiple variants, and the observed variant frequencies varied widely among mutations.

To address this, we extended PED by adding a function that outputs read counts for each detected variant, specifically targeting organisms like SARS‑CoV‑2 that do not exhibit homozygous or heterozygous genotypes. This enhancement enabled more appropriate confirmation of viral mutations.

In our study, we downloaded virus genome sequences form ca. 50 individuals per sampling date and detected polymorphisms by our modified PED. During the early stages of the pandemic. As there were many days with fewer than 50 reported, we analysed all available data for those days.

The figure shows an example of the detection frequency of mutations plotted at monthly intervals from the start of the pandemic. From 2020 until around May 2021, the Alpha variant carrying the N501Y (A23063T) mutation was dominant, after which it was replaced by the Delta variant carrying the L452R (T22916G) mutation. We also obtained extensive information on insertion and deletion mutations, which are a particular strength of PED. Although PED was originally developed primarily for detecting polymorphisms in eukaryotic organisms, it produced results for viral variants that were largely consistent with previously reported findings. When NGS data were available, results could be obtained within just a few minutes.

In this way, we were able to observe the evolution of SARS‑CoV‑2 during the pandemic almost in real time. However, eventually we suspended this monitoring analysis because SRA later limited downloads of the metadata for sample collection dates to approximately 100,000 records, preventing access to more recent data.

The ability to analyse mutations directly from publicly available raw sequencing data—without waiting for expert consensus analyses—was a highly significant development made evident during this pandemic.

https://akiomiyao.github.io/ped/covid19/index.html

Multiple Contributors

Akio Miyao and Ilkka Havukkala

Please sign in or register for FREE

If you are a registered user on Research Communities by Springer Nature, please sign in

Follow the Topic

COVID19

Life Sciences > Biological Sciences > Microbiology > Medical Microbiology > Infectious Diseases > COVID19

Genetics Research

Life Sciences > Health Sciences > Biomedical Research > Genetics Research

Medical Genetics

Life Sciences > Biological Sciences > Genetics and Genomics > Medical Genetics

BMC Bioinformatics

BMC Bioinformatics

This is an open access, peer-reviewed journal that considers articles describing novel computational algorithms and software, models and tools, including statistical methods, machine learning and artificial intelligence, as well as systems biology.

More about the journal

Related Collections

With Collections, you can get published faster and increase your visibility.

Predictive toxicology

BMC Bioinformatics is welcoming submissions to our Collection on Predictive toxicology.

Predictive toxicology investigates the harmful effects of chemical substances using models and data-driven methods, often aiming to decrease dependence on traditional animal testing, such as mammals, for assessing health risks. Developments in this field support New Approach Methodologies (NAMs) for evaluating chemical safety and regulation. NAMs refer to any methods that enhance safety assessments while avoiding animal testing. Specifically, predictive toxicology employs computational techniques with a mechanistic understanding of toxicity to estimate risks to human health and the environment.

Recent advances have highlighted the use of various technologies that generate data valuable for in silico toxicity prediction, including omics, in vitro screening, high-throughput phenotyping, organoids, and alternative in vivo models. These innovations, combined with comparative biology and insights from other disciplines (e.g., genetics, evolution), refine hazard and risk assessment methods, facilitating a more precise evaluation of chemical safety and ultimately improving health outcomes.

This Collection welcomes submissions on the development of new computational and/or statistical approaches for predictive toxicology.

All manuscripts submitted to this journal, including those submitted to collections and special issues, are assessed in line with our editorial policies and the journal’s peer-review process. Reviewers and editors are required to declare competing interests and can be excluded from the peer review process if a competing interest exists.

Publishing Model: Open Access

Deadline: Aug 14, 2026

Explore this Collection

Cell tracking

BMC Bioinformatics is welcoming submissions to our Collection on Cell Tracking.

Cell tracking is a technique used to monitor and analyze the movement and behavior of cells over time, allowing the study of cellular behaviors, dynamics, and interactions within various biological contexts. Advanced bioinformatics tools play a vital role in analyzing cell tracking data. They help identify cell movement patterns and understand their biological implications. These tools are particularly relevant when processing large datasets and when investigating cell cycles.

This Collection welcomes submissions on the development of new computational and/or statistical approaches for cell tracking. We encourage contributions detailing methods for detecting and characterizing cell movements to better understand cell migration and behavior.

Publishing Model: Open Access

Deadline: Jun 23, 2026

Explore this Collection

Detecting Transposable Element Insertions from NGS Data

Behind the Paper

Capturing the Edge of Polymorphisms: A Reference-Free Variant Detection Method

Behind the Paper

Full Spectrum CRISPR Analysis: Rapidly Verify On-Target and Off-Target Edits with PED

Behind the Paper

Looking at it from both directions

Cookies

We use cookies to ensure the functionality of our website, to personalize content and advertising, to provide social media features, and to analyze our traffic. If you allow us to do so, we also inform our social media, advertising and analysis partners about your use of our website. You can decide for yourself which categories you want to deny or allow. Please note that based on your settings not all functionalities of the site are available.

Further information can be found in our privacy policy.

Real-time Evolution: Monitoring SARS-CoV-2 Mutations via the PED Algorithm

Share this post

Share with...

...or copy the link