Real-time Evolution: Monitoring SARS-CoV-2 Mutations via the PED Algorithm

Explore the Research

BioMed Central

Polymorphic edge detection (PED): two efficient methods of polymorphism detection from next-generation sequencing data - BMC Bioinformatics

Background: Accurate detection of polymorphisms with a next generation sequencer data is an important element of current genetic analysis. However, there is still no detection pipeline that is completely reliable.

Result: We demonstrate two new detection methods of polymorphisms focusing on the Polymorphic Edge (PED). In the matching between two homologous sequences, the first mismatched base to appear is the SNP, or the edge of the structural variation. The first method is based on k-mers from short reads and detects polymorphic edges with k-mers for which there is no match between target and control, making it possible to detect SNPs by direct comparison of short-reads in two datasets (target and control) without a reference genome sequence. The second method is based on bidirectional alignment to detect polymorphic edges, not only SNPs but also insertions, deletions, inversions and translocations. Using these two methods, we succeed in making a high-quality comparison map between rice cultivars showing good match to the theoretical value of introgression, and in detecting specific large deletions across cultivars.

Conclusions: Using Polymorphic Edge Detection (PED), the k-mer method is able to detect SNPs by direct comparison of short-reads in two datasets without genomic alignment step, and the bidirectional alignment method is able to detect SNPs and structural variations from even single-end short-reads. The PED is an efficient tool to obtain accurate data for both SNPs and structural variations.

Availability: The PED software is available at: https://github.com/akiomiyao/ped .
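To make the k-mer idea in the abstract concrete, here is a minimal Python sketch. It is a simplification under stated assumptions, not the PED implementation: it assumes that a candidate edge is a target-only k-mer whose first k-1 bases also occur in the control with a different final base, and the function names `kmers` and `candidate_edges` are hypothetical.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> set:
    """Return the set of all k-mers contained in one read."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_edges(target_reads, control_reads, k=20):
    """List (shared_prefix, control_bases, target_base) tuples.

    Assumption for this sketch: an edge is reported when a k-mer is
    present only in the target, but its first k-1 bases are also seen
    in the control followed by a different last base.
    """
    target, control = set(), set()
    for read in target_reads:
        target |= kmers(read, k)
    for read in control_reads:
        control |= kmers(read, k)

    # Index control k-mers by their first k-1 bases.
    control_by_prefix = defaultdict(set)
    for km in control:
        control_by_prefix[km[:-1]].add(km[-1])

    edges = []
    for km in sorted(target - control):
        prefix, base = km[:-1], km[-1]
        control_bases = control_by_prefix.get(prefix)
        if control_bases:
            edges.append((prefix, "".join(sorted(control_bases)), base))
    return edges


# Toy reads that differ at a single base (G in control, A in target).
edges = candidate_edges(["ACGTACGATCA"], ["ACGTACGGTCA"], k=5)
print(edges)
```

Note that no reference genome appears anywhere in the sketch: the comparison is made directly between the two read sets, which is the property the abstract highlights.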

The novel coronavirus (Severe Acute Respiratory Syndrome Coronavirus 2, SARS‑CoV‑2), first identified in Wuhan at the end of 2019, spread rapidly worldwide by January 2020. In Japan, infections were initially reported among people who had dined together on traditional houseboats and among cruise ship passengers, eventually developing into a full-scale pandemic.

SARS‑CoV‑2 shares similarities with SARS‑CoV, the virus behind the earlier SARS outbreak; however, the amino acid sequence of the spike (S) protein responsible for receptor binding differs substantially. Although both viruses use ACE2 as the cellular entry receptor, the mode of interaction with ACE2 is different in SARS‑CoV‑2. This difference resulted in markedly increased binding affinity and infection efficiency for SARS‑CoV‑2.

The more stable binding to the ACE2 protein enabled efficient early replication in airway epithelial cells, leading to extremely high transmissibility. In addition, the high frequency of asymptomatic infections meant that infected individuals often continued normal activities, facilitating explosive global spread.

Although the case fatality rate of SARS‑CoV‑2 is considered lower than that of SARS‑CoV, its high transmissibility led to a massive number of infections. As a result, many people—particularly the elderly and those with compromised immune systems—lost their lives.

The nucleotide sequence of SARS‑CoV‑2 was released at a very early stage by Chinese researchers. A distinctive feature for research on this pandemic was the rapid and wide public availability of next‑generation sequencing (NGS) data derived directly from patient samples. In particular, the United Kingdom analysed sequencing data from a very large number of patients, which became available for download from the NCBI Sequence Read Archive (SRA).

We analysed these downloaded sequences using our virus-tailored modification of the PED program to detect genetic variants. By applying PED’s bidirectional alignment method, we were able to efficiently detect both single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) in various sequenced virus genome samples.
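The notion of finding an edge from both directions can be illustrated with a toy example. The Python sketch below is not PED's code: it assumes a read and a reference window that are already anchored at both ends, scans for the first mismatch from the left and from the right, and labels whatever lies between the two edges. The names `first_mismatch` and `classify_edge` are hypothetical.

```python
def first_mismatch(a: str, b: str) -> int:
    """Length of the common prefix of a and b."""
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            return i
    return n


def classify_edge(ref: str, read: str) -> dict:
    """Classify the difference between an anchored ref window and a read."""
    left = first_mismatch(ref, read)  # edge seen from the left
    if left == len(ref) == len(read):
        return {"type": "match"}
    right = first_mismatch(ref[::-1], read[::-1])  # edge seen from the right
    # Keep the matched prefix and suffix from overlapping (repeats).
    right = min(right, len(ref) - left, len(read) - left)
    ref_allele = ref[left:len(ref) - right]
    alt_allele = read[left:len(read) - right]
    if len(ref_allele) == 1 and len(alt_allele) == 1:
        kind = "snp"
    elif not ref_allele:
        kind = "insertion"
    elif not alt_allele:
        kind = "deletion"
    else:
        kind = "complex"
    # offset is 0-based within the reference window
    return {"type": kind, "offset": left, "ref": ref_allele, "alt": alt_allele}


print(classify_edge("ACGTACGTAC", "ACGTTCGTAC"))  # single-base change
print(classify_edge("ACGTACGTAC", "ACGTGTAC"))    # two bases missing
print(classify_edge("ACGTGTAC", "ACGTACGTAC"))    # two bases added
```

When the left and right edges meet at a single base the difference is a SNP; when they leave a gap on only one side it is an insertion or deletion. Real reads also need seeding against the genome and handling of sequencing errors, which this sketch leaves out.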

PED includes a function that determines homozygous and heterozygous states based on the frequency of detected variant candidates. However, this framework is not appropriate for SARS‑CoV‑2, which is not a diploid organism. In practice, SARS‑CoV‑2 samples often contained mixed infections of multiple variants, and the observed variant frequencies varied widely among mutations.

To address this, we extended PED by adding a function that outputs read counts for each detected variant, specifically targeting organisms like SARS‑CoV‑2 that do not exhibit homozygous or heterozygous genotypes. This enhancement enabled more appropriate confirmation of viral mutations.
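As an illustration of why read counts are more useful here than a genotype call, the following Python sketch turns per-variant read counts into a variant fraction. The record layout (`VariantCount`), the `min_depth` threshold, and the numbers in the example are assumptions made for this sketch; they do not describe PED's actual output format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariantCount:
    name: str        # e.g. "A23063T"
    alt_reads: int   # reads supporting the variant
    ref_reads: int   # reads supporting the reference base


def variant_fraction(v: VariantCount, min_depth: int = 10) -> Optional[float]:
    """Fraction of reads carrying the variant, or None if depth is too low.

    No homozygous/heterozygous call is made: for a virus sample the
    fraction can take any value between 0 and 1 (e.g. mixed infections).
    """
    depth = v.alt_reads + v.ref_reads
    if depth < min_depth:
        return None
    return v.alt_reads / depth


# Made-up counts, for illustration only.
for v in [VariantCount("A23063T", 90, 10), VariantCount("A23063T", 3, 2)]:
    print(v.name, variant_fraction(v))
```

A diploid caller would try to force such a fraction towards 0, 0.5, or 1; keeping the raw counts leaves that interpretation to the reader.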

In our study, we downloaded virus genome sequences from ca. 50 individuals per sampling date and detected polymorphisms with our modified PED. During the early stages of the pandemic, there were many days with fewer than 50 reported samples; for those days, we analysed all available data.
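The per-date selection can be sketched as follows. The post does not say how the roughly 50 samples per date were chosen, so the random sampling, the fixed seed, the `(run_id, collection_date)` record layout, and the function name `sample_runs_per_date` are all assumptions of this sketch.

```python
import random
from collections import defaultdict


def sample_runs_per_date(records, per_date=50, seed=0):
    """Pick at most `per_date` run IDs for each collection date.

    records: iterable of (run_id, collection_date) pairs.
    Dates with `per_date` or fewer runs keep every run.
    """
    by_date = defaultdict(list)
    for run_id, date in records:
        by_date[date].append(run_id)

    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    chosen = {}
    for date in sorted(by_date):
        runs = sorted(by_date[date])
        if len(runs) > per_date:
            runs = sorted(rng.sample(runs, per_date))
        chosen[date] = runs
    return chosen


# Hypothetical run IDs: 3 runs on an early date, 120 on a later one.
records = [(f"RUN{i:04d}", "2020-03-01") for i in range(3)]
records += [(f"RUN{i:04d}", "2021-01-15") for i in range(3, 123)]
picked = sample_runs_per_date(records)
print({date: len(runs) for date, runs in picked.items()})
```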

The figure shows an example of the detection frequency of mutations plotted at monthly intervals from the start of the pandemic. From 2020 until around May 2021, the Alpha variant carrying the N501Y (A23063T) mutation was dominant, after which it was replaced by the Delta variant carrying the L452R (T22917G) mutation. We also obtained extensive information on insertion and deletion mutations, which are a particular strength of PED. Although PED was originally developed primarily for detecting polymorphisms in eukaryotic organisms, it produced results for viral variants that were largely consistent with previously reported findings. When NGS data were available, results could be obtained within just a few minutes.
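A monthly detection frequency of the kind plotted in the figure could be computed along these lines. This is a sketch under assumptions: the input layout (one collection date and one set of detected mutations per sample), the function name `monthly_detection_frequency`, and the toy data are hypothetical and are not results from the study.

```python
from collections import defaultdict


def monthly_detection_frequency(samples, mutation):
    """Fraction of samples per month in which `mutation` was detected.

    samples: iterable of (collection_date, mutations) where
    collection_date is 'YYYY-MM-DD' and mutations is a set of names.
    """
    total = defaultdict(int)
    hits = defaultdict(int)
    for date, mutations in samples:
        month = date[:7]  # 'YYYY-MM'
        total[month] += 1
        if mutation in mutations:
            hits[month] += 1
    return {month: hits[month] / total[month] for month in sorted(total)}


# Toy data, invented purely to show the calculation.
toy_samples = [
    ("2021-01-05", {"A23063T"}),
    ("2021-01-20", set()),
    ("2021-02-03", {"A23063T"}),
    ("2021-02-17", {"A23063T"}),
]
print(monthly_detection_frequency(toy_samples, "A23063T"))
```

Plotting one such series per mutation against the month gives a curve in which the replacement of one variant by another shows up as one line falling while another rises.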

In this way, we were able to observe the evolution of SARS‑CoV‑2 during the pandemic almost in real time. We eventually suspended this monitoring, however, because SRA later limited downloads of sample collection date metadata to approximately 100,000 records, which prevented access to more recent data.

The ability to analyse mutations directly from publicly available raw sequencing data—without waiting for expert consensus analyses—was a highly significant development made evident during this pandemic.

https://akiomiyao.github.io/ped/covid19/index.html
