My previous blog post introduced PED (Polymorphic Edge Detection), a bidirectional alignment algorithm that aligns Next-Generation Sequencing (NGS) reads with a reference genome from both directions and detects the edges of the genomic regions where mutations have occurred.
A key advantage of this approach is its ability to identify large deletion mutations often missed by other programs. Furthermore, it can detect various other mutations, including single-base substitutions, insertions, translocations, and inversions.
This blog post demonstrates how PED can be used for sequence analysis of organisms that have undergone genome editing with CRISPR/Cas9.
To illustrate, I began by searching for "CRISPR" on NCBI's Sequence Read Archive (SRA) to find relevant sequence data. I found a dataset from Umeå University titled "Genotyping of C. elegans mutants - CRISPR/Cas9 of all GPCR and neuropeptide genes" and downloaded it from NCBI.
The sequence data for a specific sample, ERR11472167, was downloaded using the fastq-dump command from the SRA Toolkit provided by NCBI:
fastq-dump --split-files ERR11472167
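The next step expects the reads under ERR11472167/read. Below is a minimal sketch of arranging them, assuming fastq-dump wrote its split files (named like ERR11472167_1.fastq and ERR11472167_2.fastq) into the current directory. The helper name `stage_reads` is hypothetical and is not part of PED or the SRA Toolkit.

```shell
# Hypothetical helper: move fastq-dump output into the <accession>/read layout.
# Assumes the FASTQ files are in the current directory and are named
# <accession>_1.fastq, <accession>_2.fastq, and so on.
# Usage: stage_reads ERR11472167
stage_reads() {
    accession="$1"
    mkdir -p "${accession}/read"
    for f in "${accession}"_*.fastq; do
        # Skip the unexpanded pattern when no matching file exists.
        [ -e "$f" ] || continue
        mv "$f" "${accession}/read/"
    done
}
```

Calling `stage_reads ERR11472167` from the download directory would leave the FASTQ files under ERR11472167/read.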
After saving the downloaded files to the ERR11472167/read directory, the PED program was run using the following command:
perl ped.pl target=ERR11472167,ref=WBcel235
Here, WBcel235 refers to the reference genome sequence for the nematode C. elegans. Subsequently, the snpEff program was used to identify the affected genes and the types of mutations, generating a list of these findings (Figure 1).
According to NCBI's BioSample database, the ERR11472167 sample was reported to have intended mutations in genes WBGene00005318 and WBGene00005319. As highlighted in red in Figure 1, the PED program confirmed mutations in the targeted genes (smg-10/WBGene00005318 and dsh-2/WBGene00000102), demonstrating its ability to verify successful genome editing.
Importantly, PED analysis also revealed numerous off-target mutations in unintended genomic locations. A total of 62 off-target mutations were identified in this specific C. elegans line.
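The split between intended and off-target mutations can be tallied with a short script. The sketch below rests on assumptions not stated in this post: that the annotated variants are in a snpEff-annotated VCF file, and that the gene ID is the fifth `|`-separated subfield of the first `ANN` entry in the INFO column. The helper name `count_targets` and the file name in the usage line are hypothetical.

```shell
# Hypothetical helper: count annotated variants inside vs. outside the intended target genes.
# Assumes a snpEff-annotated VCF whose INFO column carries an ANN= entry.
# Usage: count_targets annotated.vcf WBGene00005318 [more gene IDs...]
count_targets() {
    vcf="$1"; shift
    awk -F '\t' -v targets="$*" '
        BEGIN {
            # Build a lookup table of the intended target gene IDs.
            n = split(targets, t, " ")
            for (i = 1; i <= n; i++) want[t[i]] = 1
        }
        /^#/ { next }    # skip VCF header lines
        {
            gene = ""
            m = split($8, info, ";")
            for (i = 1; i <= m; i++) {
                if (substr(info[i], 1, 4) == "ANN=") {
                    # Use only the first annotation; its fifth subfield is the gene ID.
                    split(substr(info[i], 5), anns, ",")
                    split(anns[1], f, "[|]")
                    gene = f[5]
                }
            }
            # Variants without an annotation are counted as off-target here.
            if (gene in want) n_on++; else n_off++
        }
        END { printf "on_target\t%d\noff_target\t%d\n", n_on, n_off }
    ' "$vcf"
}
```

Only the first annotation of each variant is inspected, so a variant that overlaps several genes is attributed to the first one listed. Depending on the question being asked, checking every annotation may be the better choice.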
Similarly, for sample ERR11472179, the SRA database indicated that gene WBGene00005641 (the sro-1 gene) was the target for genome editing.
As shown in Figure 2 (with sro-1/WBGene00005641 highlighted in red), PED confirmed a frameshift mutation in the sro-1 gene. However, it also detected 60 additional off-target mutations in this sample.
These examples demonstrate that the PED program is a valuable tool not only for verifying intended edits but also for comprehensively checking for off-target mutations in genome-edited organisms. I encourage researchers to try PED for their analyses.
References
Miyao, A., Kiyomiya, J.S., Iida, K. et al. Polymorphic edge detection (PED): two efficient methods of polymorphism detection from next-generation sequencing data. BMC Bioinformatics 20, 362 (2019). https://doi.org/10.1186/s12859-019-2955-6
PED source code: https://github.com/akiomiyao/ped
Cingolani, P., Platts, A., Wang, L. L., Coon, M., Nguyen, T., Wang, L., Land, S. J., Lu, X., & Ruden, D. M. (2012). A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly, 6(2), 80–92. https://doi.org/10.4161/fly.19695