Capturing Diversity: A First Look at the Arab Pangenome
Published in Research Data and Biomedical Research
When we embarked on this study, we knew we were addressing a major gap in human genomics. Despite the availability of increasingly sophisticated genome references, Arab populations remained largely absent. We wanted to change that, and with support from regional collaborators, we began assembling what would become the first draft of the Arab Pangenome Reference (APR). Our process relied on cutting-edge sequencing technologies: high-fidelity long reads, ultralong nanopore reads, and Hi-C data. These approaches allowed us to build highly contiguous, haplotype-phased genome assemblies from 53 individuals representing diverse Arab ethnicities.
What we found exceeded our expectations. We discovered 111.96 million base pairs of novel sequence absent from even the most comprehensive human references, such as GRCh38 and T2T-CHM13. More importantly, some gene duplications were consistent across the cohort: TAF11L5, for example, was duplicated in every individual we studied. We also uncovered millions of population-specific small variants and hundreds of thousands of structural variants, many with potential biomedical significance. These included a number of duplicated genes associated with recessive conditions, offering clues to genetic disease burdens within Arab populations. Even our exploration of mitochondrial genomes revealed previously uncharted sequence variation.
What started as a technical challenge turned into a broader mission, one rooted in equity, representation, and the future of personalized healthcare. By offering a high-quality, open resource based on Arab genomic diversity, the APR not only addresses historical gaps but also invites deeper collaboration, research, and policy change. We envision this resource being used to power more accurate variant interpretation, support population-specific GWAS, and improve rare disease diagnostics in the region. The journey has only begun, but we hope this work serves as both a scientific contribution and a call to action: to ensure all populations, especially those long underrepresented, are part of the genomic future.
Access the Data
The data supporting the Arab Pangenome Reference study are publicly available:
- NCBI BioProject (raw sequencing reads): https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1108179
- Sample metadata (SRA): https://trace.ncbi.nlm.nih.gov/Traces/study/?acc=SRP509490
- Genome assemblies and full reference dataset: https://www.mbru.ac.ae/the-arab-pangenome-reference/
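
For readers who would rather look up the sequencing records programmatically than through the web pages, here is a minimal sketch using NCBI's public E-utilities `esearch` endpoint. It is not part of our pipeline. The function names are hypothetical, and the `[BioProject]` field tag and the `retmax` value are assumptions about how one might query the SRA database for this accession.

```python
import json
import urllib.parse
import urllib.request

# NCBI E-utilities search endpoint (public).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
# BioProject accession from the data list in this post.
BIOPROJECT = "PRJNA1108179"


def build_sra_search_url(bioproject: str, retmax: int = 20) -> str:
    """Build an esearch URL for SRA records linked to a BioProject.

    Hypothetical helper; the [BioProject] field tag is an assumption.
    """
    params = {
        "db": "sra",
        "term": f"{bioproject}[BioProject]",
        "retmode": "json",
        "retmax": retmax,
    }
    return f"{EUTILS_ESEARCH}?{urllib.parse.urlencode(params)}"


def fetch_sra_ids(bioproject: str, retmax: int = 20) -> list[str]:
    """Query NCBI (network access required) and return matching SRA record IDs."""
    url = build_sra_search_url(bioproject, retmax)
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    return payload["esearchresult"]["idlist"]


if __name__ == "__main__":
    # Print the query URL only; call fetch_sra_ids(BIOPROJECT) to hit the network.
    print(build_sra_search_url(BIOPROJECT))
```

The URL builder is kept separate from the network call so the query can be inspected or pasted into a browser first. For bulk downloads of the reads themselves, the BioProject and SRA pages linked above remain the authoritative starting point.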