Introduction
Cervical cancer (CC) remains the fourth most common malignancy among women worldwide, with over 600,000 new cases and more than 340,000 deaths each year. Persistent infection with high-risk human papillomavirus (HPV) is the primary driver of CC development, yet infection alone is insufficient to cause malignancy. The majority of HPV infections resolve spontaneously, suggesting that additional biological factors modulate progression to high-grade lesions and invasive cancer.
Recent research has pointed to the vaginal microbiome as one such factor. A healthy vaginal ecosystem is typically dominated by Lactobacillus species, which maintain a low pH, produce antimicrobial compounds, and contribute to mucosal immune defense. In contrast, dysbiotic states—characterized by reduced Lactobacillus abundance and increased anaerobic diversity—have been associated with higher HPV persistence, chronic inflammation, and epithelial barrier disruption. These microenvironmental changes may facilitate viral integration, immune evasion, and carcinogenesis.
However, the literature on CC-associated microbiome shifts is fragmented. Studies vary in their sampling strategies, sequencing platforms, targeted 16S rRNA regions, and analytical approaches, leading to inconsistent and sometimes contradictory findings. Without harmonized data analysis, it is difficult to distinguish genuine biological patterns from methodological noise.
To overcome these challenges, we performed a compositionality-aware mega-analysis of all publicly available CC microbiome datasets meeting strict inclusion criteria. By reprocessing raw sequence data from multiple studies through a unified bioinformatic pipeline, we minimized technical bias and maximized comparability. Our goal was to identify reproducible microbial signatures and functional pathways linked to CC and its HPV-positive subsets—findings that could lay the groundwork for novel diagnostic and preventive strategies.
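To illustrate what "compositionality-aware" can mean in practice, here is a minimal sketch of the centered log-ratio (CLR) transform, one common way to handle relative sequencing data. It is not necessarily the transform used in our pipeline; the function name `clr_transform`, the default pseudocount, and the example counts are hypothetical choices made for illustration.

```python
import math

def clr_transform(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount is an assumption used to avoid log(0) for absent taxa;
    other zero-handling strategies exist.
    """
    if not counts:
        raise ValueError("counts must not be empty")
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log expresses each taxon relative to the
    # geometric mean of the sample, so the values sum to zero.
    return [value - mean_log for value in logs]

# Hypothetical counts for four taxa in the same community,
# sequenced at two different depths.
shallow = [100, 50, 25, 25]
deep = [1000, 500, 250, 250]

# With no zeros present, the pseudocount can be dropped and the two
# depths give the same CLR profile (up to floating-point error).
clr_shallow = clr_transform(shallow, pseudocount=0.0)
clr_deep = clr_transform(deep, pseudocount=0.0)
```

Because CLR values depend only on ratios between taxa, differences in sequencing depth across studies do not by themselves change the profile, which is one reason such transforms are useful when pooling datasets.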
Key Findings
Our standardized analysis revealed a consistent shift in CC microbiota from a Lactobacillus-dominated community to a more diverse, anaerobe-rich profile. Alpha diversity was significantly higher in CC, and enriched taxa included Porphyromonas asaccharolytica, Campylobacter ureolyticus, Peptococcus niger, and Anaerococcus obesiensis. Concurrently, protective Lactobacillus species, particularly L. crispatus, were markedly depleted, especially in HPV-positive CC.
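As a rough illustration of why an anaerobe-rich community scores higher on alpha diversity than a Lactobacillus-dominated one, the sketch below computes the Shannon index for two made-up samples. The Shannon index is only one of several alpha diversity metrics and is assumed here for illustration; the counts and the name `shannon_index` are hypothetical and do not come from our data.

```python
import math

def shannon_index(counts):
    """Shannon diversity (natural log) from a list of taxon counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    entropy = 0.0
    for count in counts:
        if count > 0:  # absent taxa contribute nothing
            proportion = count / total
            entropy -= proportion * math.log(proportion)
    return entropy

# Hypothetical samples with five taxa each.
lactobacillus_dominated = [950, 20, 15, 10, 5]   # one taxon dominates
anaerobe_rich = [200, 200, 200, 200, 200]        # evenly mixed community

h_dominated = shannon_index(lactobacillus_dominated)
h_diverse = shannon_index(anaerobe_rich)
# The evenly mixed community reaches the maximum for five taxa, ln(5),
# while the dominated community scores much lower.
```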
Functional predictions indicated enrichment in pathways related to fatty acid biosynthesis, oxidative phosphorylation, and altered amino acid metabolism—changes consistent with known cancer biology. Several of these pathways mirrored host transcriptomic profiles from independent CC datasets, pointing to possible microbial–host metabolic convergence.
Machine learning models trained on these microbial profiles achieved high predictive performance (up to 93% accuracy with XGBoost), underscoring the translational potential for microbiome-based diagnostics.
Future Directions
Our analysis was constrained by limited geographic diversity, sample size, and clinical metadata in the available datasets. Our next steps are to expand to more diverse populations, integrate shotgun metagenomics and metabolomics, and explore causality through experimental models. Ultimately, we aim to extend this analytical framework to other female cancers, seeking shared microbial “fingerprints” and actionable biomarkers.
Final Remark
This first paper in our female cancer microbiome project establishes a reproducible foundation for studying microbial influences on HPV-driven cancers. By harmonizing data and revealing robust microbial and functional signatures, we open pathways for targeted diagnostics and prevention strategies—not only in cervical cancer, but across the spectrum of female malignancies. Read the full paper here: https://rdcu.be/eAuwO