Unveiling Early Signs of Parkinson’s Disease via A Longitudinal Analysis of Celebrity Speech Recordings

The Inspiration Behind ParkCeleb
We are excited to share the story behind our study, a project that has been both challenging and rewarding. The idea came from a gap in the research: while many studies analyze speech to detect Parkinson’s disease (PD), few include longitudinal data, especially from the crucial prodromal period, when early symptoms such as subtle changes in speech begin to emerge. We were determined to fill this gap.
A Novel Longitudinal Speech Corpus
The breakthrough came during a lunch break conversation with my supervisor, who suggested looking into publicly available speech recordings from celebrities who had disclosed their PD diagnoses. This led us to create ParkCeleb, a unique corpus of speech samples from 40 celebrities with PD and 40 matched controls. The recordings span 30 years, from 10 years before diagnosis to 20 years after, allowing us to track how speech patterns progress over time.
Overcoming Challenges in Data Collection
Collecting the data wasn’t easy. We had to deal with data scarcity, noisy environments, variations in recording quality, and the complexity of diarization (isolating the target speaker). But after months of meticulous work, we had a rich dataset ready for analysis. What stood out from the longitudinal analysis was how certain speech features, like pitch variability, pause duration, speech rate, and syllable duration, evolved over time in the subjects diagnosed with PD. These features changed significantly as the disease progressed, particularly in the years post-diagnosis.
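For readers curious what timing features like these look like in practice, here is a minimal sketch in Python. It is not the pipeline used in the study: the input format (a list of syllable start and end times in seconds), the function name `timing_features`, and the 0.2 s pause threshold `MIN_PAUSE` are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical threshold: gaps shorter than this are treated as normal
# articulation gaps rather than pauses. Not a value taken from the study.
MIN_PAUSE = 0.2


def timing_features(syllables):
    """Compute simple timing features from syllable intervals.

    `syllables` is a list of (start, end) times in seconds, sorted by start.
    This input format is an assumption made for this sketch.
    """
    if len(syllables) < 2:
        raise ValueError("need at least two syllables")

    # Duration of each syllable.
    durations = [end - start for start, end in syllables]

    # Silent gaps between consecutive syllables that count as pauses.
    pauses = []
    for (_, prev_end), (next_start, _) in zip(syllables, syllables[1:]):
        gap = next_start - prev_end
        if gap >= MIN_PAUSE:
            pauses.append(gap)

    # Total time from the first syllable onset to the last syllable offset.
    total_time = syllables[-1][1] - syllables[0][0]

    return {
        "speech_rate": len(syllables) / total_time,  # syllables per second
        "mean_syllable_duration": mean(durations),
        "mean_pause_duration": mean(pauses) if pauses else 0.0,
        "pause_count": len(pauses),
    }


# Toy example with made-up timestamps (not data from the corpus).
example = [(0.0, 0.2), (0.25, 0.45), (1.0, 1.2), (1.25, 1.5)]
features = timing_features(example)
```

Computing the same features on recordings from different years relative to diagnosis is one simple way to follow how they change over time.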
Early Detection: A Breakthrough Moment
Even more exciting, we found that early signs of dysarthria were detectable up to 10 years before diagnosis. Machine learning classifiers reached accuracies as high as 0.69 and 0.73 on data from 10 and 5 years before diagnosis, respectively, and 0.87 on post-diagnosis data. It was thrilling to see that our data could reveal insights that weren’t previously possible.
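Accuracy here means the fraction of recordings that a classifier labels correctly (PD or control). As a rough illustration of how accuracy can be reported separately for each time window, here is a small Python sketch. The record format, the window names, and the function `accuracy_by_window` are hypothetical, and the toy labels are invented for the example rather than taken from our results.

```python
from collections import defaultdict


def accuracy_by_window(records):
    """Return accuracy (correct / total) for each time window.

    `records` is an iterable of (window, true_label, predicted_label)
    tuples, where labels are 1 for PD and 0 for control. This layout is
    an assumption made for this sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for window, y_true, y_pred in records:
        total[window] += 1
        if y_true == y_pred:
            correct[window] += 1
    return {window: correct[window] / total[window] for window in total}


# Made-up predictions for illustration only (not results from the study).
toy_records = [
    ("10_years_before", 1, 1),
    ("10_years_before", 0, 1),
    ("post_diagnosis", 1, 1),
    ("post_diagnosis", 0, 0),
]
scores = accuracy_by_window(toy_records)
```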
What’s Next for ParkCeleb?
Looking ahead, we see enormous potential for this work to refine how we detect and monitor PD. The next steps could involve expanding the dataset to include more subjects of different ethnicities, professions, and other demographic characteristics. Such an expanded corpus could revolutionize how we approach early detection and monitoring in clinical trials.
Final Thoughts
This journey has been an incredible learning experience, and I’m excited to see where it takes us next!
Link to the ParkCeleb Repository
The ParkCeleb corpus is available in this Zenodo repository.
Follow the Topic
npj Parkinson's Disease
This journal publishes original basic science, translational and clinical research related to Parkinson's disease, including anatomy, etiology, genetics, cellular and molecular physiology, neurophysiology, epidemiology and therapeutic development and treatments.