Unveiling Early Signs of Parkinson’s Disease via A Longitudinal Analysis of Celebrity Speech Recordings

Parkinson’s disease (PD) affects speech early, but most studies lack large, longitudinal datasets. The ParkCeleb corpus contains speech samples from 40 celebrities with PD and 40 controls spanning 30 years, from 10 years before to 20 years after diagnosis, enabling the study of early detection and monitoring of PD.

 The Inspiration Behind ParkCeleb  

We are excited to share the story behind our study, a project that has been both challenging and rewarding. The idea came from a gap in the research: while many studies analyze speech to detect Parkinson’s disease (PD), few use longitudinal data, especially data from the crucial prodromal period, when early symptoms such as subtle changes in speech begin to emerge. This gap was a challenge we were determined to overcome.

A Novel Longitudinal Speech Corpus 

The breakthrough came during a lunch-break conversation with my supervisor, who suggested looking into publicly available speech recordings of celebrities who had disclosed their PD diagnoses. This led us to create ParkCeleb, a unique corpus of speech samples from 40 celebrities with PD and 40 matched controls, spanning 30 years. We gathered speech from 10 years before diagnosis to 20 years after, which allowed us to track how speech patterns progressed over time.

Overcoming Challenges in Data Collection  

Collecting the data wasn’t easy. We had to deal with data scarcity, noisy environments, variable recording quality, and the complexity of diarization (determining which segments of a recording belong to the target speaker). But after months of meticulous work, we had a rich dataset ready for analysis. What stood out in the longitudinal analysis was how certain speech features, such as pitch variability, pause duration, speech rate, and syllable duration, evolved over time in the subjects diagnosed with PD. These features changed significantly as the disease progressed, particularly in the years after diagnosis.
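To make these feature names concrete, here is a minimal sketch of how pitch variability, pause duration, speech rate, and syllable duration could be computed for a single recording. It is not the ParkCeleb pipeline. It assumes that a frame-level F0 track (0 for unvoiced frames) and a speech/silence mask have already been produced by some pitch tracker and voice activity detector, and that a syllable count is available. The function names, the 100 Hz semitone reference, and the 0.15 s minimum pause length are hypothetical choices made for illustration.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class SpeechFeatures:
    pitch_sd_semitones: float  # pitch variability
    mean_pause_s: float        # average silent gap between speech segments
    speech_rate_sps: float     # syllables per second over the whole recording
    mean_syllable_s: float     # speaking time divided by syllable count


def to_semitones(f0_hz: float, ref_hz: float = 100.0) -> float:
    """Convert F0 in Hz to semitones relative to a (hypothetical) 100 Hz reference."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def pause_durations(speech_mask: list[bool], frame_s: float,
                    min_pause_s: float = 0.15) -> list[float]:
    """Lengths of silent runs that lie between two speech frames.

    Leading and trailing silence are ignored, and gaps shorter than
    min_pause_s (an assumed threshold) are not counted as pauses.
    """
    pauses: list[float] = []
    run = 0
    seen_speech = False
    for is_speech in speech_mask:
        if is_speech:
            if seen_speech and run * frame_s >= min_pause_s:
                pauses.append(run * frame_s)
            run = 0
            seen_speech = True
        else:
            run += 1
    return pauses


def extract_features(f0_track: list[float], speech_mask: list[bool],
                     frame_s: float, n_syllables: int) -> SpeechFeatures:
    # Pitch variability: population standard deviation of voiced F0 on a
    # semitone scale, which is more comparable across voices than raw Hz.
    voiced = [to_semitones(f0) for f0 in f0_track if f0 > 0]
    pitch_sd = statistics.pstdev(voiced) if voiced else 0.0

    pauses = pause_durations(speech_mask, frame_s)
    mean_pause = statistics.fmean(pauses) if pauses else 0.0

    total_s = len(speech_mask) * frame_s
    speaking_s = sum(speech_mask) * frame_s
    return SpeechFeatures(
        pitch_sd_semitones=pitch_sd,
        mean_pause_s=mean_pause,
        speech_rate_sps=n_syllables / total_s if total_s > 0 else 0.0,
        mean_syllable_s=speaking_s / n_syllables if n_syllables else 0.0,
    )
```

For example, `extract_features([100.0, 200.0, 0.0, 0.0, 0.0, 100.0, 200.0], [True, True, False, False, False, True, True], frame_s=0.1, n_syllables=2)` finds one 0.3 s pause and a pitch standard deviation of 6 semitones. Real recordings would also need the noise handling and diarization described above before any of these numbers are meaningful.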

Early Detection: A Breakthrough Moment  

Even more exciting, we found that early signs of dysarthria were detectable up to 10 years before diagnosis. Using machine learning classifiers, we achieved accuracies of 0.69 and 0.73 on data from 10 and 5 years before diagnosis, respectively, and 0.87 on post-diagnosis data. It was thrilling to see that our data could reveal insights that weren’t previously possible.
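One way to obtain per-window accuracies like these, while ensuring that no speaker appears in both training and test data, is leave-one-speaker-out evaluation within each time window. The sketch below illustrates that idea under stated assumptions and is not the classifiers or evaluation protocol used in the study. It substitutes a deliberately simple nearest-centroid classifier for a real model. The sample layout (speaker ID, years relative to diagnosis, feature vector, label) and the function names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical sample layout: (speaker_id, years_from_diagnosis, features, label),
# where label 1 = PD and label 0 = control.
Sample = tuple[str, int, list[float], int]


def centroid(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest_centroid_predict(train: list[Sample], x: list[float]) -> int:
    """Predict the label whose class centroid is closest in Euclidean distance."""
    by_label: dict[int, list[list[float]]] = defaultdict(list)
    for _, _, vec, label in train:
        by_label[label].append(vec)
    centroids = {label: centroid(vecs) for label, vecs in by_label.items()}
    return min(centroids, key=lambda lab: math.dist(centroids[lab], x))


def loso_accuracy(samples: list[Sample], window: int) -> float:
    """Leave-one-speaker-out accuracy for the samples in one time window."""
    data = [s for s in samples if s[1] == window]
    speakers = sorted({s[0] for s in data})
    correct = total = 0
    for held_out in speakers:
        # Every recording of the held-out speaker goes to the test fold.
        train = [s for s in data if s[0] != held_out]
        test = [s for s in data if s[0] == held_out]
        for _, _, vec, label in test:
            correct += nearest_centroid_predict(train, vec) == label
            total += 1
    return correct / total if total else float("nan")
```

Holding out whole speakers matters because recordings of the same person share voice characteristics unrelated to PD; splitting them across training and test folds would inflate accuracy. In practice the features would also be standardized and a stronger classifier used.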

What’s Next for ParkCeleb?  

Looking ahead, we see enormous potential for this work to refine how PD is detected and monitored. Next steps could include expanding the dataset to more subjects of different ethnicities, professions, and other demographic characteristics. A broader dataset like this could transform how early detection and monitoring are approached in clinical trials.

 Final Thoughts  

This journey has been an incredible learning experience, and I’m excited to see where it takes us next! 

Link to the ParkCeleb Repository

The ParkCeleb corpus is available in this Zenodo repository.
