How machine learning helped us uncover key environmental and clinical risk factors to child health

The environment in which a child grows up affects their development and well-being later in life. These effects do not happen in isolation: air pollution, exposure to cleaning products, and a family's social capital all act on a child's health at the same time.

Read the paper

SpringerLink

Machine learning-based health environmental-clinical risk scores in European children - Communications Medicine

Background: Early life environmental stressors play an important role in the development of multiple chronic disorders. Previous studies that used environmental risk scores (ERS) to assess the cumulative impact of environmental exposures on health are limited by the diversity of exposures included, especially for early life determinants. We used machine learning methods to build early life exposome risk scores for three health outcomes using environmental, molecular, and clinical data.

Methods: In this study, we analyzed data from 1622 mother-child pairs from the HELIX European birth cohorts, using over 300 environmental, 100 child peripheral, and 18 mother-child clinical markers to compute environmental-clinical risk scores (ECRS) for child behavioral difficulties, metabolic syndrome, and lung function. ECRS were computed using LASSO, Random Forest and XGBoost. XGBoost ECRS were selected to extract local feature contributions using Shapley values and derive feature importance and interactions.

Results: ECRS captured 13%, 50% and 4% of the variance in mental, cardiometabolic, and respiratory health, respectively. We observed no significant differences in predictive performances between the above-mentioned methods. The most important predictive features were maternal stress, noise, and lifestyle exposures for mental health; proteome (mainly IL1B) and metabolome features for cardiometabolic health; child BMI and urine metabolites for respiratory health.

Conclusions: Besides their usefulness for epidemiological research, our risk scores show great potential to capture holistic individual level non-hereditary risk associations that can inform practitioners about actionable factors of high-risk children. As in the post-genetic era personalized prevention medicine will focus more and more on modifiable factors, we believe that such integrative approaches will be instrumental in shaping future healthcare paradigms.

By applying machine learning, we identified key environmental stressors affecting children, analyzed their interactions, and quantified the risk of children developing health conditions due to these environmental exposures and clinical factors.

Environmental risks to child health

Using data from over 1,600 European mothers and children from six countries, our study looked at how urban living, chemical exposures, metabolic profiles, and other prenatal and childhood experiences work together to influence children’s mental, cardiovascular, and respiratory health.

In this study, we found that maternal stress, exposure to noise from neighbors and other kids, and various lifestyle factors, such as a child's diet or level of physical activity, play the biggest role in children's mental health (Figure 1). We also identified biological factors, including child BMI and the presence of specific proteins, that can predict cardiometabolic and respiratory diseases.

Figure 1: Local explanations (SHAP) obtained from the mental health risk score sorted by order of importance. Each dot represents the impact of a factor on a child's predicted risk.

As the first study to develop early-life environmental and clinical risk scores encompassing such a wide range of factors, this work has significant implications for preventative care and treatment.

Using machine learning to study environmental impacts

The approach used to identify these risk factors and assess their relationships with health was a key part of this research. The study used supervised machine learning, a field of artificial intelligence in which an algorithm is trained using large, labeled datasets to make predictions on new datasets. Given that environmental factors rarely act in isolation, our study focuses on the complete set of environmental exposures encountered throughout life, commonly referred to as the human exposome.

Studying the human exposome means being open to complex and unexpected interactions. While traditional statistical methods often make assumptions about how things are related based on pre-defined patterns or formulas, our machine learning methods do not. They are more flexible and therefore helped us uncover and more accurately understand the complex relationships between our environment and health.
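To make the point about pre-defined patterns concrete, here is a minimal sketch on made-up data, not the study's data or models. Two hypothetical binary exposures affect the outcome only in combination. An additive model, which assumes each exposure contributes independently, explains none of the variance. A model that is free to learn one prediction per combination, much as a small decision tree would, explains all of it.

```python
from itertools import product
from statistics import mean

# Hypothetical toy data: the outcome is 1 only when exactly one of the two
# exposures is present (an XOR-style interaction), 5 children per combination.
rows = [(a, b, float(a != b)) for a, b in product((0, 1), repeat=2) for _ in range(5)]
ys = [y for _, _, y in rows]
grand = mean(ys)

def r_squared(preds):
    """Share of outcome variance captured by a list of predictions."""
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - grand) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Additive model: grand mean plus one separate effect per exposure.
# For a balanced design like this one, that is the least-squares additive fit.
effect_a = {v: mean(y for a, _, y in rows if a == v) - grand for v in (0, 1)}
effect_b = {v: mean(y for _, b, y in rows if b == v) - grand for v in (0, 1)}
additive = [grand + effect_a[a] + effect_b[b] for a, b, _ in rows]

# Flexible model: one prediction per exposure combination.
cell = {
    (va, vb): mean(y for a, b, y in rows if (a, b) == (va, vb))
    for va, vb in product((0, 1), repeat=2)
}
flexible = [cell[(a, b)] for a, b, _ in rows]

r2_additive = r_squared(additive)   # 0.0: the additive form cannot see the effect
r2_flexible = r_squared(flexible)   # 1.0: the interaction is captured
```

Real exposome data are far noisier and the differences far less clear-cut, but the mechanism is the same: a model restricted to a fixed functional form can miss effects that only appear when factors combine.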

Addressing the challenges of machine learning

New approaches also bring new challenges. One difficulty of using machine learning is that the resulting models can be hard to interpret, due to their complexity and the number of parameters involved. Our approach already accounts for this: we used Shapley values (SHAP) to extract each factor's contribution to an individual child's predicted risk, as shown in Figure 1.
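The Shapley values behind the explanations in Figure 1 can be illustrated with a small, self-contained sketch. It computes exact Shapley values for a single prediction by averaging a factor's marginal contribution over every possible subset of the other factors. Everything here is an assumption made for illustration: the `toy_risk` function, the factor names, their values, and the all-zero baseline are hypothetical, and the study itself used XGBoost models with SHAP rather than this brute-force calculation.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Factors outside a coalition are replaced by their baseline value.
    Cost grows exponentially with the number of factors, so this only
    suits tiny examples.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_f = {f: x[f] if (f in subset or f == name) else baseline[f]
                          for f in names}
                without_f = {f: x[f] if f in subset else baseline[f]
                             for f in names}
                total += weight * (predict(with_f) - predict(without_f))
        phi[name] = total
    return phi

def toy_risk(f):
    # Hypothetical risk score with an interaction term; not the study's model.
    return (2.0 * f["maternal_stress"] + 1.0 * f["noise"]
            + 0.5 * f["maternal_stress"] * f["noise"])

child = {"maternal_stress": 1.0, "noise": 2.0, "diet": 3.0}
baseline = {"maternal_stress": 0.0, "noise": 0.0, "diet": 0.0}

phi = shapley_values(toy_risk, child, baseline)
# The contributions add up to the gap between this child's score and the baseline.
gap = toy_risk(child) - toy_risk(baseline)
```

In this toy case the interaction term is split evenly between the two factors involved, and a factor the model ignores (`diet`) receives a contribution of zero. Each dot in a SHAP plot is one such per-child, per-factor contribution.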

The second challenge is the amount of data needed to properly train machine learning algorithms. Gathering data sets on thousands of people as well as hundreds of different environmental and clinical factors is costly and time-consuming. Although steps are being taken to address this through federated learning, a technique that allows multiple servers to collaboratively train a model without sharing their data, additional funding and effort are needed to advance this area of research.
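The idea of training a model collaboratively without sharing data can be sketched with federated averaging, one common federated learning scheme. This is a minimal illustration under stated assumptions, not a description of any system used in the study: the two sites, their data, the learning rate, and the helper names `local_update` and `federated_average` are all hypothetical. Each site improves the shared model on its own records, and only the model parameters travel to the coordinating server, which averages them.

```python
def local_update(weights, data, lr=0.05, epochs=20):
    """Run a few gradient-descent steps on one site's private data.

    The model is a simple line y = w * x + b, fitted by mean squared error.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def federated_average(updates, sizes):
    """Combine site models, weighting each by its number of records."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return (w, b)

# Hypothetical sites whose records follow y = 2x + 1 but cover different ranges.
site_a = [(x, 2 * x + 1) for x in (0.0, 0.5, 1.0, 1.5)]
site_b = [(x, 2 * x + 1) for x in (2.0, 2.5, 3.0)]
sites = [site_a, site_b]

global_weights = (0.0, 0.0)
for _ in range(100):
    # Each site trains locally; raw records never leave the site.
    updates = [local_update(global_weights, d) for d in sites]
    # The server only ever sees model parameters.
    global_weights = federated_average(updates, [len(d) for d in sites])
```

After enough rounds the shared model recovers the underlying relationship even though neither site saw the other's records. Production systems add secure communication and privacy safeguards that this sketch leaves out.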

To ensure that our model produces the most trustworthy and generalizable environmental risk scores possible, we are now looking to incorporate well-known environmental health effects from existing literature into the machine learning process.

Despite these challenges, our research marked a great stride forward in better predicting how the environment a child grows up in will impact their health in the future with the help of machine learning models. By further understanding these environmental risk factors, healthcare leaders and practitioners can put in place preventative care measures to better protect children across the EU.

Read the full study

Learn more about the ATHLETE research project
