Development of a machine learning model related to explore the association between heavy metal exposure and alveolar bone loss among US adults utilizing SHAP: a study based on NHANES 2015-2018

Published in Biomedical Research

Development of a machine learning model related to explore the association between heavy metal exposure and alveolar bone loss among US adults utilizing SHAP: a study based on NHANES 2015–2018 - BMC Public Health

Clinical trial number: not applicable.

Background

Alveolar bone loss (ABL) is common in modern society. Heavy metal exposure is usually considered a risk factor for ABL. Some studies have reported a positive trend between urinary heavy metals and periodontitis using multiple logistic regression and Bayesian kernel machine regression. However, the performance of these statistical models can be affected by overfitting from the kernel function, long computation times, the need to define a prior distribution, and the lack of a ranking of the individual heavy metals. The optimal model for this topic remains controversial. This study aimed to: (1) develop an algorithm for exploring the association between heavy metal exposure and ABL; (2) filter the actual causal variables and investigate how heavy metals are associated with ABL; and (3) identify potential risk factors for ABL.

Methods

Data were collected from the National Health and Nutrition Examination Survey (NHANES) between 2015 and 2018 to develop a machine learning (ML) model. Feature selection was performed using Least Absolute Shrinkage and Selection Operator (LASSO) regression with 10-fold cross-validation. The selected data were balanced using the Synthetic Minority Oversampling Technique (SMOTE) and divided into a training set and a testing set at a 3:1 ratio. Logistic Regression (LR), Support Vector Machines (SVM), Random Forest (RF), K-Nearest Neighbors (KNN), Decision Tree (DT), and XGBoost were used to construct the ML models. Accuracy, Area Under the Receiver Operating Characteristic Curve (AUC), precision, recall, and F1 score were used to select the optimal model for further analysis. The contribution of each variable to the ML model was explained using the Shapley Additive Explanations (SHAP) method.
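The balancing and splitting steps described above can be illustrated with a minimal sketch. This is not the authors' code: it is a hand-rolled, standard-library approximation of the SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), followed by a 3:1 split in the order the Methods give. The function names `smote_like` and `split_3_to_1`, the two-feature toy data, the class sizes, and the seeds are all assumptions made for illustration.

```python
import random

def smote_like(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (the core idea of SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

def split_3_to_1(rows, labels, seed=0):
    """Shuffle and split into 75% training / 25% testing (a 3:1 ratio)."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = (3 * len(idx)) // 4
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])

# Hypothetical toy data: 30 controls (label 0) and 10 cases (label 1).
data_rng = random.Random(42)
controls = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(30)]
cases = [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(10)]

# Oversample the minority class until both classes have the same size.
cases_balanced = cases + smote_like(cases, n_new=len(controls) - len(cases))
X = controls + cases_balanced
y = [0] * len(controls) + [1] * len(cases_balanced)

X_train, y_train, X_test, y_test = split_3_to_1(X, y)
print(len(X_train), len(X_test))  # 45 15
```

In practice a maintained implementation such as the SMOTE class in the imbalanced-learn package would likely be used instead of a hand-written routine; the sketch only shows what "balanced" and "3:1" mean operationally.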

Results

RF showed the best performance in exploring the association between heavy metal exposure and ABL, with an AUC of 0.88, accuracy of 0.78, precision of 0.76, recall of 0.83, and F1 score of 0.79. Age was the most important factor in the ML model (mean |SHAP value| = 0.09), and cadmium (Cd) was the primary contributor among the heavy metals. Sex contributed little to the ML model.
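For readers less familiar with these metrics, the following sketch shows how they are defined and checks that the reported F1 score is consistent with the reported precision and recall. The confusion-matrix counts and the prediction scores are hypothetical toy values, not data from the study, and the helper names are assumptions.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def auc(scores_pos, scores_neg):
    """AUC as the probability that a positive case outscores a negative one
    (ties count as one half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical toy counts, only to show the definitions in use.
toy = classification_metrics(tp=40, fp=10, fn=10, tn=40)

# Hypothetical toy scores: 8 of the 9 positive/negative pairs are ordered correctly.
toy_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])

# Consistency check using the precision and recall reported for RF.
reported_f1 = f1_from(0.76, 0.83)
print(round(reported_f1, 2))  # 0.79
```

The harmonic mean of 0.76 and 0.83 is about 0.793, which rounds to the reported F1 score of 0.79.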

Conclusion

In this study, RF showed superior performance compared with the other five algorithms. Among the 12 heavy metals, Cd was the most important factor in the ML model. The associations of cobalt (Co) and lead (Pb) with ABL were weaker than that of Cd. Among all the independent variables, age was the most important factor in this model. For the poverty income ratio (PIR), low income was associated with ABL. Mexican American and Non-Hispanic White participants showed a lower association with ABL than Non-Hispanic Black participants and those of other races. Sex showed only a weak association with ABL. In the future, more advanced algorithms should be developed to validate these results, and related parameters can be tuned to improve the accuracy of the model.
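A feature ranking of this kind is typically obtained by averaging the absolute SHAP value of each feature over all participants. The sketch below computes exact Shapley values for a small toy model by enumerating every feature coalition and then ranks features by mean |SHAP value|. Everything in it is an assumption for illustration: the model `toy_model`, its coefficients, the three generic features, the rows, and the use of a single all-zero baseline row. It does not reproduce the study's random forest, and the SHAP library estimates these values differently for tree models.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one row x. Features outside a coalition
    are replaced by their baseline value."""
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(rest, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(row):
    """Hypothetical score with one interaction term (invented coefficients)."""
    f0, f1, f2 = row
    return 0.5 * f0 + 0.3 * f1 + 0.2 * f0 * f1 + 0.01 * f2

names = ["f0", "f1", "f2"]
rows = [[1.0, 1.0, 1.0], [2.0, 0.0, 0.0], [0.0, 3.0, 1.0]]
baseline = [0.0, 0.0, 0.0]

all_phi = [shapley_values(toy_model, row, baseline) for row in rows]

# Global importance: mean absolute Shapley value per feature.
mean_abs = {name: sum(abs(phi[i]) for phi in all_phi) / len(all_phi)
            for i, name in enumerate(names)}
ranking = sorted(mean_abs, key=mean_abs.get, reverse=True)
print(ranking)  # ['f0', 'f1', 'f2']
```

For each row, the Shapley values sum to the difference between the model output for that row and the output for the baseline, which is the additivity property that gives SHAP its name.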
