Unlocking the Complete Blood Count as a Risk Stratification Tool for Breast Cancer Using Machine Learning

Published in Cancer and Computational Sciences

Why Study Breast Cancer Risk Stratification?

Breast cancer remains one of the most common cancers affecting women worldwide, making risk stratification crucial for early detection. Traditional models like the Tyrer-Cuzick (TC) model have been invaluable, leveraging clinical, demographic, and reproductive health data to predict risk. However, these models can be limited by the availability and complexity of required data. Our study introduces an innovative approach using machine learning to analyze routine complete blood count (CBC) tests, providing a more accessible, cost-effective method for risk evaluation that can be applied globally.

What is the Innovative Idea Behind This Work?

Our approach applies machine learning to routine CBC tests, which are inexpensive and available worldwide. By combining specific CBC markers with patient age, our model identifies patterns associated with breast cancer risk. This surfaces associations that are hard to see when each marker is examined on its own, making the CBC a practical tool for breast cancer risk stratification.

How Can Complete Blood Count Show Signs of Breast Cancer?

CBC tests, commonly used to evaluate overall health, can reveal subtle changes in blood parameters that might indicate underlying health conditions, including cancer. Our study focused on three predictors: the neutrophil-to-lymphocyte ratio (NLR), red blood cell (RBC) count, and age. Both blood markers have well-known associations with cancer. For instance, an elevated NLR is often linked to inflammation and an altered immune response, which can accompany cancer progression. Similarly, lower RBC counts can reflect a chronic inflammatory state. Age is a well-established risk factor for breast cancer. A machine learning model can combine these signals to improve risk prediction.
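To make the inputs concrete, here is a minimal sketch of how the three predictors could be assembled and passed to a ridge (L2-penalized) linear model, which is the model family named in the findings. It is an illustration under stated assumptions, not the study's pipeline: the function names, the `alpha` value, and the six records are all made up, and the post does not say how the authors preprocessed features or tuned the penalty.

```python
def neutrophil_lymphocyte_ratio(neutrophils, lymphocytes):
    """NLR = absolute neutrophil count / absolute lymphocyte count (same units)."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0.0:
            raise ValueError("singular system; try a larger alpha")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge: solve (Xc^T Xc + alpha * I) w = Xc^T yc.

    Features and target are centered so the intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [t - y_mean for t in y]
    A = [[sum(r[i] * r[j] for r in Xc) + (alpha if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(p)]
    weights = solve_linear_system(A, b)
    intercept = y_mean - sum(w * m for w, m in zip(weights, x_mean))
    return weights, intercept


def predict(weights, intercept, row):
    """Linear risk score for one feature row."""
    return intercept + sum(w * x for w, x in zip(weights, row))


# Made-up records for illustration only:
# (neutrophils 10^9/L, lymphocytes 10^9/L, RBC 10^12/L, age in years, label)
records = [
    (3.0, 2.0, 4.8, 42, 0),
    (4.2, 1.4, 4.3, 61, 1),
    (2.8, 2.2, 4.9, 38, 0),
    (5.0, 1.2, 4.1, 67, 1),
    (3.5, 1.9, 4.6, 55, 0),
    (4.6, 1.5, 4.4, 59, 1),
]
X = [[neutrophil_lymphocyte_ratio(n, l), rbc, age] for n, l, rbc, age, _ in records]
y = [label for *_, label in records]

weights, intercept = fit_ridge(X, y, alpha=1.0)  # alpha is an arbitrary choice
scores = [predict(weights, intercept, row) for row in X]
```

Because the ridge penalty depends on feature scale, a real analysis would normally standardize NLR, RBC, and age before fitting and choose `alpha` by cross-validation. Both steps are left out here to keep the sketch short.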

Main Findings from This Study

  1. Repurposing CBC for Risk Stratification: Our ridge regression model, incorporating NLR, RBC, and age, achieved an AUC of 0.64 (95% CI 0.64–0.65), slightly better than that of the TC model (95% CI 0.61–0.62), while using much more readily available data.

This graph illustrates the percentage of breast cancers detected in relation to the percentage of the population screened. It compares two screening strategies: prioritized screening, with the order given by the CBC model (green line), and random screening (yellow line). The green line shows that prioritized screening detects a higher percentage of cancers than random screening at every level of population screened.
Effectiveness of CBC prioritization vs. random screening for breast cancer
  2. Identification of Novel Risk Factors: By analyzing data from 396,848 women, we identified significant CBC-derived ratios and markers contributing to breast cancer risk, providing new insights into the disease.
  3. Personalized Prevention Strategies: The model stratifies the population into high, moderate, average, and low-risk groups, facilitating targeted screening and intervention, which is especially beneficial in resource-limited settings.
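The findings above rest on three simple computations over a list of risk scores: the AUC, the prioritized-screening curve shown in the figure, and the split into risk groups. The sketch below shows one way each could be computed. The function names and the percentile cutoffs are hypothetical, since the post does not state the thresholds used to define the four groups.

```python
def auc(scores, labels):
    """Probability that a random case outscores a random non-case (ties count 1/2)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0
               for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


def detection_curve(scores, labels):
    """Points (fraction screened, fraction of cancers found), highest risk first.

    Random screening corresponds, in expectation, to the diagonal where
    both fractions are equal.
    """
    total_cases = sum(labels)
    if total_cases == 0:
        raise ValueError("no cases to detect")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    curve = [(0.0, 0.0)]
    found = 0
    for k, i in enumerate(order, start=1):
        found += labels[i]
        curve.append((k / len(scores), found / total_cases))
    return curve


def risk_groups(scores, cutoffs=(0.25, 0.75, 0.95)):
    """Label each score by its percentile rank; the cutoffs are hypothetical."""
    names = ("low", "average", "moderate", "high")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [None] * len(scores)
    for rank, i in enumerate(order):
        percentile = rank / len(scores)
        groups[i] = names[sum(percentile >= c for c in cutoffs)]
    return groups


# Toy example: four people, two of whom develop cancer.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
print(auc(toy_scores, toy_labels))              # 0.75
print(detection_curve(toy_scores, toy_labels))  # half the cancers found after screening 25%
print(risk_groups(toy_scores))
```

An AUC of 0.5 means the ranking is no better than chance and 1.0 means every case outranks every non-case, so the reported 0.64 says a randomly chosen case scores higher than a randomly chosen non-case about 64% of the time.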

Future Directions and Limitations

Future research should focus on validating our model with diverse populations to ensure its generalizability. Additionally, while our study leverages routine blood tests, integrating other clinical data could enhance the model's accuracy. One limitation is the absence of certain demographic and clinical details, such as ethnicity and comorbidities, which could affect the results. Another is the lack of external validation across different settings. Addressing these limitations will be crucial for broader application and effectiveness.


Our AI-driven breast cancer risk stratification model, based on routine CBC tests, represents a significant advancement in early detection methods. By using readily available and affordable data, this approach promises to enhance breast cancer screening, particularly in low-resource environments. We believe our research will inspire further innovation in medical diagnostics and contribute to more accessible healthcare globally.
