Behind the Paper

Unlocking the Complete Blood Count as a Risk Stratification Tool for Breast Cancer Using Machine Learning

Published in Cancer and Computational Sciences

May 28, 2024

Daniella Castro Araujo

CTO, Huna

Why Study Breast Cancer Risk Stratification?

Breast cancer remains one of the most common cancers affecting women worldwide, making risk stratification crucial for early detection. Traditional models like the Tyrer-Cuzick (TC) model have been invaluable, leveraging clinical, demographic, and reproductive health data to predict risk. However, these models can be limited by the availability and complexity of required data. Our study introduces an innovative approach using machine learning to analyze routine complete blood count (CBC) tests, providing a more accessible, cost-effective method for risk evaluation that can be applied globally.

What is the Innovative Idea Behind This Work?

Our approach uses machine learning to analyze data that is far more readily available: routine CBC tests, which are cheap and accessible worldwide. By combining specific CBC blood markers with patient age, our model identifies complex patterns indicative of breast cancer risk. This lets us uncover correlations that traditional methods do not detect, making the CBC a powerful tool for breast cancer risk stratification.

How Can Complete Blood Count Show Signs of Breast Cancer?

CBC tests, commonly used to evaluate overall health, can reveal subtle changes in blood parameters that might indicate underlying health conditions, including cancer. Our study focused on three predictors: the neutrophil-to-lymphocyte ratio (NLR), red blood cell (RBC) count, and age. The two blood markers are well known for their associations with cancer. For instance, an elevated NLR is often linked to inflammation and immune response, which can be indicative of cancer progression. Similarly, lower RBC counts can reflect a chronic inflammatory state in the body. Age, in turn, is a well-known risk factor for breast cancer. Using AI, we can detect these complex patterns and improve risk prediction.
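
As a rough illustration of how these three inputs could be assembled from a single CBC result, the sketch below computes the NLR as the neutrophil count divided by the lymphocyte count and collects it with RBC count and age. The `CBCRecord` class, its field names, and the example values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class CBCRecord:
    """Hypothetical container for the inputs named in the text."""
    age_years: float
    neutrophils: float  # absolute neutrophil count
    lymphocytes: float  # absolute lymphocyte count, same unit as neutrophils
    rbc: float          # red blood cell count


def neutrophil_to_lymphocyte_ratio(record: CBCRecord) -> float:
    """NLR = neutrophil count divided by lymphocyte count."""
    if record.lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive to compute NLR")
    return record.neutrophils / record.lymphocytes


def model_inputs(record: CBCRecord) -> dict:
    """Collect the three predictors the post names: NLR, RBC and age."""
    return {
        "nlr": neutrophil_to_lymphocyte_ratio(record),
        "rbc": record.rbc,
        "age": record.age_years,
    }


# Made-up values for illustration only, not patient data.
example = CBCRecord(age_years=52, neutrophils=4.2, lymphocytes=2.1, rbc=4.5)
features = model_inputs(example)
print(features)
```

Because the NLR is a ratio of two counts measured in the same unit, it does not depend on which unit the laboratory reports, as long as both counts use the same one.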

Main Findings from This Study

Repurposing CBC for Risk Stratification: Our ridge regression model, which uses NLR, RBC count, and age, achieved an AUC of 0.64 (95% CI 0.64–0.65). This is slightly better than the TC model (95% CI 0.61–0.62), and it relies on much more readily available data.
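
For readers less familiar with the metric, an AUC can be read as the probability that a randomly chosen woman who developed breast cancer receives a higher risk score than a randomly chosen woman who did not, with 0.5 corresponding to chance. The sketch below computes an AUC this way on made-up scores. It is a generic illustration of the metric, not the study's evaluation code, and the toy data are assumptions.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as half).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data for illustration only: 1 = cancer, 0 = no cancer.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
toy_auc = auc(toy_scores, toy_labels)
print(round(toy_auc, 3))
```

The pairwise loop is quadratic in the number of records, so it suits small examples only. On a cohort of hundreds of thousands of women, a rank-based implementation from a statistics library would be the practical choice.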

Figure: Effectiveness of CBC prioritization vs. random screening for breast cancer. The graph plots the percentage of breast cancers detected against the percentage of the population screened. It compares two screening strategies: prioritized screening, in the order given by the CBC model (green line), and random screening (yellow line). Prioritized screening detects a higher percentage of cancers than random screening across all levels of population screening.
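
A curve of this kind can be sketched by ranking women from highest to lowest model score, screening the top fraction, and counting the share of all cancers that falls inside it. Under random screening, the expected share detected simply equals the fraction screened. The function name and toy data below are assumptions for illustration, not the study's code or results.

```python
def share_of_cancers_detected(scores, labels, fraction_screened):
    """Share of all cancers found when screening the highest-scoring fraction."""
    if not 0.0 <= fraction_screened <= 1.0:
        raise ValueError("fraction_screened must be between 0 and 1")
    total_cancers = sum(labels)
    if total_cancers == 0:
        raise ValueError("no positive cases in labels")
    n_screened = round(fraction_screened * len(scores))
    # Rank by descending risk score, then keep the labels of those screened.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    detected = sum(label for _, label in ranked[:n_screened])
    return detected / total_cancers


# Toy data for illustration only: 1 = cancer, 0 = no cancer.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]

for fraction in (0.1, 0.3, 0.5, 1.0):
    prioritized = share_of_cancers_detected(toy_scores, toy_labels, fraction)
    random_expected = fraction  # expected share under random screening
    print(f"screened {fraction:.0%}: prioritized {prioritized:.0%}, random {random_expected:.0%}")
```

Plotting the prioritized share against the fraction screened, alongside the diagonal for random screening, gives the two lines described in the figure.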

Identification of Novel Risk Factors: By analyzing data from 396,848 women, we identified significant CBC-derived ratios and markers contributing to breast cancer risk, providing new insights into the disease.
Personalized Prevention Strategies: The model stratifies the population into high-, moderate-, average-, and low-risk groups, facilitating targeted screening and intervention, which is especially beneficial in resource-limited settings.
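
One simple way to turn continuous risk scores into the four named groups is to rank the population by score and cut the ranking at fixed percentiles. The sketch below does this with placeholder cutoffs (top 10% high, next 20% moderate, next 40% average, remainder low). These cutoffs and the function name are assumptions chosen for illustration; the post does not state the thresholds used in the study.

```python
# Hypothetical cutoffs: cumulative share of the population, ranked from
# highest to lowest score, at which each group ends. Not from the study.
ASSUMED_CUTOFFS = (
    ("high", 0.10),
    ("moderate", 0.30),
    ("average", 0.70),
    ("low", 1.00),
)


def stratify_by_rank(scores, cutoffs=ASSUMED_CUTOFFS):
    """Assign each score a risk group based on its rank in the population."""
    n = len(scores)
    # Indices of the records, ordered from highest to lowest score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    groups = [None] * n
    for rank, index in enumerate(order):
        share_at_or_above = (rank + 1) / n
        for name, upper in cutoffs:
            if share_at_or_above <= upper:
                groups[index] = name
                break
    return groups


# Toy scores for illustration only.
toy_scores = [0.12, 0.95, 0.40, 0.33, 0.71, 0.08, 0.55, 0.27, 0.62, 0.19]
toy_groups = stratify_by_rank(toy_scores)
for score, group in zip(toy_scores, toy_groups):
    print(f"{score:.2f} -> {group}")
```

In a resource-limited setting, the group sizes could be tuned to the screening capacity that is actually available, for example by shrinking the high-risk share to match the number of mammograms that can be offered.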

Future Directions and Limitations

Future research should focus on validating our model with diverse populations to ensure its generalizability. Additionally, while our study leverages routine blood tests, integrating other clinical data could enhance the model's accuracy. One limitation is the absence of certain demographic and clinical details, such as ethnicity and comorbidities, which could affect the results. Another is the lack of external validation across different settings. Addressing these limitations will be crucial for broader application and effectiveness.

Conclusion

Our AI-driven breast cancer risk stratification model, based on routine CBC tests, represents a significant advancement in early detection methods. Because it uses readily available and affordable data, this approach has the potential to enhance breast cancer screening, particularly in low-resource environments. We believe our research will inspire further innovation in medical diagnostics and contribute to more accessible healthcare globally.

Daniella Castro Araujo (She/Her)

Co-founder and CTO of Huna, a pioneering Brazilian deep tech company leveraging machine learning for accessible cancer screening. Researcher at the Artificial Intelligence Laboratory at UFMG (Universidade Federal de Minas Gerais, Brazil). Recipient of five awards, including the Bayer Foundation's Empowering Women Award. I hold a master’s degree in Operations Research from USP (Universidade de São Paulo, Brazil) and a Ph.D. in Artificial Intelligence from UFMG. My primary interest is applying machine learning to healthcare problems.

Scientific Reports

An open access journal publishing original research from across all areas of the natural sciences, psychology, medicine and engineering.
