Can machine-learning algorithms accurately predict high-need, high-cost patients using claims and health screening program data?

Nov 14, 2020

Itsuki Osawa, Goto Tadahiro, Yamamoto & Yusuke Tsugawa


Rapidly growing healthcare spending has become one of the most significant challenges in many developed countries. High-need, high-cost (HNHC) patients, often defined as the top 5% of spenders in annual healthcare costs, account for about half of total healthcare costs. If HNHC patients remained consistently high-cost across multiple years, policymakers and insurers could easily identify this population and design interventions targeted at it to lower healthcare spending. However, about half of HNHC patients were not high-cost in the prior year, suggesting that many people abruptly become HNHC patients because of unexpected illnesses and injuries, which makes targeted interventions difficult to develop. Accurately predicting which individuals will become HNHC patients in the near future is therefore a critically important first step toward addressing healthcare spending growth.
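
To make the HNHC definition and the "not high-cost in the prior year" observation concrete, here is a minimal Python sketch. It assumes, purely for illustration, that each year's annual costs are available as a plain list with one entry per person in the same order across years; the function names `hnhc_flags` and `share_new_hnhc` are hypothetical and not taken from the study.

```python
def hnhc_flags(costs, top_share=0.05):
    """Flag each person whose annual cost is in the top `top_share` of spenders."""
    n = len(costs)
    # Number of people to flag, rounded to the nearest whole person (at least one).
    k = max(1, round(n * top_share))
    # Indices ordered from highest to lowest cost; the sort is stable, so ties keep input order.
    ranked = sorted(range(n), key=lambda i: costs[i], reverse=True)
    flagged = set(ranked[:k])
    return [i in flagged for i in range(n)]


def share_new_hnhc(prior_costs, current_costs, top_share=0.05):
    """Fraction of this year's HNHC patients who were not HNHC in the prior year.

    Assumes both lists describe the same people in the same order (hypothetical layout).
    """
    prior = hnhc_flags(prior_costs, top_share)
    current = hnhc_flags(current_costs, top_share)
    hnhc_now = [i for i, flag in enumerate(current) if flag]
    newly_hnhc = [i for i in hnhc_now if not prior[i]]
    return len(newly_hnhc) / len(hnhc_now)


# Synthetic example: 10 people, 20% cutoff, so 2 people are flagged each year.
prior_costs = [100, 90, 5, 5, 5, 5, 5, 5, 5, 5]
current_costs = [120, 5, 80, 5, 5, 5, 5, 5, 5, 5]
print(share_new_hnhc(prior_costs, current_costs, top_share=0.2))  # 0.5
```

In this toy data, one of this year's two HNHC patients (the third person) was not HNHC last year, which is the kind of turnover that makes the population hard to target from past costs alone.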

**Figure 1 Distribution of annual healthcare costs in the working-age population in Japan, 2016**
In 2016, the top 1%, 5%, and 10% of patients accounted for 26.4%, 47.7%, and 60.0% of total annual healthcare costs, respectively.
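
Concentration figures like those in Figure 1 come from ranking people by annual cost and dividing the spending of the top slice by total spending. A minimal sketch of that calculation follows; the function name `top_share_of_costs` is hypothetical and the numbers are made up, not the study data.

```python
def top_share_of_costs(costs, top_fraction):
    """Share of total spending accounted for by the top `top_fraction` of spenders."""
    ranked = sorted(costs, reverse=True)
    # Size of the top slice, rounded to the nearest whole person (at least one).
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)


costs = [50, 20, 10, 5, 5, 4, 3, 1, 1, 1]  # synthetic annual costs, summing to 100
for p in (0.1, 0.2, 0.5):
    print(f"top {p:.0%}: {top_share_of_costs(costs, p):.0%} of total cost")
# top 10%: 50% of total cost
# top 20%: 70% of total cost
# top 50%: 90% of total cost
```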

Machine-learning-based prediction models have been shown to outperform conventional prediction models because they can account for the complex interplay among a large number of predictors. However, evidence is limited on whether a machine-learning-based prediction model using clinical data from a health screening program combined with claims data can achieve high prognostic performance for predicting HNHC patients in subsequent years.

Our work showed that a prediction model using both clinical and claims data slightly improved prediction accuracy compared with a model using only claims data (which many policymakers and insurers can access more easily than clinical data). Using a random sample of nationwide data on working-age individuals who underwent a health screening program in Japan in 2013-2016, we developed five machine-learning-based prediction models (i.e., logistic regression, Lasso regression, random forest, gradient-boosted decision tree, and deep neural network) to predict HNHC patients in the subsequent year. Predictors included demographics, blood pressure, laboratory tests (e.g., HbA1c, LDL-C, and AST), and survey responses (e.g., smoking status, medications, and past medical history) from the health screening programs.
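
Predicting HNHC status "in the subsequent year" means each training example pairs one year's predictors with the next year's outcome. The sketch below shows one way to build such pairs, assuming a hypothetical `PersonYear` record with a free-form `features` dictionary (keys such as `"hba1c"` are illustrative only); it is not the study's actual data pipeline, and people not observed in the following year are simply dropped.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PersonYear:
    """One person's record for one year (hypothetical layout)."""
    person_id: str
    year: int
    annual_cost: float
    features: dict = field(default_factory=dict)  # e.g., screening results; illustrative


def hnhc_ids_by_year(records, top_share=0.05):
    """For each year, the set of person_ids in the top `top_share` of annual cost."""
    costs = defaultdict(list)
    for r in records:
        costs[r.year].append((r.annual_cost, r.person_id))
    result = {}
    for year, pairs in costs.items():
        pairs.sort(key=lambda p: p[0], reverse=True)  # highest cost first
        k = max(1, round(len(pairs) * top_share))
        result[year] = {pid for _, pid in pairs[:k]}
    return result


def build_training_pairs(records, top_share=0.05):
    """Pair year-t features with a year-(t+1) HNHC label (1 = HNHC, 0 = not)."""
    hnhc = hnhc_ids_by_year(records, top_share)
    observed = {(r.person_id, r.year) for r in records}
    pairs = []
    for r in records:
        next_year = r.year + 1
        # Only people observed in the following year get a label.
        if (r.person_id, next_year) in observed:
            label = int(r.person_id in hnhc[next_year])
            pairs.append((r.features, label))
    return pairs


records = [
    PersonYear("A", 2015, 100.0, {"hba1c": 5.4}),
    PersonYear("B", 2015, 10.0, {"hba1c": 7.1}),
    PersonYear("A", 2016, 10.0),
    PersonYear("B", 2016, 200.0),
]
print(build_training_pairs(records, top_share=0.5))
```

Here person B was low-cost in 2015 but became the top spender in 2016, so B's 2015 features receive label 1; a classifier is then trained on these (features, label) pairs.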

Our prediction models exhibited good prognostic performance (area under the receiver operating characteristic curve [AUC], 0.84) and outperformed conventional prediction models relying only on claims data. These findings suggest that incorporating clinical data, which provide complementary information about participants' health status, can improve on the performance of prediction models that use claims data alone.
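
The AUC can be read as the probability that a randomly chosen future HNHC patient receives a higher predicted risk than a randomly chosen non-HNHC patient. A minimal pairwise sketch of that definition follows (the function name `roc_auc` and the toy data are hypothetical); it is O(n²) and meant for illustration, whereas real evaluations typically use a rank-based or library implementation.

```python
def roc_auc(labels, scores):
    """AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]          # 1 = became HNHC next year (toy data)
scores = [0.9, 0.4, 0.6, 0.1]  # predicted risks from some model
print(roc_auc(labels, scores))  # 0.75
```

Three of the four positive/negative pairs are ordered correctly, giving 0.75; an AUC of 0.84 means the model ranks such pairs correctly 84% of the time.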

**Figure 2 Prediction ability of the prediction models for HNHC patients**
(A) ROC curves and (B) decision curve analysis show the benefit of the machine-learning-based prediction models compared with the reference model (i.e., conventional logistic regression).
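
Decision curve analysis compares models by their net benefit at a given risk threshold p_t: net benefit = TP/n - (FP/n) × p_t/(1 - p_t), where TP and FP count the true and false positives among people flagged at or above p_t. A minimal sketch of that standard formula, with hypothetical function names and toy data (the thresholds and software used in the study are not shown here):

```python
def net_benefit(labels, risks, threshold):
    """Net benefit of flagging everyone whose predicted risk is >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(labels)
    flagged = [y for y, r in zip(labels, risks) if r >= threshold]
    tp = sum(flagged)       # flagged people who did become HNHC
    fp = len(flagged) - tp  # flagged people who did not
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of flagging everyone; flagging no one always has net benefit 0."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


labels = [1, 0, 1, 0, 0]            # toy outcomes (1 = HNHC next year)
risks = [0.8, 0.6, 0.3, 0.2, 0.1]   # toy predicted risks
print(round(net_benefit(labels, risks, 0.25), 3))       # 0.333
print(round(net_benefit_treat_all(labels, 0.25), 3))    # 0.2
```

A model adds value at a threshold when its net benefit exceeds both the "flag everyone" and "flag no one" strategies; a decision curve plots this across a range of thresholds.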

The prediction models we developed should help policymakers and payers accurately identify future HNHC patients in real time and intervene where necessary to curb rapidly growing healthcare spending. More broadly, combining clinical data from health screening programs with machine-learning techniques has great potential to address many challenges in healthcare, including rising healthcare spending, more effectively.


npj Digital Medicine