COMPOSER-LLM: Giving AI the Power to Understand the Story Behind Sepsis and Save Lives

We developed COMPOSER-LLM, an AI system that dives into doctors' notes to spot sepsis earlier, showing how language models can boost life-saving predictions in busy hospitals.

Why This Research Is a Step Forward

Sepsis is a formidable foe in modern medicine. It’s not a specific disease, but a life-threatening organ dysfunction caused by the body’s haywire response to an infection. Globally, it affects millions and is a leading cause of hospital deaths and staggering healthcare costs. We’ve known for a while that catching sepsis early and starting treatment fast can save lives and improve patient outcomes [1]. Many hospitals already use predictive AI models based on 'structured' data like lab results and vital signs. However, these systems often miss crucial clues hidden in 'unstructured' clinical notes – the detailed narratives written by doctors and nurses. We saw an opportunity here: what if we could teach an AI to understand the nuances in these notes to make sepsis prediction even more accurate? This became the driving question behind our work.

What We Did: Teaming Up AI Brains Amidst Real-World Hurdles

Our journey led us to create COMPOSER-LLM. The core idea was to combine our existing sepsis prediction model, COMPOSER [2], which itself has shown success in reducing sepsis mortality at UC San Diego Health [1], with the power of a Large Language Model (LLM) that excels at understanding human language. Think of it as giving our sepsis detective a new partner who can read between the lines. 

The research was primarily computational. However, bringing such a system to life isn't as simple as plugging in an open-source AI framework. While these tools are more accessible than ever, there's often an 'illusion of simplicity'. Building and scaling clinical AI, especially for critical, time-sensitive conditions like sepsis, demands rigorous validation, a robust and secure infrastructure, and seamless integration into existing, often complex, clinical workflows [3-5]. Healthcare data itself is a challenge – it can be fragmented, inconsistent, and incomplete. Our LLM had to be adept at extracting meaningful information from this imperfect data.

A key part of COMPOSER-LLM is its ability to kick in when the initial prediction is a bit fuzzy – what we call 'high-uncertainty' cases. In these situations, the LLM would analyze clinical notes (such as nursing assessments, physician progress notes, radiology reports, etc.), looking for specific signs and symptoms related not only to sepsis but also to conditions that can mimic sepsis (such as cardiogenic shock, cirrhosis, and GI hemorrhage, among others). This enhanced ability to perform differential diagnosis (DDx) significantly improved the model's accuracy to confirm or rule out sepsis. 
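The gating idea described above can be sketched in a few lines. This is an illustrative toy, not the authors' production code: the function names, thresholds, and keyword screening stand in for COMPOSER's calibrated risk score and the LLM's note analysis.

```python
# Illustrative sketch of an uncertainty-gated pipeline: the LLM is consulted
# only when the base model's prediction falls in a high-uncertainty gray zone.
from dataclasses import dataclass

@dataclass
class Prediction:
    sepsis_risk: float   # base model score in [0, 1]
    uncertain: bool      # True when the score falls in the gray zone

def base_model_predict(features: dict, low: float = 0.3, high: float = 0.7) -> Prediction:
    """Stand-in for a structured-data model such as COMPOSER.
    Scores between `low` and `high` are flagged as high-uncertainty."""
    score = features.get("risk_score", 0.0)  # placeholder scoring
    return Prediction(sepsis_risk=score, uncertain=low <= score <= high)

def llm_adjudicate(clinical_notes: str) -> bool:
    """Stand-in for an LLM call that screens notes for sepsis signs versus
    mimics (e.g., cardiogenic shock, cirrhosis, GI hemorrhage)."""
    text = clinical_notes.lower()
    mimics = ("cardiogenic shock", "cirrhosis", "gi hemorrhage")
    if any(m in text for m in mimics):
        return False  # a sepsis mimic better explains the clinical picture
    return "infection" in text or "sepsis" in text

def composer_llm_alert(features: dict, clinical_notes: str) -> bool:
    pred = base_model_predict(features)
    if pred.uncertain:
        return llm_adjudicate(clinical_notes)  # expensive path, gray zone only
    return pred.sepsis_risk > 0.7
```

The design choice to run the LLM only in the gray zone is what keeps the system affordable at scale: confident base-model predictions never pay the LLM's latency or compute cost.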

We put COMPOSER-LLM through its paces with both retrospective data and a prospective "silent mode" deployment, watching how it performed with real-time, sometimes incomplete, clinical notes – a crucial step, as prospective validation is key to ensuring models work in the real world, not just on historical data. This interdisciplinary effort brought together experts in biomedical informatics, emergency medicine, critical care, and more, all focused on refining this tool.

What We Found: A Smarter Sentry Against Sepsis

The results were genuinely exciting! COMPOSER-LLM significantly outperformed the standalone COMPOSER model. On a large dataset of patient encounters, it showed a sensitivity of 72.1% and a positive predictive value (PPV) of 52.9%, with an overall F1 score of 61.0%. Crucially, it also reduced the number of false alarms.
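For readers less familiar with these metrics: the F1 score is simply the harmonic mean of sensitivity (recall) and PPV (precision), so the three reported numbers are internally consistent, as a quick check shows.

```python
# Verify that the reported F1 follows from sensitivity and PPV:
# F1 is the harmonic mean of the two.
def f1_score(sensitivity: float, ppv: float) -> float:
    return 2 * sensitivity * ppv / (sensitivity + ppv)

f1 = f1_score(0.721, 0.529)
print(round(f1 * 100, 1))  # → 61.0
```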

When we deployed COMPOSER-LLM across emergency departments (EDs) in a real-world setting, the AI model demonstrated strong predictive performance, with a sensitivity of ~70% and PPV of ~58% (as opposed to 10-15% PPV for commercially available models!). Furthermore, when clinicians reviewed cases where COMPOSER-LLM raised an alarm for a patient who didn’t ultimately meet the full Sepsis-3 criteria (a 'false positive' by strict definition), they found that about 62% of these patients did have a suspected or confirmed bacterial infection at the time of the alert, raising the 'effective' model PPV above 80% and reinforcing the model's clinical utility. Finally, 83.1% of false positives contained the actual diagnosis in the predicted differential list.
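The 'effective' PPV figure follows directly from the two reported numbers, as a back-of-the-envelope calculation confirms.

```python
# Effective PPV: strict Sepsis-3 PPV plus the share of 'false positives'
# that nonetheless had a suspected or confirmed bacterial infection.
ppv = 0.58                # strict Sepsis-3 PPV in the silent deployment
infected_among_fp = 0.62  # fraction of false positives with infection at alert

effective_ppv = ppv + (1 - ppv) * infected_among_fp
print(round(effective_ppv, 2))  # → 0.84, i.e. above 80%
```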

Broader Horizons: The Future of LLMs and the AI Adoption Maze

Our study shows that LLMs hold considerable promise for making clinical prediction tools smarter, especially by tapping into the rich, contextual goldmine of unstructured clinical notes. For conditions like sepsis, where timely intervention is paramount, even a small improvement in prediction accuracy and speed can make a big difference.

In the ED, AI systems like COMPOSER-LLM counter cognitive biases by integrating a DDx process into predictive modeling. This mitigates 'anchoring bias' by compelling the AI to evaluate a broader range of potential conditions, rather than fixating on an initial suspicion. It also curtails 'automation bias,' as the AI's internal DDx assessment ensures possibilities like sepsis aren't prematurely dismissed, even if an initial alert is uncertain. This evolution of AI from simple binary predictions to delivering more nuanced, DDx-informed alerts marks a breakthrough in healthcare analytics, fostering comprehensive, context-aware clinical decision support and enhancing patient safety.

The COMPOSER-LLM system represents a significant advancement in clinical AI. It demonstrates how Large Language Models can be powerfully augmented with analytical tools, such as differential diagnosis (DDx) calculators, and effectively grounded in real-world clinical data—for instance, by using Retrieval Augmented Generation (RAG) to deeply understand patient notes. This innovative approach, designed for seamless integration with Electronic Health Records (EHRs) via interoperability standards like FHIR, is key to fostering more insightful and context-aware clinical decision support, paving the way for more sophisticated AI assistance in healthcare. One of the key aspects of our design is that the computationally intensive LLM is only called upon in those tricky, high-uncertainty cases, making the system more efficient and cost-effective.
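To make the FHIR integration concrete, here is a minimal sketch of extracting note text from a FHIR R4 search Bundle of DocumentReference resources, the kind of step a RAG pipeline over clinical notes would need. The bundle below is a hand-built toy payload, not real EHR output, and the code is illustrative rather than the paper's implementation.

```python
# Illustrative: pull plain-text note bodies out of a FHIR R4 Bundle of
# DocumentReference resources (attachments are base64-encoded per the spec).
import base64
import json

bundle_json = json.dumps({
    "resourceType": "Bundle",
    "entry": [
        {"resource": {
            "resourceType": "DocumentReference",
            "type": {"text": "Nursing assessment"},
            "content": [{"attachment": {
                "contentType": "text/plain",
                "data": base64.b64encode(b"Patient febrile, rigors noted.").decode()
            }}]
        }}
    ]
})

def extract_notes(bundle: dict) -> list[tuple[str, str]]:
    """Return (note type, decoded text) pairs for plain-text attachments."""
    notes = []
    for entry in bundle.get("entry", []):
        res = entry.get("resource", {})
        if res.get("resourceType") != "DocumentReference":
            continue
        note_type = res.get("type", {}).get("text", "unknown")
        for content in res.get("content", []):
            att = content.get("attachment", {})
            if att.get("contentType") == "text/plain" and "data" in att:
                notes.append((note_type, base64.b64decode(att["data"]).decode()))
    return notes

notes = extract_notes(json.loads(bundle_json))
print(notes)  # → [('Nursing assessment', 'Patient febrile, rigors noted.')]
```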

However, the journey from a promising model to a widely adopted clinical tool is fraught with challenges that go far beyond the code. As hospitals and healthcare systems look to leverage AI, they face the classic tension between ‘build in-house’ and ‘buy from a vendor’. Building in-house, as we did with the foundational COMPOSER model (supported by over $10M in NIH grant funding over 5 years), offers immense customization and control [6]. But it requires substantial, sustained investment in expertise, infrastructure, and ongoing maintenance – resources that can be scarce. On the other hand, the allure of "free" or bundled AI tools within larger EHR packages can be misleading, sometimes hiding significant implementation costs in terms of staff time and effort, and may lack the robust validation or post-implementation monitoring needed to ensure they are truly effective and safe. The Epic Sepsis Model experience, for example, highlighted how a seemingly low-cost solution could underperform and consume valuable resources [7].

Ultimately, we believe that by carefully developing, prospectively deploying, and validating AI tools like COMPOSER-LLM in clinical practice, and by generating high-quality clinical evidence of their life-saving impact, we can navigate these challenges. This research is a step towards a future where AI works seamlessly alongside healthcare professionals, providing them with deeper insights and more time to focus on what matters most: their patients.

References:

[1] Boussina A, Shashikumar SP, Malhotra A, Owens RL, El-Kareh R, Longhurst CA, Quintero K, Donahue A, Chan TC, Nemati S, Wardi G. Impact of a deep learning sepsis prediction model on quality of care and survival. NPJ digital medicine. 2024 Jan 23;7(1):14.

[2] Shashikumar SP, Wardi G, Malhotra A, Nemati S. Artificial intelligence sepsis prediction algorithm learns to say “I don’t know”. NPJ digital medicine. 2021 Sep 9;4(1):134.

[3] Boussina A, Shashikumar S, Amrollahi F, Pour H, Hogarth M, Nemati S. Development & deployment of a real-time healthcare predictive analytics platform. In: 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). 2023 Jul 24 (pp. 1-4). IEEE.

[4] Wardi G, Owens R, Josef C, Malhotra A, Longhurst C, Nemati S. Bringing the promise of artificial intelligence to critical care: what the experience with sepsis analytics can teach us. Critical care medicine. 2023 Aug 1;51(8):985-91.

[5] Kwong JC, Nickel GC, Wang SC, Kvedar JC. Integrating artificial intelligence into healthcare systems: more than just the algorithm. NPJ Digital Medicine. 2024 Mar 1;7(1):52.

[6] Wardi G, Longhurst CA. Unreasonable effectiveness of training AI models locally. BMJ Quality & Safety. 2025 May 9.

[7] Wong A, Otles E, Donnelly JP, Krumm A, McCullough J, DeTroyer-Cooley O, Pestrue J, Phillips M, Konye J, Penoza C, Ghous M. External validation of a widely implemented proprietary sepsis prediction model in hospitalized patients. JAMA internal medicine. 2021 Aug 1;181(8):1065-70.
