COMPOSER-LLM: Giving AI the Power to Understand the Story Behind Sepsis and Save Lives
Why This Research Is a Step Forward
Sepsis is a formidable foe in modern medicine. It’s not a specific disease, but a life-threatening organ dysfunction caused by the body’s haywire response to an infection. Globally, it affects millions and is a leading cause of hospital deaths and staggering healthcare costs. We’ve known for a while that catching sepsis early and starting treatment fast can save lives and improve patient outcomes [1]. Many hospitals already use predictive AI models based on 'structured' data like lab results and vital signs. However, these systems often miss crucial clues hidden in 'unstructured' clinical notes – the detailed narratives written by doctors and nurses. We saw an opportunity here: what if we could teach an AI to understand the nuances in these notes to make sepsis prediction even more accurate? This became the driving question behind our work.
What We Did: Teaming Up AI Brains Amidst Real-World Hurdles
Our journey led us to create COMPOSER-LLM. The core idea was to combine our existing sepsis prediction model, COMPOSER [2], which itself has shown success in reducing sepsis mortality at UC San Diego Health [1], with the power of a Large Language Model (LLM) that excels at understanding human language. Think of it as giving our sepsis detective a new partner who can read between the lines.
The research was primarily computational. However, bringing such a system to life isn't as simple as plugging in an open-source AI framework. While these tools are more accessible than ever, there's often an 'illusion of simplicity'. Building and scaling clinical AI, especially for critical, time-sensitive conditions like sepsis, demands rigorous validation, a robust and secure infrastructure, and seamless integration into existing, often complex, clinical workflows [3-5]. Healthcare data itself is a challenge – it can be fragmented, inconsistent, and incomplete. Our LLM had to be adept at extracting meaningful information from this imperfect data.
A key part of COMPOSER-LLM is its ability to kick in when the initial prediction is a bit fuzzy – what we call 'high-uncertainty' cases. In these situations, the LLM analyzes clinical notes (nursing assessments, physician progress notes, radiology reports, and so on), looking for signs and symptoms related not only to sepsis but also to conditions that can mimic it, such as cardiogenic shock, cirrhosis, and GI hemorrhage. This built-in ability to perform differential diagnosis (DDx) significantly improved the model's ability to confirm or rule out sepsis.
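To make the gating idea concrete, here is a minimal Python sketch, not the actual COMPOSER-LLM implementation: the functions composer_score and llm_review_notes, the uncertainty thresholds, and the data shapes are hypothetical placeholders standing in for the real model and LLM call.

```python
# Minimal sketch of the high-uncertainty gating described above.
# All names and thresholds are illustrative, not the production system.

from dataclasses import dataclass

@dataclass
class Prediction:
    alert: bool
    source: str  # "composer" or "composer+llm"

def composer_score(vitals_and_labs: dict) -> float:
    """Placeholder for the structured-data sepsis risk score (0-1)."""
    return 0.5  # illustrative constant

def llm_review_notes(notes: list[str]) -> dict:
    """Placeholder for an LLM call that extracts sepsis signs/symptoms
    and evidence for mimicking conditions from clinical notes."""
    return {"sepsis_evidence": True, "mimic_evidence": False}

def composer_llm(vitals_and_labs: dict, notes: list[str],
                 low: float = 0.4, high: float = 0.6) -> Prediction:
    score = composer_score(vitals_and_labs)
    # Confident predictions bypass the LLM entirely (fast, cheap path).
    if score >= high:
        return Prediction(alert=True, source="composer")
    if score <= low:
        return Prediction(alert=False, source="composer")
    # High-uncertainty band: ask the LLM to read the unstructured notes
    # and weigh sepsis against its common mimics (the DDx step).
    findings = llm_review_notes(notes)
    alert = findings["sepsis_evidence"] and not findings["mimic_evidence"]
    return Prediction(alert=alert, source="composer+llm")

print(composer_llm({"lactate": 2.1}, ["Pt febrile, tachycardic ..."]))
```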
We put COMPOSER-LLM through its paces with both retrospective data and a prospective "silent mode" deployment, watching how it performed with real-time, sometimes incomplete, clinical notes – a crucial step, as prospective validation is key to ensuring models work in the real world, not just on historical data. This interdisciplinary effort brought together experts in biomedical informatics, emergency medicine, critical care, and more, all focused on refining this tool.
What We Found: A Smarter Sentry Against Sepsis
The results were genuinely exciting! COMPOSER-LLM significantly outperformed the standalone COMPOSER model. On a large retrospective dataset of patient encounters, it achieved a sensitivity of 72.1% and a positive predictive value (PPV) of 52.9%, with an overall F1 score of 61.0%. Crucially, it also reduced the number of false alarms compared with COMPOSER alone.
When we deployed COMPOSER-LLM across emergency departments (EDs) in a real-world setting, the model maintained strong predictive performance, with a sensitivity of ~70% and a PPV of ~58% (as opposed to the 10-15% PPV of commercially available models!). Furthermore, when clinicians reviewed cases where COMPOSER-LLM raised an alarm for a patient who didn’t ultimately meet the full Sepsis-3 criteria (a 'false positive' by strict definition), they found that about 62% of these patients had a suspected or confirmed bacterial infection at the time of the alert, raising the model's 'effective' PPV above 80% and reinforcing its clinical utility. Finally, in 83.1% of false positives, the patient's actual diagnosis appeared in the model's predicted differential list.
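For readers who want to check the arithmetic: the reported F1 score is the harmonic mean of sensitivity and PPV, and the 'effective' PPV follows from combining the deployment PPV with the chart-review finding. A quick back-of-the-envelope check using the rounded figures quoted above (so the results are approximate):

```python
# Back-of-the-envelope checks using the rounded figures quoted above.

# Retrospective performance: F1 is the harmonic mean of sensitivity and PPV.
sensitivity, ppv = 0.721, 0.529
f1 = 2 * sensitivity * ppv / (sensitivity + ppv)
print(f"F1 ≈ {f1:.3f}")  # ≈ 0.610, i.e. 61.0%

# Prospective deployment: if ~58% of alerts met Sepsis-3 criteria and ~62%
# of the remaining "false" alerts still had a suspected or confirmed
# bacterial infection, the clinically meaningful ("effective") PPV is roughly:
ppv_prospective, infected_among_fp = 0.58, 0.62
effective_ppv = ppv_prospective + (1 - ppv_prospective) * infected_among_fp
print(f"Effective PPV ≈ {effective_ppv:.2f}")  # ≈ 0.84, i.e. above 80%
```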
Broader Horizons: The Future of LLMs and the AI Adoption Maze
Our study shows that LLMs hold considerable promise for making clinical prediction tools smarter, especially by tapping into the rich, contextual goldmine of unstructured clinical notes. For conditions like sepsis, where timely intervention is paramount, even a small improvement in prediction accuracy and speed can make a big difference.
In the emergency department, AI systems like COMPOSER-LLM counter cognitive biases by integrating a DDx process into predictive modeling. This mitigates 'anchoring bias' by compelling the AI to evaluate a broader range of potential conditions, rather than fixating on an initial suspicion. It also curtails 'automation bias,' as the AI's internal DDx assessment ensures possibilities like sepsis aren't prematurely dismissed, even if an initial alert is uncertain. This evolution of AI from simple binary predictions to delivering more nuanced, DDx-informed alerts marks a breakthrough in healthcare analytics, fostering comprehensive, context-aware clinical decision support and enhancing patient safety.
The COMPOSER-LLM system represents a significant advancement in clinical AI. It demonstrates how Large Language Models can be powerfully augmented with analytical tools, such as differential diagnosis (DDx) calculators, and effectively grounded in real-world clinical data—for instance, by using Retrieval Augmented Generation (RAG) to deeply understand patient notes. This innovative approach, designed for seamless integration with Electronic Health Records (EHRs) via interoperability standards like FHIR, is key to fostering more insightful and context-aware clinical decision support, paving the way for more sophisticated AI assistance in healthcare. One of the key aspects of our design is that the computationally intensive LLM is only called upon in those tricky, high-uncertainty cases, making the system more efficient and cost-effective.
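To illustrate what FHIR-grounded retrieval for an LLM prompt can look like in practice, here is a hedged Python sketch; it is not our production pipeline. The FHIR server URL, the patient ID, and the call_llm helper are hypothetical, and a real deployment would add authentication, error handling, note chunking, and privacy safeguards.

```python
# Sketch: pull a patient's recent clinical notes over standard FHIR R4
# and assemble a retrieval-augmented prompt for an LLM (the DDx step).
# Endpoint, patient ID, and call_llm() are placeholders, not real services.

import base64
import requests

FHIR_BASE = "https://fhir.example-hospital.org/R4"   # hypothetical server

def fetch_recent_notes(patient_id: str, max_notes: int = 5) -> list[str]:
    """Retrieve the most recent clinical notes as plain text using the
    standard FHIR DocumentReference search."""
    resp = requests.get(
        f"{FHIR_BASE}/DocumentReference",
        params={"patient": patient_id, "_sort": "-date", "_count": max_notes},
        timeout=30,
    )
    resp.raise_for_status()
    notes = []
    for entry in resp.json().get("entry", []):
        for content in entry["resource"].get("content", []):
            data = content.get("attachment", {}).get("data")
            if data:  # inline base64-encoded note text
                notes.append(base64.b64decode(data).decode("utf-8", "ignore"))
    return notes

def build_prompt(notes: list[str]) -> str:
    """Assemble a retrieval-augmented prompt asking for sepsis evidence
    and the most likely alternative diagnoses."""
    context = "\n\n---\n\n".join(notes)
    return (
        "Using only the clinical notes below, list evidence for or against "
        "sepsis and name the most likely alternative diagnoses.\n\n" + context
    )

# prompt = build_prompt(fetch_recent_notes("12345"))
# answer = call_llm(prompt)   # call_llm is a placeholder for an LLM client
```

Grounding the prompt in the patient's own chart in this way is what keeps the LLM's reasoning tied to the actual clinical picture rather than to generic medical knowledge.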
However, the journey from a promising model to a widely adopted clinical tool is fraught with challenges that go far beyond the code. As hospitals and healthcare systems look to leverage AI, they face the classic tension between ‘build in-house’ and ‘buy from a vendor’. Building in-house, as we did with the foundational COMPOSER model (supported by over $10M in NIH grant funding over 5 years), offers immense customization and control [6]. But it requires substantial, sustained investment in expertise, infrastructure, and ongoing maintenance – resources that can be scarce. On the other hand, the allure of "free" or bundled AI tools within larger EHR packages can be misleading, sometimes hiding significant implementation costs in terms of staff time and effort, and may lack the robust validation or post-implementation monitoring needed to ensure they are truly effective and safe. The Epic Sepsis Model experience, for example, highlighted how a seemingly low-cost solution could underperform and consume valuable resources [7].
Ultimately, we believe that by carefully developing AI tools like COMPOSER-LLM, prospectively deploying and validating them in clinical practice, and generating high-quality evidence of their life-saving impact, we can navigate these challenges. This research is a step towards a future where AI works seamlessly alongside healthcare professionals, providing them with deeper insights and more time to focus on what matters most: their patients.
References:
[1] Boussina A, Shashikumar SP, Malhotra A, Owens RL, El-Kareh R, Longhurst CA, Quintero K, Donahue A, Chan TC, Nemati S, Wardi G. Impact of a deep learning sepsis prediction model on quality of care and survival. npj Digital Medicine. 2024 Jan 23;7(1):14.
[2] Shashikumar SP, Wardi G, Malhotra A, Nemati S. Artificial intelligence sepsis prediction algorithm learns to say “I don’t know”. npj Digital Medicine. 2021 Sep 9;4(1):134.
[3] Boussina A, Shashikumar S, Amrollahi F, Pour H, Hogarth M, Nemati S. Development & deployment of a real-time healthcare predictive analytics platform. In: 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC); 2023 Jul 24. p. 1-4. IEEE.
[4] Wardi G, Owens R, Josef C, Malhotra A, Longhurst C, Nemati S. Bringing the promise of artificial intelligence to critical care: what the experience with sepsis analytics can teach us. Critical Care Medicine. 2023 Aug 1;51(8):985-91.
[5] Kwong JC, Nickel GC, Wang SC, Kvedar JC. Integrating artificial intelligence into healthcare systems: more than just the algorithm. npj Digital Medicine. 2024 Mar 1;7(1):52.
[6] Wardi G, Longhurst CA. Unreasonable effectiveness of training AI models locally. BMJ Quality & Safety. 2025 May 9.
[7] Wong A, Otles E, Donnelly JP, Krumm A, McCullough J, DeTroyer-Cooley O, Pestrue J, Phillips M, Konye J, Penoza C, Ghous M. External validation of a widely implemented proprietary sepsis prediction model in hospitalized patients. JAMA Internal Medicine. 2021 Aug 1;181(8):1065-70.