Why we did this study:
Acute respiratory failure is a common condition among hospitalized patients and a leading reason for ICU admission. As respiratory failure progresses, clinicians escalate through a continuum of respiratory support modalities. Randomized controlled trials have shown that, in specific populations, the choice between high-flow nasal cannula (HFNC) and non-invasive ventilation (NIV) may reduce progression to intubation and invasive mechanical ventilation (IMV) [1]. However, prior trials used narrow inclusion criteria, which limits generalizability and fails to capture individualized treatment effects. In real-world practice, patients are often complex, and the choice between HFNC and NIV is often nuanced.
Our group previously developed an algorithm (RepFlow-CFR) that estimates patient-specific outcomes under HFNC and NIV, adjusting for confounders, to predict which respiratory support modality is likely to reduce an individual patient’s risk of progressing to IMV [2-4]. Although such tools could support clinical decision-making, barriers to implementation remain. One important barrier is clinician acceptance and trust. Model outputs often function like a “black box”, which makes them difficult to interpret. Because clinical practice guidelines continue to shape physician decision-making, any discordance between algorithmic recommendations and guideline-based standards of care can hinder adoption.
What we did:
To address this challenge, we integrated a large language model (LLM) into our decision pipeline as a guideline-aware, explainable reasoning layer. Using clinician notes and clinical data, we designed the LLM to interpret and refine the RepFlow-CFR predictions through the lens of established, physician-accepted practice guidelines.
We provided the LLM with summarized indications and contraindications for HFNC and NIV based on the ERS and ATS guidelines [5,6], the RepFlow-CFR model outputs, and relevant patient data and clinical notes. The LLM then generated a guideline-aligned, explainable recommendation favoring HFNC, NIV, or neither.
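The way these inputs might be combined into a single LLM request can be sketched as follows. This is a minimal illustration under stated assumptions, not the prompt used in the study: the function names, section labels, field names, and the required "RECOMMENDATION:" reply format are all hypothetical, and the call to the LLM itself is left out.

```python
from typing import Mapping

# Hypothetical label set mirroring the three possible recommendations.
ALLOWED_LABELS = ("HFNC", "NIV", "NEITHER")


def _bullets(fields: Mapping[str, object]) -> str:
    """Render a mapping as one '- name: value' line per field."""
    return "\n".join(f"- {name}: {value}" for name, value in fields.items())


def build_prompt(
    guideline_summary: str,
    model_outputs: Mapping[str, object],
    patient_data: Mapping[str, object],
    clinical_notes: str,
) -> str:
    """Assemble the four input sources into one prompt (assumed layout)."""
    sections = [
        "You are assisting with the choice of respiratory support.",
        "GUIDELINE SUMMARY:\n" + guideline_summary.strip(),
        "MODEL OUTPUTS (RepFlow-CFR):\n" + _bullets(model_outputs),
        "PATIENT DATA:\n" + _bullets(patient_data),
        "CLINICAL NOTES:\n" + clinical_notes.strip(),
        "Reply with a first line of the form "
        "'RECOMMENDATION: <HFNC|NIV|NEITHER>' and then a rationale "
        "that cites the guideline summary.",
    ]
    return "\n\n".join(sections)


def parse_recommendation(response: str) -> str:
    """Extract the label from the first line of the LLM reply."""
    lines = response.strip().splitlines()
    first_line = lines[0].strip().upper() if lines else ""
    for label in ALLOWED_LABELS:
        if first_line == f"RECOMMENDATION: {label}":
            return label
    raise ValueError("reply does not start with a recognized recommendation")
```

Requiring a fixed first line is one design choice that keeps the free-text rationale separate from the label, so that the label can be compared programmatically with the treatment the patient actually received.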
We then evaluated concordance between (1) the RepFlow-CFR recommendation and the patient’s actual treatment and (2) the LLM recommendation and the patient’s actual treatment, and assessed outcomes when recommendations were concordant versus discordant. As a secondary goal, a subset of the LLM recommendations was reviewed by three critical care expert physicians, who assessed each recommendation for accuracy, potential harm, clinical judgment, reasoning, and comprehension.
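The concordance comparison described above can be illustrated with a short sketch. It assumes a simple per-patient record with hypothetical field names, treats a recommendation of "neither" as excluded from the comparison (an assumption, since the summary does not say how such cases were handled), and reports only crude IMV rates without the statistical testing or confounder adjustment a real analysis would need.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Case:
    actual: str              # treatment received: "HFNC" or "NIV"
    recommended: str         # "HFNC", "NIV", or "neither"
    progressed_to_imv: bool  # outcome: progressed to IMV


def concordance_summary(cases: Iterable[Case]) -> Dict[str, Dict[str, Optional[float]]]:
    """Group cases by concordance and compute the crude IMV rate per group."""
    groups: Dict[str, List[bool]] = {"concordant": [], "discordant": []}
    for case in cases:
        if case.recommended == "neither":
            continue  # assumption: no modality favored, so not comparable
        key = "concordant" if case.actual == case.recommended else "discordant"
        groups[key].append(case.progressed_to_imv)

    summary: Dict[str, Dict[str, Optional[float]]] = {}
    for key, outcomes in groups.items():
        n = len(outcomes)
        # sum() over booleans counts the patients who progressed to IMV.
        summary[key] = {"n": n, "imv_rate": sum(outcomes) / n if n else None}
    return summary
```

The same function could be run once with the RepFlow-CFR recommendation and once with the LLM recommendation in the `recommended` field to obtain the two comparisons.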
What we found:
Patients whose actual treatment aligned with the recommendations of the deep counterfactual model (RepFlow-CFR) and the LLM had better outcomes. In contrast, when treatment deviated from the model-recommended modality, the likelihood of progression to IMV was significantly higher.
Chart review showed that the LLM’s treatment recommendations were largely consistent with clinical guidelines. However, several cases revealed incorrect reasoning or data retrieval errors. Agreement among the reviewing physicians with the LLM’s final recommendation ranged from 65% to 85%.
Why this is important:
This study demonstrates an approach that pairs a deep counterfactual inference model with a guideline-aware LLM to generate individualized recommendations for HFNC versus NIV in patients with acute respiratory failure. This framework addresses a key barrier to clinical AI implementation: establishing physician trust by ensuring adherence to accepted standards of care through guideline integration. Incorporating a guideline-constrained, explainable layer enhances transparency, safety, and alignment with clinical practice.
Ultimately, this approach advances the responsible integration of AI into healthcare by showing how deep learning models can be paired with guideline-based reasoning systems to support clinician decision-making in a safe and interpretable way.
Literature Cited
1 Frat, J. P. et al. High-flow oxygen through nasal cannula in acute hypoxemic respiratory failure. N Engl J Med 372, 2185-2196 (2015). https://doi.org/10.1056/NEJMoa1503326
2 Lam, J. Y. et al. Development, deployment, and continuous monitoring of a machine learning model to predict respiratory failure in critically ill patients. JAMIA Open 7, ooae141 (2024). https://doi.org/10.1093/jamiaopen/ooae141
3 Shashikumar, S. P. et al. Development and Prospective Validation of a Deep Learning Algorithm for Predicting Need for Mechanical Ventilation. Chest (2020). https://doi.org/10.1016/j.chest.2020.12.009
4 Lu, X. et al. Comparing High-Flow Nasal Cannula and Non-Invasive Ventilation in Critical Care: Insights from Deep Counterfactual Inference. Res Sq (2025). https://doi.org/10.21203/rs.3.rs-7230866/v1
5 Rochwerg, B. et al. Official ERS/ATS clinical practice guidelines: noninvasive ventilation for acute respiratory failure. Eur Respir J 50 (2017). https://doi.org/10.1183/13993003.02426-2016
6 Oczkowski, S. et al. ERS clinical practice guidelines: high-flow nasal cannula in acute respiratory failure. Eur Respir J 59 (2022). https://doi.org/10.1183/13993003.01574-2021