Why Are Large Language Models Bad at Medical Math? The Answer May Lie in How Doctors Actually Work

At Stanford's perioperative clinic, we noticed something surprising: AI systems that could pass medical board exams struggled with basic clinical math. The solution? It came from observing how doctors actually work.

The Observation: A Fundamental Problem in Medical AI

When doctors review a transplant patient or assess surgical risk, they often use validated medical calculators rather than doing math in their heads. When we posed these types of questions to recent AI models, even advanced models struggled with these basic calculations, getting them wrong one-third of the time. This raised an important question: Could we improve AI's performance by having it work more like doctors do?

Our Goal: Our focus isn't to replace a doctor's expertise, but to augment it. By combining clinical judgment with precise, rapid calculations, we think we can enhance the decision-making process without diminishing the role or value of the physician.

Building a Collaborative Team

Our team brought together diverse perspectives from Stanford Medicine and UCSF to test this hypothesis. Our idea was that instead of just trying to make AI better at math, we should give it access to the same validated medical calculation tools clinicians use every day. But because most of these tools, such as MDCalc or QxMD, are made for human doctors, AI isn't able to interact with them. To address this, we developed a set of tools made specifically to be used by AI and released them as a free web tool called OpenMedCalc.
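To make the idea of a calculator "made to be used by AI" more concrete, here is a minimal sketch in Python. It is not OpenMedCalc's actual interface: the function names, the `TOOL_SPEC` layout, and the JSON message format are all assumptions chosen for illustration. It uses the standard CHA2DS2-VASc stroke risk score as the example calculation. The point is the division of labor: the model only fills in structured fields it read from the clinical text, and ordinary deterministic code does the arithmetic.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Cha2ds2VascInput:
    """Structured inputs a model would extract from clinical text (hypothetical schema)."""
    age: int
    female: bool
    congestive_heart_failure: bool
    hypertension: bool
    diabetes: bool
    prior_stroke_tia_thromboembolism: bool
    vascular_disease: bool


def cha2ds2_vasc(p: Cha2ds2VascInput) -> int:
    """Compute the CHA2DS2-VASc score (0 to 9) with plain integer arithmetic."""
    if p.age < 0:
        raise ValueError("age must be non-negative")
    score = 0
    score += 1 if p.congestive_heart_failure else 0
    score += 1 if p.hypertension else 0
    score += 1 if p.diabetes else 0
    score += 2 if p.prior_stroke_tia_thromboembolism else 0
    score += 1 if p.vascular_disease else 0
    score += 1 if p.female else 0
    # Age contributes 2 points at 75 or older, 1 point from 65 to 74.
    if p.age >= 75:
        score += 2
    elif p.age >= 65:
        score += 1
    return score


# A machine-readable description the model could be shown (illustrative layout only).
TOOL_SPEC = {
    "name": "cha2ds2_vasc",
    "description": "Stroke risk score for patients with atrial fibrillation.",
    "parameters": {
        "age": "integer, years",
        "female": "boolean",
        "congestive_heart_failure": "boolean",
        "hypertension": "boolean",
        "diabetes": "boolean",
        "prior_stroke_tia_thromboembolism": "boolean",
        "vascular_disease": "boolean",
    },
}


def handle_tool_call(call_json: str) -> str:
    """Run a tool call emitted by the model and return the result as JSON."""
    call = json.loads(call_json)
    if call.get("name") != TOOL_SPEC["name"]:
        raise ValueError(f"unknown tool: {call.get('name')!r}")
    # Missing or unexpected argument names raise TypeError here instead of being guessed.
    inputs = Cha2ds2VascInput(**call["arguments"])
    return json.dumps({"score": cha2ds2_vasc(inputs)})


# Example: a 78-year-old woman with hypertension and diabetes.
example_call = json.dumps({
    "name": "cha2ds2_vasc",
    "arguments": {
        "age": 78,
        "female": True,
        "congestive_heart_failure": False,
        "hypertension": True,
        "diabetes": True,
        "prior_stroke_tia_thromboembolism": False,
        "vascular_disease": False,
    },
})
print(handle_tool_call(example_call))  # {"score": 5}
```

In a design like this, the model never performs the addition itself, so an arithmetic slip cannot reach the final score. What remains is the risk of extracting a wrong input from the note, which is why the inputs are kept explicit and easy to review.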

Testing Our Hypothesis with Clinical Tools

We evaluated this idea across 48 different medical calculations, from liver transplant scores to stroke risk assessments. We expected that providing access to validated calculation tools would significantly improve accuracy compared to asking AI to perform calculations on its own.
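For readers who want to picture how such a comparison can be scored, the sketch below computes accuracy as the fraction of answers that match a reference value within a tolerance. This is an assumption about one reasonable scoring rule, not the rule used in our study: the 5% relative tolerance, the function names, and the toy numbers are placeholders for illustration and are not results.

```python
import math
from typing import Optional, Sequence


def is_correct(predicted: Optional[float], reference: float, rel_tol: float = 0.05) -> bool:
    """Count an answer as correct if it lies within rel_tol of the reference.

    A missing answer (None) counts as incorrect. The 5% default is an
    illustrative assumption, not a published scoring threshold.
    """
    if predicted is None:
        return False
    return math.isclose(predicted, reference, rel_tol=rel_tol, abs_tol=1e-9)


def accuracy(predictions: Sequence[Optional[float]],
             references: Sequence[float],
             rel_tol: float = 0.05) -> float:
    """Fraction of predictions judged correct against their references."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one test case")
    correct = sum(is_correct(p, r, rel_tol) for p, r in zip(predictions, references))
    return correct / len(references)


# Toy placeholder values, not study data: two of four answers match.
toy_references = [5.0, 12.0, 3.0, 20.0]
toy_predictions = [5.0, None, 3.0, 25.0]
print(accuracy(toy_predictions, toy_references))  # 0.5
```

Running the same scoring function over answers produced with and without calculator access is what makes the two conditions directly comparable.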

How doctors, AI, and our solution approach medical calculations. The traditional approach (left) requires manual data entry into calculators like MDCalc, which is time-consuming and can introduce transcription mistakes. Current AI systems (middle) try to calculate directly or use general-purpose tools, which leads to errors. Our approach (right) combines AI's ability to understand clinical text with validated calculation tools, similar to how doctors work.

Results That Changed Our Understanding

Our approach proved transformative. When we gave AI access to these specialized medical calculators through our OpenMedCalc platform, accuracy jumped from 33% to over 95%. This leap didn't come from making AI "smarter" but from equipping it with the right resources. It underscores how supporting physicians, and now AI, with specialized tools can sharpen clinical decision-making without supplanting the doctor's expertise.

Accuracy improvements with different tools. Bar graph showing how accuracy improved from the base models (left) to our OpenMedCalc approach (right). Both LLaMa and GPT models showed dramatic improvements in accuracy when given access to clinical calculation tools, with our final approach achieving over 95% accuracy.

Looking Forward: Transforming Medical Calculations

While these results are exciting, we think they're just the beginning of making AI more reliable for health care. We invite you to read our complete findings in our recent npj Digital Medicine publication, which details how this approach could help make medical AI more dependable and clinically useful.

We've made our tools freely available through OpenMedCalc because we believe in open collaboration and advancing medical AI that truly serves patients and clinicians. To learn more about this work or explore potential collaborations, please visit openmedcalc.org.

npj Digital Medicine

An online open-access journal dedicated to publishing research in all aspects of digital medicine, including the clinical application and implementation of digital and mobile technologies, virtual healthcare, and novel applications of artificial intelligence and informatics.