Toward trustworthy medical AI via leveraging foundation models


In recent years, medical artificial intelligence (AI) has advanced rapidly, demonstrating capabilities that once seemed unimaginable [1-4]. In dermatology in particular, AI models can now analyze photos of skin lesions, which are easily taken with smartphone cameras, and determine with high accuracy whether a lesion is melanoma [5-8]. With appropriate validation through clinical trials, these models have the potential to help triage patients, alleviate physicians’ workloads, and expand access to care.

Yet the path toward widespread adoption of medical AI in clinical settings faces a significant obstacle: the opaque “black-box” nature of these models, which provide diagnoses without explaining the rationale behind their decisions. For medical AI to be deployed safely in clinical settings, accurate predictions are not enough. A model must also reveal the ‘why’ and ‘how’ behind its decisions, ideally in terms that medical professionals can understand [9]. Unfortunately, current explainable AI techniques, such as saliency maps, primarily identify which input features, such as pixels or image regions, matter for the model’s prediction. What is needed is a fundamentally different approach that converts image pixels into semantically meaningful, clinically relevant “concepts,” such as “darker pigmentation” and “asymmetric” for a melanoma-detecting AI model. However, achieving this level of transparency requires medical datasets richly annotated with these concepts, and such datasets are very hard to obtain [10].

Figure: Overview of the MONET framework. (a) We develop the MONET model using a vast amount of medical data. (b) Automatic concept annotation by MONET enables (c) data auditing, (d) model auditing, and (e) the development of inherently interpretable models.

To address this challenge, we turned to the latest advancements in AI. “Foundation models” (i.e., AI models trained on vast datasets and thereby equipped with versatile abilities) have shown remarkable capabilities in recognizing and annotating human-understandable concepts automatically. To develop a foundation model for dermatology, we leveraged the collective knowledge of the medical community, as encapsulated in publicly available medical literature and medical textbooks. The resulting foundation model, MONET (Medical cONcept rETriever), can richly annotate medical images with semantically meaningful medical concepts.

We showed that MONET’s automatic concept annotation can significantly improve the transparency of the medical AI development pipeline at every stage, whether auditing large-scale training data, scrutinizing models, or monitoring them after deployment, because errors and biases in both data and models can be explained in human-understandable terms. For instance, we used MONET to audit the ISIC dataset, the largest collection of dermatology images, which includes over 70k dermoscopic images commonly used to train dermatology AI models. The audit revealed that data sources within the ISIC dataset differ in how concepts correlate with the benign and malignant categories. This insight is crucial for understanding which factors affect the transferability of medical AI models across sites. Such data auditing at scale is usually not feasible because concept labels are lacking.
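The kind of audit described above can be illustrated with a small sketch. It assumes, hypothetically, that each image and each concept phrase has already been mapped to an embedding vector by an image-text model, and that a concept score is the cosine similarity between the two; the post does not specify MONET's actual scoring, so this is a stand-in. The function names, the toy embeddings, and the source names `site_A` and `site_B` are all invented for illustration.

```python
import numpy as np

def concept_scores(image_emb, concept_emb):
    """Cosine similarity between every image and every concept embedding.

    Assumption: embeddings come from some image-text model; here they are toy vectors.
    Returns an array of shape (n_images, n_concepts).
    """
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = concept_emb / np.linalg.norm(concept_emb, axis=1, keepdims=True)
    return img @ txt.T

def audit_by_source(scores, malignant, source, concept_names):
    """Per data source: mean concept score in malignant images minus benign images."""
    report = {}
    for s in np.unique(source):
        in_source = source == s
        mal = scores[in_source & malignant]
        ben = scores[in_source & ~malignant]
        if len(mal) == 0 or len(ben) == 0:
            continue  # cannot compare classes if one is missing at this source
        diff = mal.mean(axis=0) - ben.mean(axis=0)
        report[str(s)] = dict(zip(concept_names, diff.tolist()))
    return report

# Toy data (hypothetical): two concepts, two sources, two images per source.
concept_names = ["darker pigmentation", "asymmetric"]
concept_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
image_emb = np.array([
    [2.0, 0.0],  # site_A, malignant: aligned with "darker pigmentation"
    [0.0, 3.0],  # site_A, benign:    aligned with "asymmetric"
    [0.0, 1.0],  # site_B, malignant: aligned with "asymmetric"
    [4.0, 0.0],  # site_B, benign:    aligned with "darker pigmentation"
])
malignant = np.array([True, False, True, False])
source = np.array(["site_A", "site_A", "site_B", "site_B"])

scores = concept_scores(image_emb, concept_emb)
report = audit_by_source(scores, malignant, source, concept_names)
for site, diffs in report.items():
    print(site, diffs)
```

In this toy setup the sign of each difference flips between the two sources, which is the pattern an audit would flag: a concept associated with malignancy at one site is associated with benign lesions at another, so a model trained on one source may not transfer to the other.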

Our approach is unique in that we use foundation models to enhance the trustworthiness of traditional medical AI models rather than using them directly for diagnostic tasks. While traditional medical AI devices based on supervised learning are relatively well established and regulated under FDA guidelines, medical AI devices based on foundation models still have significant development ahead of them. Given this, we leverage the new capabilities of foundation models to enhance the utility of existing AI models. Our approach lets us inspect these models through the lens of medically relevant concepts, thereby facilitating their trustworthy deployment in clinical settings. The approach we propose is also not specific to dermatology and can be applied across medical tasks. The success we have achieved in dermatology serves as a blueprint for potential applications in radiology, ophthalmology, and beyond. Our research marks a pivotal step toward a future where AI and medical professionals work together in a symbiotic relationship built on trust, driving healthcare innovation forward.


  1. Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023).
  2. Jiang, L. Y. et al. Health system-scale language models are all-purpose prediction engines. Nature 619, 357–362 (2023).
  3. Dorr, D. A., Adams, L. & Embí, P. Harnessing the Promise of Artificial Intelligence Responsibly. JAMA 329, 1347–1348 (2023).
  4. Rajpurkar, P. & Lungren, M. P. The Current and Future State of AI Interpretation of Medical Images. N. Engl. J. Med. 388, 1981–1990 (2023).
  5. Jones, O. T. et al. Artificial intelligence and machine learning algorithms for early detection of skin cancer in community and primary care settings: a systematic review. Lancet Digit. Health 4, e466–e476 (2022).
  6. Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118 (2017).
  7. Vodrahalli, K. et al. TrueImage: A Machine Learning Algorithm to Improve the Quality of Telehealth Photos. in Biocomputing 2021 220–231 (World Scientific, 2020). doi:10.1142/9789811232701_0021.
  8. Omiye, J. A., Gui, H., Daneshjou, R., Cai, Z. R. & Muralidharan, V. Principles, applications, and future of artificial intelligence in dermatology. Front. Med. 10, (2023).
  9. DeGrave, A. J., Cai, Z. R., Janizek, J. D., Daneshjou, R. & Lee, S.-I. Auditing the inference processes of medical-image classifiers by leveraging generative AI and the expertise of physicians. Nat. Biomed. Eng. (2023) doi:10.1038/s41551-023-01160-9.
  10. Daneshjou, R., Yuksekgonul, M., Cai, Z. R., Novoa, R. A. & Zou, J. SkinCon: A skin disease dataset densely annotated by domain experts for fine-grained debugging and analysis. in (2022).

  • Nature Medicine

    This journal encompasses original research ranging from new concepts in human biology and disease pathogenesis to new therapeutic modalities and drug development, to all phases of clinical work, as well as innovative technologies aimed at improving human health.