Dermatologist-like explainable AI enhances trust and confidence in diagnosing melanoma

The decision-making of deep neural networks is typically opaque and thus unsuitable for clinical application. Here, we develop a transparent deep neural network for melanoma detection that can explain its predictions, and we evaluate its influence on dermatologists in a three-phase reader study.

1. AI-supported melanoma recognition

Did you ever look at a mole and wonder whether it looks funny or has changed? When screening for skin cancer, dermatologists worldwide are essentially faced with the same question: Is that mole a benign nevus or another benign skin tumor? Or is it potentially malignant, maybe even a melanoma? The weight of these questions becomes clear when we look at the consequences that follow from their answers. An erroneously excised benign nevus is, in most locations on the body, no more than a small inconvenience for the bearer. A correctly excised melanoma in situ (simply put, a melanoma that has not yet grown into the skin to a relevant degree) is deemed successfully treated when excised with sufficiently large surgical margins. A melanoma that is found only after it has grown into the skin, however, often requires more invasive treatment than excision alone - if it is not already too late for the patient.

Consequently, the medical community is highly interested in detecting melanomas early. There are associated questions about screening and prevention that need to be addressed as well, but when we look at diagnosis alone, early melanomas are hard to tell apart from benign skin changes, especially for doctors just starting to deal with skin tumors. To help alleviate this issue, we decided to take steps towards the development of an explainable artificial intelligence (XAI) support system for the differential diagnosis of melanomas and nevi.

To make the explanations provided by our XAI as useful as possible for doctors, we decided to use localized domain-specific features, as this "point and tell" approach is an excellent way to explain decisions and resolve ambiguities about machine reasoning. This idea ties in nicely with the argument made by Cynthia Rudin against post-hoc XAI in high-stakes scenarios (Rudin, 2019). I highly recommend reading her article, because it is excellently written and makes the point much more elaborately than I can in 102 words. Her argument boils down to the fact that saliency maps require interpretation by the observer, and that observers are prone to confirmation bias: they assume that (a) there actually is a relevant feature in the salient region and, given that many saliency map algorithms have a fairly low resolution, (b) that the AI was indeed relying on it and nothing else. Such a bias is acceptable in a debugging scenario, where the developer essentially wants to make sure that the AI is not relying on spurious correlations for its predictions, but it is undesirable in a high-stakes scenario.
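To make the resolution argument concrete, here is a minimal sketch of a generic Grad-CAM-style saliency map on a stock ResNet-18. The model, layer choice and input are stand-ins for illustration, not the system from our paper. The point to notice: the raw map is only 7x7 for a 224x224 input, so every salient "pixel" covers a 32x32 image patch whose content the observer still has to interpret.

```python
# Minimal post-hoc saliency sketch (Grad-CAM style), illustrating the
# low-resolution issue discussed above. Model and layer are illustrative.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()  # stand-in classifier
acts, grads = {}, {}

def fwd_hook(_, __, output):
    acts["feat"] = output.detach()

def bwd_hook(_, grad_in, grad_out):
    grads["feat"] = grad_out[0].detach()

# The last conv block of ResNet-18 yields a 7x7 feature map for a
# 224x224 input: each map cell corresponds to a 32x32 image patch.
model.layer4.register_forward_hook(fwd_hook)
model.layer4.register_full_backward_hook(bwd_hook)

image = torch.randn(1, 3, 224, 224)   # placeholder dermoscopic image
score = model(image)[0].max()          # logit of the top class
score.backward()

weights = grads["feat"].mean(dim=(2, 3), keepdim=True)  # pooled gradients
cam = F.relu((weights * acts["feat"]).sum(dim=1, keepdim=True))
print(cam.shape)  # torch.Size([1, 1, 7, 7]) -- coarse before upsampling
heatmap = F.interpolate(cam, size=(224, 224), mode="bilinear")
```

The bilinear upsampling in the last line is exactly where the ambiguity creeps in: the smooth full-resolution heatmap suggests a precision that the underlying 7x7 evidence does not have.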

2. Study and findings

During the course of this project, we compiled an explanatory ontology based on dermoscopic terminology, had a novel data set annotated with the ontological features by 14 international board-certified dermatologists, and developed an XAI that performs on par with state-of-the-art skin cancer classifiers by extending the work of Jalaboi et al. (2023) and Lucieri et al. (2022). Be aware that I am only giving the broad strokes of our work here - please refer to the paper (Chanda et al., 2024) for all relevant details.
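To give a feel for the general pattern of such a model - one backbone, a diagnosis head, and per-feature heads that both detect an ontological term and roughly localize it - here is a hypothetical sketch. The class name, the abridged feature list and the head design are my illustrative assumptions, not the architecture from the paper.

```python
# Hypothetical multi-task sketch: diagnosis plus presence and coarse
# localization of ontological dermoscopic features. Illustrative only.
import torch
import torch.nn as nn
from torchvision.models import resnet50

ONTOLOGY = ["pseudopods", "atypical network", "blue-white veil"]  # abridged

class ExplainableLesionNet(nn.Module):
    def __init__(self, n_features=len(ONTOLOGY)):
        super().__init__()
        backbone = resnet50(weights=None)
        # Drop avgpool and fc so the encoder keeps a spatial feature map.
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.diagnosis = nn.Linear(2048, 2)                  # melanoma vs. nevus
        self.feature_presence = nn.Linear(2048, n_features)  # which terms apply
        self.feature_maps = nn.Conv2d(2048, n_features, 1)   # where each term is seen

    def forward(self, x):
        fmap = self.encoder(x)            # (B, 2048, H/32, W/32)
        vec = self.pool(fmap).flatten(1)  # (B, 2048)
        return {
            "diagnosis": self.diagnosis(vec),
            "features": self.feature_presence(vec),
            "localization": self.feature_maps(fmap),
        }

out = ExplainableLesionNet()(torch.randn(1, 3, 224, 224))
print({k: tuple(v.shape) for k, v in out.items()})
```

The design choice worth noting is that the explanation heads are trained targets in their own right (against the expert annotations), rather than post-hoc attributions computed on top of an opaque classifier.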

As there is a distinct lack of large-scale evaluation of the effects of XAI on its users, especially on users who happen to be domain experts, we decided to conduct a reader study to shed light on this question as well. The study consisted of three phases in which 116 international participants (82 board-certified and 33 resident dermatologists, as well as one nurse consultant specialized in dermoscopic skin cancer screening) diagnosed a small number of dermoscopic images of melanomas and nevi without any assistance (phase 1), with "traditional" non-explainable AI assistance (phase 2), and with XAI assistance (phase 3). In addition to the diagnoses, we collected explanatory annotations from the participants in phase 1 for post-study comparison with the explanations provided by the XAI. Furthermore, we asked the participants to indicate their confidence in their own diagnoses (phases 1-3) and their trust in the support system (phases 2 and 3). Our expectation was to see the participants' diagnostic accuracy, confidence, and trust increase with XAI assistance over both no support and support by a non-explanatory AI.

2.1. Reader accuracy, confidence and trust

We replicated the improvement in the participants' diagnostic accuracy with (non-explanatory) AI support over no support that is reported in the literature. Much to our surprise, though, diagnostic accuracy did not improve statistically significantly with XAI support over (non-explanatory) AI support. Upon further analysis, we found that participants with different levels of experience in dermoscopy benefited - somewhat unsurprisingly - to varying degrees from the XAI support. Our results suggest that the anticipated increase in diagnostic performance may be present in the subgroup of the most experienced dermoscopy users; however, we did not have enough participants in this subgroup to make statistically sound claims. While we would have been happy to find a universal, if varying, increase in diagnostic accuracy, our findings indicate the necessity of a nuanced examination of the influence of XAI on prospective users based on their level of experience with an explanatory framework.
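For readers who want to picture the kind of subgroup analysis described above, here is a minimal sketch on made-up reader data. The column names, the experience cut-off and the choice of a Wilcoxon signed-rank test are assumptions for illustration, not the exact statistics from the paper.

```python
# Illustrative per-experience-level comparison of reader accuracy with
# AI (phase 2) vs. XAI (phase 3) support, on fabricated example data.
import pandas as pd
from scipy.stats import wilcoxon

readers = pd.DataFrame({
    "acc_ai":  [0.70, 0.75, 0.80, 0.85, 0.72, 0.90],   # phase 2 accuracy
    "acc_xai": [0.72, 0.74, 0.83, 0.91, 0.71, 0.95],   # phase 3 accuracy
    "experience_yrs": [1, 2, 4, 12, 3, 15],            # dermoscopy experience
})
# Hypothetical cut-off: up to 5 years = "junior", above = "senior".
readers["group"] = pd.cut(readers["experience_yrs"],
                          bins=[0, 5, 100], labels=["junior", "senior"])

for name, grp in readers.groupby("group", observed=True):
    diff = grp["acc_xai"] - grp["acc_ai"]
    stat, p = wilcoxon(grp["acc_xai"], grp["acc_ai"])  # paired, non-parametric
    print(f"{name}: mean gain {diff.mean():+.3f}, p={p:.3f}, n={len(grp)}")
```

With only a handful of readers per subgroup, as in this toy example, such tests are underpowered - which is precisely the limitation we ran into for the most experienced subgroup.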

The evaluation of the participants' confidence in their own diagnoses and their trust in the support system, on the other hand, was essentially in line with what we expected: we saw confidence increase with AI support over no support, and both confidence and trust increase with XAI support over non-explainable AI support.

2.2. Findings relating to overlap of human and machine explanations

A key question for us with regard to the eventual clinical applicability of an XAI system was how its explanations related to those of expert users: do machine explanations and human explanations agree with each other, and does agreement or disagreement influence the experts?

We are happy to report that the explanations of our XAI and the explanations of the participants were well-aligned, meaning that both decided based on the same tumor features. Furthermore, we found that the alignment of the explanations was positively correlated with the trust the participants placed in the XAI.
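As an illustration of how such an alignment analysis can be set up, the sketch below measures human-machine agreement as set overlap between annotated ontology terms and relates it to trust. The Jaccard index, the Spearman correlation and the toy data are my assumptions for illustration, not necessarily the exact metrics used in the paper.

```python
# Illustrative alignment analysis: overlap between human- and
# machine-annotated dermoscopic features, correlated with reader trust.
from scipy.stats import spearmanr

def jaccard(a: set, b: set) -> float:
    """Overlap between two sets of feature labels (1.0 if both empty)."""
    return len(a & b) / len(a | b) if a | b else 1.0

# Hypothetical per-case data: (expert terms, XAI terms, trust rating 1-5).
cases = [
    ({"atypical network"}, {"atypical network"}, 5),
    ({"blue-white veil", "pseudopods"}, {"blue-white veil"}, 4),
    ({"regular network"}, {"atypical network"}, 2),
]
alignment = [jaccard(human, machine) for human, machine, _ in cases]
trust = [t for _, _, t in cases]

rho, p = spearmanr(alignment, trust)
print(f"alignment-trust Spearman rho={rho:.2f} (p={p:.2f})")
```

A rank correlation is a natural choice here because trust ratings are ordinal; a positive rho corresponds to the pattern we observed, namely that readers trusted the system more when it pointed at the same features they would have named themselves.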

3. Takeaway

We are happy to have laid another stone on the path to better - better understood and more empirically human-centered - human-machine interaction. I hope that in the long run our work will serve to bring improved medical instruments into the clinic, to the benefit of patients and doctors alike.

4. References

Chanda, T., Hauser, K., Hobelsberger, S., Bucher, T.-C., Garcia, C. N., Wies, C., Kittler, H., Tschandl, P., Navarrete-Dechent, C., Podlipnik, S., Chousakos, E., Crnaric, I., Majstorovic, J., Alhajwan, L., Foreman, T., Peternel, S., Sarap, S., Özdemir, İ., Barnhill, R. L., … Brinker, T. J. (2024). Dermatologist-like explainable AI enhances trust and confidence in diagnosing melanoma. Nature Communications, 15(1), 524. https://doi.org/10.1038/s41467-023-43095-4

Jalaboi, R., Faye, F., Orbes-Arteaga, M., Jørgensen, D., Winther, O., & Galimzianova, A. (2023). DermX: An end-to-end framework for explainable automated dermatological diagnosis. Medical Image Analysis, 83, 102647. https://doi.org/10.1016/j.media.2022.102647

Lucieri, A., Bajwa, M. N., Braun, S. A., Malik, M. I., Dengel, A., & Ahmed, S. (2022). ExAID: A multimodal explanation framework for computer-aided diagnosis of skin lesions. Computer Methods and Programs in Biomedicine, 215, 106620. https://doi.org/10.1016/j.cmpb.2022.106620

Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206-215. https://doi.org/10.1038/s42256-019-0048-x
