Towards Sustainable Benchmarking in AI for Digital Pathology

Deep learning (DL) can transform medical diagnostics, but its environmental cost is often overlooked. A new benchmarking approach for DL in pathology that integrates carbon footprint with diagnostic performance is needed to develop sustainable, efficient AI solutions.

AI in Digital Pathology: Promise and Challenges

Artificial intelligence is reshaping medicine, and digital pathology is no exception. AI’s ability to process and analyze massive datasets, like whole-slide images (WSIs) of tissue samples, has revolutionized how diseases like cancer are detected. However, this progress comes with a hidden cost: the immense energy required to train and deploy these models generates significant carbon emissions. In our latest study, we propose a new way forward with the Environmentally Sustainable Performance (ESPer) score—a benchmark that evaluates AI models based not just on diagnostic accuracy but also on their environmental impact. An illustration of a proposed benchmarking workflow can be found in Figure 1.

The ESPer Score: Balancing Accuracy and Carbon Emissions

The ESPer score looks at two critical aspects: how well an AI model performs diagnostically and how much carbon it emits during training and inference. It also includes a weighting factor, which allows the balance between performance and CO₂eq to be shifted in either direction depending on the task. In our study, we tested five AI models commonly used in pathology: TransMIL [2], CLAM [3], InceptionV3 [4], the Vision Transformer (ViT) [5], and Prov-GigaPath [6]. The tasks included classifying kidney transplant diseases and identifying subtypes of renal cell carcinoma (RCC). TransMIL and CLAM stood out for delivering high diagnostic accuracy while maintaining low carbon footprints. Prov-GigaPath, on the other hand, performed very well diagnostically but had a much higher environmental cost, which lowered its ESPer score.
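For intuition, the sketch below shows one way such a weighted trade-off could be computed. It is an illustrative assumption, not the exact ESPer formula from the paper: it simply combines a performance metric scaled to [0, 1] with CO₂eq emissions normalized against the most carbon-intensive model in the comparison, using a weighting factor w.

```python
# Illustrative sketch only; the exact ESPer definition is given in the paper.
# Assumption: a linear trade-off between a performance metric in [0, 1]
# (e.g. AUROC) and CO2eq emissions normalized against the worst emitter.

def esper_like_score(performance: float, co2eq_kg: float,
                     co2eq_max_kg: float, w: float = 0.5) -> float:
    """Combine diagnostic performance and carbon cost into a single score.

    performance  : metric already scaled to [0, 1], e.g. AUROC.
    co2eq_kg     : CO2eq emitted by this model for training + inference.
    co2eq_max_kg : CO2eq of the most carbon-intensive model compared.
    w            : weighting factor; w = 1 ranks by performance only,
                   w = 0 ranks by sustainability only.
    """
    co2_norm = co2eq_kg / co2eq_max_kg          # 1.0 = worst emitter
    return w * performance + (1 - w) * (1 - co2_norm)

# Hypothetical numbers: a lightweight model vs. a large foundation model
print(esper_like_score(performance=0.93, co2eq_kg=2.0, co2eq_max_kg=50.0))   # ~0.945
print(esper_like_score(performance=0.96, co2eq_kg=50.0, co2eq_max_kg=50.0))  # ~0.48
```

With equal weighting, the lightweight model comes out ahead despite slightly lower accuracy; moving w toward 1 would instead favor the more accurate but more carbon-intensive model, mirroring the task-dependent weighting described above.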

Small Changes, Big Impact

We also evaluated other practical ways to reduce AI's environmental impact without sacrificing accuracy. For example, we found that using larger tiles at lower resolution for image analysis significantly reduced computational demands. Similarly, for RCC classification, analyzing just 10% of the available image tiles achieved nearly the same performance as using the entire dataset. These small adjustments can make a big difference in reducing energy consumption. We show a more detailed comparison between workflows with and without ESPer and other reduction strategies in Figure 2.
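As a rough illustration of the tile-subsampling idea, the snippet below randomly keeps a fraction of pre-extracted tiles per slide before feature extraction. This is a minimal sketch, not the study's actual pipeline; the tile paths and the 10% fraction are assumptions chosen for the example.

```python
# Minimal sketch, not the study's pipeline: randomly keep a fraction of
# pre-extracted WSI tiles per slide before feature extraction / MIL training.
import random

def subsample_tiles(tile_paths: list[str], fraction: float = 0.10,
                    seed: int = 42) -> list[str]:
    """Return a reproducible random subset of tile paths for one slide."""
    rng = random.Random(seed)
    k = max(1, int(len(tile_paths) * fraction))
    return rng.sample(tile_paths, k)

# Example: keep 10% of 4,000 tiles from one (hypothetical) whole-slide image
tiles = [f"slide_001/tile_{i:05d}.png" for i in range(4000)]
subset = subsample_tiles(tiles, fraction=0.10)
print(len(subset))  # -> 400 tiles passed on to feature extraction
```

Every tile skipped in this way is a tile that never needs to be read, embedded, or aggregated, which is where the energy saving comes from.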

Why This Matters on a Global Scale

Although the carbon footprint of AI in medicine might seem negligible on a case-by-case basis (like diagnosing a single patient), the cumulative impact across hospitals and patients worldwide could be enormous. That’s why we believe the ESPer framework could drive a meaningful shift in how AI models are developed. By prioritizing both performance and sustainability, researchers, policymakers, and industry leaders can align medical AI with global climate goals.

Challenges and the Path Forward

Of course, there are still challenges to address. For example, our current calculations do not account for the carbon cost of hardware production or transportation. Additionally, the carbon intensity of electricity varies from one region to another, meaning the environmental impact of deploying the same AI model can differ depending on location. We also recognize that in certain clinical scenarios, diagnostic performance may need to take precedence over sustainability. Future work could therefore investigate how to set the weighting factor appropriately for different scenarios and specific clinical needs.
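To make the regional point concrete, the toy example below converts one hypothetical training run's energy use into CO₂eq under different grid carbon intensities. The energy figure and intensity values are assumptions chosen for illustration, not measurements from the study.

```python
# Illustrative only: identical energy use maps to very different CO2eq
# depending on the local grid. All numbers below are assumed for the example.
ENERGY_KWH = 120.0  # hypothetical energy use of one training run

grid_intensity_kg_per_kwh = {   # rough, assumed carbon intensities
    "low-carbon grid": 0.05,
    "average grid": 0.40,
    "coal-heavy grid": 0.80,
}

for region, intensity in grid_intensity_kg_per_kwh.items():
    print(f"{region}: {ENERGY_KWH * intensity:.0f} kg CO2eq")
# -> low-carbon grid: 6 kg, average grid: 48 kg, coal-heavy grid: 96 kg
```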

Aligning Innovation with Responsibility

Ultimately, our work with the ESPer score is about promoting a balance between innovation and responsibility. AI is an incredibly powerful tool for improving healthcare outcomes, but we must ensure its benefits don’t come at the expense of the environment. By embedding sustainability into the development and evaluation of AI models, we can pave the way for a future where cutting-edge medical technologies coexist with a healthier planet.

If you’re curious to learn more, check out our full study: Ecologically sustainable benchmarking of AI models for histopathology.

Figure 1. Overview of Model Development with Environmentally Sustainable Performance (ESPer).

We used renal cell carcinoma (RCC) subtyping and kidney transplant disease classification (KTX) as use cases in our study. Depending on the medical task, the weighting factor can be set upfront to prioritize either performance or ecological sustainability. The dataset row indicates how much data is needed at each step of model development. There are various approaches to model optimization, such as pruning, knowledge distillation, hyperparameter tuning, or quantization. These have been described previously and were not tested here, but are included in the figure to provide a more complete picture of model development.

 

Figure 2. Different scenarios of deep learning (DL) model development with and without using ESPer for RCC subtype classification.

The different approaches in (A), (B), and (C) highlight how much the carbon footprint can be reduced by employing ESPer in combination with other data-reduction strategies.

 

References

  1. Vafaei Sadr, A. et al. Operational greenhouse-gas emissions of deep learning in digital pathology: a modelling study. The Lancet Digital Health (2023). doi:10.1016/S2589-7500(23)00219-4.
  2. Shao, Z. et al. TransMIL: Transformer based Correlated Multiple Instance Learning for Whole Slide Image Classification. CoRR abs/2106.00908, (2021).
  3. Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570 (2021).
  4. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. & Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2818–2826 (2016). doi:10.1109/CVPR.2016.308.
  5. Dosovitskiy, A. et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. CoRR abs/2010.11929, (2020).
  6. Xu, H., Usuyama, N., Bagga, J. et al. A whole-slide foundation model for digital pathology from real-world data. Nature (2024). https://doi.org/10.1038/s41586-024-07441-w
