Towards Sustainable Benchmarking in AI for Digital Pathology
AI in Digital Pathology: Promise and Challenges
Artificial intelligence is reshaping medicine, and digital pathology is no exception. AI’s ability to process and analyze massive datasets, like whole-slide images (WSIs) of tissue samples, has revolutionized how diseases like cancer are detected. However, this progress comes with a hidden cost: the immense energy required to train and deploy these models generates significant carbon emissions. In our latest study, we propose a new way forward with the Environmentally Sustainable Performance (ESPer) score—a benchmark that evaluates AI models based not just on diagnostic accuracy but also on their environmental impact. An illustration of a proposed benchmarking workflow can be found in Figure 1.
The ESPer Score: Balancing Accuracy and Carbon Emissions
The ESPer score looks at two critical aspects: how well an AI model performs diagnostically and how much carbon it emits during training and inference. It also includes a weighting factor, which shifts the balance toward either performance or CO₂eq emissions depending on the task. In our study, we tested five AI models commonly used in pathology, namely TransMIL [2], CLAM [3], InceptionV3 [4], Vision Transformer (ViT) [5], and Prov-GigaPath [6], on two tasks: classifying kidney transplant diseases and identifying subtypes of renal cell carcinoma (RCC). TransMIL and CLAM stood out for delivering high diagnostic accuracy while maintaining low carbon footprints. Prov-GigaPath, on the other hand, performed very well diagnostically but had a much higher environmental cost, which lowered its ESPer score.
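To make the idea of a weighted trade-off concrete, here is a minimal sketch in Python. It is not the published ESPer formula, which this post does not spell out. The function name `illustrative_score`, the linear blend, and the normalization by the highest emitter in the comparison are all assumptions chosen for illustration.

```python
def illustrative_score(performance: float,
                       co2eq_kg: float,
                       max_co2eq_kg: float,
                       weight: float = 0.5) -> float:
    """Toy accuracy-vs-emissions trade-off (NOT the published ESPer formula).

    performance   -- diagnostic metric scaled to [0, 1] (e.g. AUROC)
    co2eq_kg      -- emissions of this model in kg CO2eq
    max_co2eq_kg  -- emissions of the highest-emitting model being compared
    weight        -- 1.0 counts only performance, 0.0 counts only emissions
    """
    if not 0.0 <= performance <= 1.0:
        raise ValueError("performance must lie in [0, 1]")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if co2eq_kg < 0 or max_co2eq_kg <= 0:
        raise ValueError("emissions must be non-negative and the maximum positive")

    # Assumption: map emissions to [0, 1], where 1.0 means no emissions
    # and 0.0 means as much as (or more than) the worst model compared.
    sustainability = 1.0 - min(co2eq_kg / max_co2eq_kg, 1.0)

    # Assumption: a simple linear blend controlled by the weighting factor.
    return weight * performance + (1.0 - weight) * sustainability
```

With made-up numbers, a model with performance 0.9 that emits half as much as the worst model scores 0.7 at `weight=0.5`, 0.9 at `weight=1.0`, and 0.5 at `weight=0.0`, which shows how the weighting factor moves the ranking toward either accuracy or sustainability.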
Small Changes, Big Impact
We also evaluated other practical ways to reduce AI’s environmental impact without sacrificing accuracy. For example, we found that using larger tiles with lower resolution for image analysis significantly reduced computational demands. Similarly, for RCC classification, analyzing just 10% of available image tiles achieved nearly the same performance as using all tiles. These small adjustments can make a big difference in reducing energy consumption. Figure 2 shows a more detailed comparison between workflows with and without ESPer and other reduction techniques.
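As a rough illustration of the tile-reduction idea, the sketch below keeps a fixed fraction of a slide's tiles. It assumes uniform random sampling, which this post does not specify, and the helper name `sample_tiles` and its parameters are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_tiles(tiles: Sequence[T], fraction: float = 0.1, seed: int = 0) -> List[T]:
    """Return a reproducible random subset of tiles.

    Assumption: tiles are drawn uniformly at random without replacement;
    the study may have selected tiles differently.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie in (0, 1]")
    if not tiles:
        return []

    # Keep at least one tile so that very small slides are not emptied.
    k = max(1, round(len(tiles) * fraction))

    # A dedicated generator keeps the selection reproducible per seed.
    rng = random.Random(seed)
    return rng.sample(list(tiles), k)
```

Here `tiles` can be any sequence, such as a list of tile file paths or tile coordinates for one whole-slide image. Fixing the seed means the same subset is used across training runs, which keeps comparisons between models fair.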
Why This Matters on a Global Scale
Although the carbon footprint of AI in medicine might seem negligible on a case-by-case basis (like diagnosing a single patient), the cumulative impact across hospitals and patients worldwide could be enormous. That’s why we believe the ESPer framework could drive a meaningful shift in how AI models are developed. By prioritizing both performance and sustainability, researchers, policymakers, and industry leaders can align medical AI with global climate goals.
Challenges and the Path Forward
Of course, there are still challenges to address. For example, our current calculations don’t account for the carbon cost of hardware production or transportation. Additionally, the carbon intensity of electricity varies from one region to another, meaning the environmental impact of deploying the same AI model might differ depending on location. We also recognize that in certain clinical scenarios, diagnostic performance may need to take precedence over sustainability. Future work could investigate how to set the weighting factor accurately for different scenarios and specific needs.
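The regional effect follows from a simple relationship: operational emissions are the energy consumed multiplied by the carbon intensity of the local electricity grid. The sketch below shows that arithmetic. The function names and the region labels and intensity values in the usage note are placeholders, not measurements from our study.

```python
from typing import Dict


def co2eq_grams(energy_kwh: float, intensity_g_per_kwh: float) -> float:
    """Operational emissions in g CO2eq: energy used times grid carbon intensity."""
    if energy_kwh < 0 or intensity_g_per_kwh < 0:
        raise ValueError("energy and carbon intensity must be non-negative")
    return energy_kwh * intensity_g_per_kwh


def emissions_by_region(energy_kwh: float,
                        intensities: Dict[str, float]) -> Dict[str, float]:
    """Emissions of the same workload under different regional grid intensities."""
    return {region: co2eq_grams(energy_kwh, g_per_kwh)
            for region, g_per_kwh in intensities.items()}
```

For example, `emissions_by_region(2.0, {"region_a": 100.0, "region_b": 500.0})` returns 200 g and 1000 g CO₂eq for the same 2 kWh workload, using placeholder intensities. Hardware production and transport are outside this calculation, as noted above.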
Aligning Innovation with Responsibility
Ultimately, our work with the ESPer score is about promoting a balance between innovation and responsibility. AI is an incredibly powerful tool for improving healthcare outcomes, but we must ensure its benefits don’t come at the expense of the environment. By embedding sustainability into the development and evaluation of AI models, we can pave the way for a future where cutting-edge medical technologies coexist with a healthier planet.
If you’re curious to learn more, check out our full study: Ecologically sustainable benchmarking of AI models for histopathology.
Figure 1. Overview of Model Development with Environmentally Sustainable Performance (ESPer).
We used renal cell carcinoma subtyping (RCC) and kidney transplant disease classification (KTX) as use cases in our study. Based on the medical task, the weighting factor can be set upfront to prioritize either performance or ecological sustainability. The dataset row indicates how much data needs to be used for each step of model development. There are various approaches for model optimization, such as pruning, knowledge distillation, hyperparameter tuning, or quantization. These have been described previously and were not tested here, but they are included in the figure to provide a more complete picture of model development.
Figure 2. Different scenarios of deep learning (DL) model development with and without using ESPer for RCC subtype classification.
The approaches in (A), (B), and (C) highlight how much the carbon footprint can be reduced by employing ESPer in combination with other data reduction strategies.
References
1. Vafaei Sadr, A. et al. Operational greenhouse-gas emissions of deep learning in digital pathology: a modelling study. The Lancet Digital Health (2023). doi:10.1016/S2589-7500(23)00219-4.
2. Shao, Z. et al. TransMIL: Transformer based Correlated Multiple Instance Learning for Whole Slide Image Classification. CoRR abs/2106.00908 (2021).
3. Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570 (2021).
4. Szegedy, C. et al. Rethinking the Inception Architecture for Computer Vision. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2818–2826 (2016). doi:10.1109/CVPR.2016.308.
5. Dosovitskiy, A. et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. CoRR abs/2010.11929 (2020).
6. Xu, H., Usuyama, N., Bagga, J. et al. A whole-slide foundation model for digital pathology from real-world data. Nature (2024). https://doi.org/10.1038/s41586-024-07441-w