Federated learning (FL) is a technique that allows distributed data holders (e.g., hospitals) to collaboratively train an AI model without sharing their data. Departing from the conventional centralized learning paradigm, FL offers a new avenue for keeping local data secure while still benefiting from external data sources.
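At its core, most FL systems iterate a simple loop: a server broadcasts a global model, each site trains it locally on its private data, and the server aggregates the returned weights. The sketch below shows a minimal FedAvg-style round in plain NumPy; the logistic-regression model, toy client datasets, and hyperparameters are illustrative stand-ins of our own, not the setup used in [1] or [2].

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: a few epochs of gradient descent
    on a simple logistic-regression model (a stand-in for a hospital's model)."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid predictions
        grad = X.T @ (preds - y) / len(y)      # logistic-loss gradient
        w -= lr * grad
    return w

def fedavg_round(global_w, clients):
    """One FedAvg round: each site trains locally on its own data,
    and only the model weights travel back for a size-weighted average."""
    sizes, local_models = [], []
    for X, y in clients:
        local_models.append(local_update(global_w, X, y))
        sizes.append(len(y))
    return np.average(np.stack(local_models), axis=0,
                      weights=np.array(sizes, dtype=float))

# Toy federation: three "hospitals" with private, differently sized datasets.
rng = np.random.default_rng(0)
clients = []
for n in (200, 80, 50):
    X = rng.normal(size=(n, 5))
    y = (X @ np.array([1.0, -2.0, 0.5, 0.0, 1.5]) > 0).astype(float)
    clients.append((X, y))

w = np.zeros(5)
for _ in range(20):
    w = fedavg_round(w, clients)   # raw (X, y) pairs never leave each client
print("learned weights:", np.round(w, 2))
```

Note that only the weight vector crosses the network in each round; the data itself never leaves its owner, which is the property that makes FL attractive for hospitals.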
The Gap Between Theory and Practice Needs to Be Filled
Existing research on FL has mostly been conducted on image and generic data in simulated environments, and it rarely studies biomedical data under real-world implementation considerations such as model size, collaboration scale, and distribution shift. In real-world implementation, especially clinical deployment, these often-ignored factors are critical concerns. For example, hospitals may be equipped with varied computing resources and infrastructure, which restricts the choice of model architecture and size; collaborations may be conducted in small networks or large cohorts, naturally creating a tradeoff between efficiency and performance; and different hospitals may collect data in different formats or from distinct demographic groups, leading to severe data heterogeneity. Blindly applying FL without considering these factors can waste resources, lower performance, or even end collaborations.
Instead of bringing the data to the model, FL brings the model to the data—training algorithms where the data resides.
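Of the factors above, data heterogeneity is the easiest to underestimate. A common way to emulate it in FL experiments is Dirichlet label-skew partitioning, sketched below; the function name, class counts, and alpha values are illustrative choices on our part, not a protocol from [1] or [2]. A small concentration parameter alpha gives each client a heavily skewed class mix, while a large alpha approaches an IID split.

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Assign sample indices to clients with Dirichlet(alpha) label skew.
    Small alpha -> each client sees mostly a few classes (severe skew);
    large alpha -> a near-uniform class mix (close to IID)."""
    rng = np.random.default_rng(seed)
    clients = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        # Draw the fraction of class-c samples that each client receives.
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in zip(clients, np.split(idx, cuts)):
            client.extend(part.tolist())
    return clients

# Per-client class histograms for a toy 3-class dataset.
labels = np.random.default_rng(1).integers(0, 3, size=300)
for alpha in (0.1, 100.0):
    parts = dirichlet_partition(labels, n_clients=4, alpha=alpha)
    print(f"alpha={alpha}:",
          [np.bincount(labels[p], minlength=3).tolist() for p in parts])
```

In deployment the skew comes from the hospitals themselves (formats, demographics), but synthetic partitions like this are useful for stress-testing an FL pipeline before real data is involved.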
Preliminary Works and Our Approach
Our group is dedicated to bridging this gap by advancing FL for real-world deployments. In previous collaborative work with Indiana University, Emory University, and M Health Fairview, we developed a COVID-19 diagnosis model from chest radiographs and deployed it into real-world healthcare systems across three sites, among the first such deployments nationwide (Fig. 1) [1]. During development we encountered several practical challenges, including data and system heterogeneity, resource constraints, and high communication costs, which inspired us to investigate FL further from a practitioner's perspective. In our latest work [2], we studied two biomedical information extraction tasks and evaluated FL under various real-world learning scenarios, such as varied federation scales (Fig. 2), different model architectures (Fig. 3), data heterogeneity, and comparisons with LLMs. Our results shed light on several key practical insights:
- FL is beneficial regardless of whether data heterogeneity is present
- Large pretrained LMs are more robust to variations in federation scale
- FL significantly outperforms LLMs with few-shot prompting
Future Perspectives and Remarks
Our study is still restricted to applications where all FL participants share the same task, which is tightly coupled to the target learning problem. A broader application of FL in healthcare could be less tightly coupled, with different sites pursuing different tasks over weakly correlated data. This may inspire future work on improving FL with multi-task learning or on training foundation models with FL. Potential directions to advance this research include accelerating FL to enable the training and fine-tuning of large foundation models and the handling of massive datasets. Additionally, addressing "free riding" and ensuring fairness among participants is crucial, especially when resources are unequal across participants.
References
[1] Peng, L., Luo, G., Walker, A., Zaiman, Z., Jones, E.K., Gupta, H., Kersten, K., Burns, J.L., Harle, C.A., Magoc, T. and Shickel, B. "Evaluation of federated learning variations for COVID-19 diagnosis using chest radiographs from 42 US and European hospitals." Journal of the American Medical Informatics Association (JAMIA), 30(1), pp. 54-63, 2023.
[2] Peng, L., Luo, G., Zhou, S., Chen, J., Xu, Z., Sun, J. and Zhang, R. "An in-depth evaluation of federated learning on biomedical natural language processing for information extraction." npj Digital Medicine, 7(1), 127, 2024.