Med-BERT: Pre-trained Embedding for Structured EHR

Behind the paper: Rasmy et al: Med-BERT: pre-trained contextualized embeddings on large-scale structured electronic health records for disease prediction

Published in Healthcare & Nursing


Frankly, structured electronic health records (EHRs) are not deep learning's favorite data modality. At least not yet.

One of the reasons is that, unlike images and natural language, where large training corpora are freely available, very large collections of EHRs are inaccessible to most. This hampers predictive modeling for individual hospitals, which can often access only their own samples, small by deep-learning standards.

Our work addresses this issue by adapting the popular NLP framework BERT to EHRs. BERT is a transformer model pre-trained as a masked autoencoder on a very large corpus. While pre-training is difficult and expensive, the pre-trained model can then be fine-tuned to deliver state-of-the-art results on a variety of natural language tasks, even with a smaller data set and relatively easy, cheap computation. However, the BERT pre-train-then-fine-tune methodology had not yet been convincingly demonstrated on structured EHR data.
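To make the masked-autoencoder idea concrete for EHRs, the sketch below prepares a masked-language-model training example from a patient's sequence of diagnosis codes, following the standard BERT 80/10/10 masking recipe. This is an illustrative sketch, not the paper's implementation; the function name, the example ICD codes, and the tiny vocabulary are all assumptions for demonstration.

```python
import random

MASK = "[MASK]"

def mask_visit_sequence(codes, mask_prob=0.15, vocab=None, rng=None):
    """BERT-style masked-LM input preparation for a code sequence.

    Each position is selected with probability mask_prob; a selected
    code is replaced by [MASK] 80% of the time, by a random code from
    the vocabulary 10% of the time, and kept unchanged 10% of the time.
    Returns (masked_codes, labels), where labels holds the original
    code at selected positions and None elsewhere.
    """
    rng = rng or random.Random()
    vocab = vocab or sorted(set(codes))
    masked, labels = [], []
    for code in codes:
        if rng.random() < mask_prob:
            labels.append(code)  # the model must predict this code
            r = rng.random()
            if r < 0.8:
                masked.append(MASK)
            elif r < 0.9:
                masked.append(rng.choice(vocab))  # random replacement
            else:
                masked.append(code)  # kept as-is but still predicted
        else:
            labels.append(None)
            masked.append(code)
    return masked, labels

# Example: one patient's visit history as ICD-10 codes (illustrative only)
visit = ["E11.9", "I10", "N18.3", "I50.9", "E78.5"]
masked, labels = mask_visit_sequence(visit, mask_prob=0.5,
                                     rng=random.Random(0))
```

During pre-training, the transformer sees `masked` and is trained to recover the codes recorded in `labels`, which is what lets it learn code co-occurrence structure from unlabeled records.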

In this work, we developed Med-BERT, a BERT model for structured EHR data. We pre-trained a 17-million-parameter transformer (small by NLP standards, but quite big for structured EHR) on a data set of 28 million patients. We showed that Med-BERT substantially improves prediction accuracy, boosting the area under the receiver operating characteristic curve (AUC) by 1.21-6.14% on the tasks we tested. In particular, pre-trained Med-BERT boosts performance most for small fine-tuning training sets, equivalent to enlarging the training set roughly tenfold.
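The headline metric here is AUC, which has a simple probabilistic reading: the chance that a randomly chosen positive case receives a higher risk score than a randomly chosen negative one. A minimal rank-based computation (a generic sketch, not code from the paper) makes that reading explicit:

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney rank statistic:
    the fraction of positive/negative pairs in which the positive
    case is scored higher, counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A perfect ranking: every positive outscores every negative.
auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])  # -> 1.0
```

Under this reading, a 1-6 percentage-point AUC gain means the fine-tuned model correctly ranks a positive case above a negative one in 1-6 more pairs out of every 100.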

Predicting risk of heart failure in diabetes patients in a Cerner data set.

Overall, Med-BERT proves the concept that the BERT methodology is applicable to structured EHR data. Sharing pre-trained Med-BERT models will benefit disease-prediction studies with small local training sets, reduce data-collection expenses, and accelerate the pace of artificial-intelligence-aided healthcare. Our work is available at doi: 10.1038/s41746-021-00455-y.


