Behind the Paper

Unleashing the potential of digital pathology data by training computer-aided diagnosis models without human annotations

This study presents an approach that removes the need of medical experts to build assisting tools for medical experts, opening the way for the exploitation of immense sources of data collected worldwide.

Published in Healthcare & Nursing

Aug 24, 2022

Niccolò Marini

PhD student, HES-SO Valais

Unleashing the potential of digital pathology data by training computer-aided diagnosis models without human annotations

Liked by Manfredo Atzori and 2 others

Explore the Research

Digitization of healthcare data is changing the organization and the structure of digital workflows. Every year, an impressive amount of new healthcare data (including images, reports, etc) is produced worldwide. For example, only in 2020 over 50 petabytes (1 petabyte is 1000 terabytes) of data were produced worldwide. The large quantity of data helps medical experts to make diagnosis and to decide the best treatment to tackle diseases. However, the number of people dying because of cancer is still high. For example, about 10 million people died because of cancer in the world.
One of the reasons behind these deaths is the fact that the possibility of analyzing all the data collected is limited. The increasing amount of healthcare data does not scale with the limited number of medical experts available to analyze the data. The analysis of healthcare data requires time and expertise. The role of medical experts is not trivial: they analyze complex structures, such as tissue morphologies, to identify peculiar characteristics of diseases, such as cancer, in a context where the heterogeneity of structures is very high. This process can last around one hour per sample, saturating the capacity of hospitals to absorb the mole of new data and stretching the time needed to diagnose a disease and to define a treatment.
Recently, new technologies have developed to support medical experts in the analysis of healthcare data. In particular, deep learning algorithms emerge as the state-of-the-art algorithm for several tasks, such as image classifications. However, this technology mostly relies on annotated data. Deep learning methods are data-driven, therefore the performance and robustness of deep learning models mostly depend on the data. This requirement limits the development of algorithms to automatically analyze healthcare data since the data annotation process involves medical experts to analyze healthcare data. Therefore, only a small percentage of healthcare data are annotated by experts, limiting the potential that large amounts of data and data-driven algorithms may have on the medical domain.
This study presents an approach aiming to exploit the potential of healthcare data from hospital workflows. The approach is designed to remove the need for human experts to annotate data: the reports corresponding to healthcare data (that can be CT scans, MRI, whole slide images) are automatically analyzed to create automatic annotations that can be used to train deep learning models.
The method is tested on the classification of colon whole slide images, comparing the performance of two deep learning models with the same architecture, one trained using automatic made annotations and the other one using annotations made by medical experts.
The performance of the models is comparable, showing that automatic annotations generated from reports are as meaningful as the ones made by medical experts. Therefore, automatically made annotations can be used for training tools to assist medical experts.
The method is independent of the tissue and the type of images analyzed. Even if we selected colon histopathology data as use case study, it can be easily applied to data coming from different organs (lungs, prostate, head) and, most importantly, on different types of images, including CT-scans, MRIs, etc. This last point is the most important to stress since it allows to influence several healthcare domains, starting a virtuous cycle in the whole medical domain.
The implication of the results is disruptive: the need for medical experts for healthcare data analysis and annotation (until now a bottleneck) may be removed, leading to the possibility of exploiting data already available in the hospital informative systems all around the world.
Therefore, petabytes of data are potentially available to build robust models, since they only need to be extracted from hospital databases and do not need any further analysis from experts. The exploitation of data collected from several hospitals may allow to create robust models, more accurate on predictions and less affected by data heterogeneity.

Niccolò Marini

PhD student, HES-SO Valais

Please sign in or register for FREE

If you are a registered user on Research Communities by Springer Nature, please sign in

Follow the Topic

Health Care

Life Sciences > Health Sciences > Health Care

npj Digital Medicine

npj Digital Medicine

An online open-access journal dedicated to publishing research in all aspects of digital medicine, including the clinical application and implementation of digital and mobile technologies, virtual healthcare, and novel applications of artificial intelligence and informatics.

More about the journal

Related Collections

With Collections, you can get published faster and increase your visibility.

Synthetic Clinical Data and Privacy-Preserving Frameworks for Trustworthy Health AI

This Collection explores synthetic clinical data and privacy-preserving AI for trustworthy health AI, with required public deposition of synthetic datasets.

Publishing Model: Open Access

Deadline: Jun 03, 2027

Explore this Collection

Digital Biomarkers for Enabling Proactive Clinical Decisions

This collection invites submissions focused on novel digital health technologies and advanced modeling approaches that enable precision medical decision-making with digital biomarkers.

Publishing Model: Open Access

Deadline: May 18, 2027

Explore this Collection

Latest Content

Behind the Paper

Unlocking the Synergistic Promoter Role of Phosphorus in Evolving NiFe Phosphides for Enhanced Water Oxidation

News and Opinion

The Governance Layer: Turning Imaging, Pathology, and Free Text into One Computable Language

From the Editors

Discover's 5th Anniversary: Discover journals shine in the 2025 indexing metrics

Behind the Paper

What Hawaiian Seaweeds Can Teach Us About Speciation

Behind the Paper

Beyond drinking water: What else can water from air do for global health?

Cookies

We use cookies to ensure the functionality of our website, to personalize content and advertising, to provide social media features, and to analyze our traffic. If you allow us to do so, we also inform our social media, advertising and analysis partners about your use of our website. You can decide for yourself which categories you want to deny or allow. Please note that based on your settings not all functionalities of the site are available.

Further information can be found in our privacy policy.

Unleashing the potential of digital pathology data by training computer-aided diagnosis models without human annotations

Share this post

Share with...

...or copy the link