Unleashing the potential of digital pathology data by training computer-aided diagnosis models without human annotations

This study presents an approach that removes the need of medical experts to build assisting tools for medical experts, opening the way for the exploitation of immense sources of data collected worldwide.
Published in Healthcare & Nursing
Unleashing the potential of digital pathology data by training computer-aided diagnosis models without human annotations
Like

Share this post

Choose a social network to share with, or copy the shortened URL to share elsewhere

This is a representation of how your post may appear on social media. The actual post will vary between social networks

Digitization of healthcare data is changing the organization and the structure of digital workflows. Every year, an impressive amount of new healthcare data (including images, reports, etc) is produced worldwide. For example, only in 2020 over 50 petabytes (1 petabyte is 1000 terabytes) of data were produced worldwide. The large quantity of data helps medical experts to make diagnosis and to decide the best treatment to tackle diseases. However, the number of people dying because of cancer is still high. For example, about 10 million people died because of cancer in the world.
One of the reasons behind these deaths is the fact that the possibility of analyzing all the data collected is limited. The increasing amount of healthcare data does not scale with the limited number of medical experts available to analyze the data. The analysis of healthcare data requires time and expertise. The role of medical experts is not trivial: they analyze complex structures, such as tissue morphologies, to identify peculiar characteristics of diseases, such as cancer, in a context where the heterogeneity of structures is very high. This process can last around one hour per sample, saturating the capacity of hospitals to absorb the mole of new data and stretching the time needed to diagnose a disease and to define a treatment.
Recently, new technologies have developed to support medical experts in the analysis of healthcare data. In particular, deep learning algorithms emerge as the state-of-the-art algorithm for several tasks, such as image classifications. However, this technology mostly relies on annotated data. Deep learning methods are data-driven, therefore the performance and robustness of deep learning models mostly depend on the data. This requirement limits the development of algorithms to automatically analyze healthcare data since the data annotation process involves medical experts to analyze healthcare data. Therefore, only a small percentage of healthcare data are annotated by experts, limiting the potential that large amounts of data and data-driven algorithms may have on the medical domain. 
This study presents an approach aiming to exploit the potential of healthcare data from hospital workflows. The approach is designed to remove the need for human experts to annotate data: the reports corresponding to healthcare data (that can be CT scans, MRI, whole slide images) are automatically analyzed to create automatic annotations that can be used to train deep learning models. 
The method is tested on the classification of colon whole slide images, comparing the performance of two deep learning models with the same architecture, one trained using automatic made annotations and the other one using annotations made by medical experts.
The performance of the models is comparable, showing that automatic annotations generated from reports are as meaningful as the ones made by medical experts. Therefore, automatically made annotations can be used for training tools to assist medical experts.
The method is independent of the tissue and the type of images analyzed. Even if we selected colon histopathology data as use case study, it can be easily applied to data coming from different organs (lungs, prostate, head) and, most importantly, on different types of images, including CT-scans, MRIs, etc. This last point is the most important to stress since it allows to influence several healthcare domains, starting a virtuous cycle in the whole medical domain.
The implication of the results is disruptive: the need for medical experts for healthcare data analysis and annotation (until now a bottleneck) may be removed, leading to the possibility of exploiting data already available in the hospital informative systems all around the world. 
Therefore, petabytes of data are potentially available to build robust models, since they only need to be extracted from hospital databases and do not need any further analysis from experts. The exploitation of data collected from several hospitals may allow to create robust models, more accurate on predictions and less affected by data heterogeneity.

Please sign in or register for FREE

If you are a registered user on Research Communities by Springer Nature, please sign in

Subscribe to the Topic

Health Care
Life Sciences > Health Sciences > Health Care
  • npj Digital Medicine npj Digital Medicine

    An online open-access journal dedicated to publishing research in all aspects of digital medicine, including the clinical application and implementation of digital and mobile technologies, virtual healthcare, and novel applications of artificial intelligence and informatics.

Related Collections

With collections, you can get published faster and increase your visibility.

Harnessing digital health technologies to tackle climate change and promote human health

This collection invites research on the use of digital health technologies that innovate solutions to improve sustainable health care practice and delivery.

Publishing Model: Open Access

Deadline: Apr 30, 2024