Nationwide Multi-Institutional Data Harvesting for Medical Imaging Utilization and Reporting: Early-Stage COVID-19 Pandemic Observations on Pulmonary Embolism

This study aimed to demonstrate the feasibility of a novel multi-institutional data harvesting (MIDH) method, using existing AI-based infrastructure to assess the utilization of imaging and observed positivity rates in pulmonary embolism, a known complication of COVID-19 infection.
Published in Healthcare & Nursing

COVID-19 has had a profound impact on medical imaging, but measuring that impact quantitatively has proved challenging. Many studies simply compared how often imaging exams were performed before and after the pandemic started, but this type of research does not consider whether the results of the imaging changed as well. Trends in imaging results and imaging usage are not commonly evaluated in the context of public health.

Establishing trends related to computed tomography pulmonary angiography (CTPA) during the COVID-19 pandemic could be meaningful, as blood clots turned out to be an important complication of the infection. CTPA is commonly used to evaluate the pulmonary arteries for the presence of pulmonary embolism (PE), a life-threatening condition in which blood clots in the pulmonary arteries impede the uptake of oxygen into the bloodstream in the lungs. It was not until June 2020 that a growing body of evidence indicated that the infection was a significant risk factor for developing PE1 and, subsequently, expert guidelines were published2. Because overall healthcare utilization was changing and data on the association between COVID-19 and thromboembolic events were still evolving, it was not clear whether changes in the prevalence of PE were due to changes in testing frequency or to a change in the incidence of the condition.

When looking for patterns in data over time, studies from single institutions face the challenge of small cohorts resulting in low statistical power. To address this challenge, we introduce a multi-institutional data harvesting (MIDH) approach as a method for establishing important disease-related trends over time. This novel approach to looking at large amounts of data across different health systems is now possible because of the widespread availability of cloud-based algorithms that process medical imaging data in near real time. By examining the data being sent to such cloud-based image analysis services from multiple hospital systems, the MIDH approach can track both imaging utilization and imaging results as documented in radiology report findings. This data harvesting approach can add value to the common use of these AI services, which are often used to screen medical imaging exams, to expedite the radiology workflow, and to improve radiologists’ diagnostic accuracy3, 4. Our overall goal was to demonstrate the feasibility of the proposed MIDH approach by investigating the effect of the COVID-19 pandemic on both the number of CTPA tests performed and the observed prevalence of PE among these tests. Working with a software vendor and with multiple radiologists across the country, we performed a retrospective study encompassing aggregated data from 13 healthcare systems.

To measure both the test volumes and the number of positive tests, we repurposed software developed for AI-based workflow orchestration and natural language processing (NLP) (Aidoc, Tel Aviv, Israel) to access and collect data from each participating site. The approved use of the installed application is to perform AI-based reprioritization of radiologists’ worklists with the goal of decreasing study turn-around time, thereby expediting treatment in critical clinical conditions, such as intracranial hemorrhage or pulmonary embolism3, 5, 6. Interestingly, from a healthcare data perspective, these types of systems aggregate useful data beyond just the AI output that is sent back to the users. For this study, we did not use any AI-based image analysis; we only used the underlying workflow prioritization software to automatically retrieve CTPA cases within the participating healthcare systems, which we predicted would give a good overview of utilization trends. Furthermore, the radiology reports were automatically classified for the presence or absence of PE by using NLP. Other technical methods for retrieving CTPA datasets might have been considered, such as those based on electronic medical records, radiology information systems, or Picture Archiving and Communication Systems (PACS). However, repurposing AI-image-analysis orchestration software had the advantage of relying on a common system, already deployed in clinical routine, with a validated and robust study identification mechanism. This provided easy access and consistent data collection across multiple institutions with different individual technical infrastructures.

We identified two 70-day observational periods to examine:

  • the pre-pandemic period from 11/25/2019 through 2/2/2020, and
  • the early COVID-19 pandemic period from 3/8/2020 through 5/16/2020.
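
As a quick check on the two windows above, the following minimal Python sketch confirms that each period spans 70 days when both end dates are counted, and shows how a study date could be assigned to a period. The names `PERIODS`, `period_length_days`, and `assign_period` are illustrative assumptions and are not taken from the study's software.

```python
from datetime import date

# Observation windows as stated in the text (inclusive start and end dates).
# The dictionary keys are hypothetical labels chosen for this sketch.
PERIODS = {
    "pre_pandemic": (date(2019, 11, 25), date(2020, 2, 2)),
    "early_pandemic": (date(2020, 3, 8), date(2020, 5, 16)),
}


def period_length_days(start, end):
    """Number of calendar days in the window, counting both endpoints."""
    return (end - start).days + 1


def assign_period(study_date):
    """Return the label of the window containing study_date, or None."""
    for name, (start, end) in PERIODS.items():
        if start <= study_date <= end:
            return name
    return None


if __name__ == "__main__":
    for name, (start, end) in PERIODS.items():
        print(name, period_length_days(start, end))
    # A study performed between the two windows belongs to neither period.
    print(assign_period(date(2020, 2, 15)))
```

Treating the end dates as inclusive is an assumption, but it is the reading under which both windows come out at exactly 70 days.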

A total of 21,912 CTPA studies were performed within these two periods: 12,106 cases within a pre-COVID-19 observation period and 9,806 cases during the early pandemic outbreak period. The median age was 59 years (interquartile range (IQR) 45-70 years) and 56% were female. Overall, 58% of patients were imaged in emergency departments, 28% were inpatients, and 10% were outpatients.

We found that during the early phase of the pandemic, fewer CTPA exams were performed, but a larger proportion of them were positive for PE. Specifically, 9,806 CTPA exams were performed during the early COVID-19 pandemic period, compared with 12,106 during the pre-pandemic period. During the pre-COVID-19 period, 1,200/12,106 (9.9%) CTPA cases were positive for PE, while 1,138/9,806 (11.6%) were positive for PE during the early COVID-19 outbreak. There is a statistically significant association between the proportion of PE-positive CTPA studies (“PE positivity” rate) and the observational period (χ²(1, N = 21,912) = 16.29, p < 0.0001). Note that, for the 70-day early pandemic observational period, we observed an excess of 92 positive PE cases, or 1.3 more PE cases per day than statistically expected. In summary, when compared to the pre-pandemic period, there was an overall decrease in CTPA examinations performed with a simultaneous increase of the PE positivity rate (please see Figure).
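
The reported statistics can be reproduced from the four counts given above. The sketch below is a plain-Python Pearson chi-square test on the 2×2 table, without continuity correction. It also estimates the excess of positive cases under the assumption that "statistically expected" means the pooled positivity rate across both periods. The text does not state how the expectation was derived, so that assumption is ours, as are the function names.

```python
import math


def chi_square_2x2(pos_a, total_a, pos_b, total_b):
    """Pearson chi-square (1 degree of freedom, no continuity correction)."""
    observed = [
        [pos_a, total_a - pos_a],
        [pos_b, total_b - pos_b],
    ]
    grand_total = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [pos_a + pos_b, grand_total - pos_a - pos_b]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def excess_positives(pos_a, total_a, pos_b, total_b):
    """Observed minus expected positives in period B under the pooled rate.

    Assumption: the expectation is the pooled positivity rate of both periods
    applied to period B's exam volume.
    """
    pooled_rate = (pos_a + pos_b) / (total_a + total_b)
    return pos_b - pooled_rate * total_b


if __name__ == "__main__":
    chi2, p = chi_square_2x2(1200, 12106, 1138, 9806)
    excess = excess_positives(1200, 12106, 1138, 9806)
    print(f"chi2 = {chi2:.2f}, p = {p:.6f}")
    print(f"excess = {excess:.0f} cases, {excess / 70:.1f} per day")
```

With the counts from the text, this yields a chi-square value of about 16.29 and an excess of about 92 cases (roughly 1.3 per day), in line with the figures reported above.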


Changes in CTPA utilization and PE positivity rate over time. Superimposed results demonstrate a drop in the average of weekly total CTPA exams performed among all the participating institutions (orange curve) with a simultaneous increase in PE positivity rates (blue curve).


Natural language processing on final radiology reports served as the ground truth for identifying positive PE cases. To support the use of this method, we performed a multi-institutional NLP validation trial at 12 of the 13 participating healthcare systems. A total of 1,200 radiology reports classified as PE-positive and 1,200 classified as PE-negative by NLP were manually reviewed; the overall accuracy was 98%.
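
To make the idea of report classification and manual validation concrete, here is a deliberately simple Python sketch. It is not the vendor's NLP, whose internals are not described here. It is a toy rule-based classifier with hypothetical patterns, scored against a handful of invented example reports that stand in for manually reviewed labels.

```python
import re

# Hypothetical patterns for illustration only; a production NLP system
# would be far more sophisticated than this.
PE_TERM = re.compile(r"\bpulmonary\s+embol(?:ism|us|i)", re.IGNORECASE)
NEGATION = re.compile(
    r"\b(no|without|negative\s+for|no\s+evidence\s+of)\b", re.IGNORECASE
)


def report_is_pe_positive(report):
    """True if any sentence mentions PE with no negation cue before it."""
    for sentence in re.split(r"[.\n]+", report):
        match = PE_TERM.search(sentence)
        if match is None:
            continue
        # Only the text preceding the PE mention is checked for negation.
        if NEGATION.search(sentence[: match.start()]) is None:
            return True
    return False


def accuracy(predictions, manual_labels):
    """Fraction of reports where the classifier agrees with manual review."""
    matches = sum(p == m for p, m in zip(predictions, manual_labels))
    return matches / len(manual_labels)


# Invented toy reports with invented "manual review" labels (not study data).
TOY_REPORTS = [
    ("No evidence of pulmonary embolism.", False),
    ("Acute pulmonary embolus in the right lower lobe artery.", True),
    ("Lungs are clear. Study is negative for pulmonary emboli.", False),
    # Negation after the PE mention: this toy rule gets it wrong.
    ("Pulmonary embolism is not seen.", False),
]

if __name__ == "__main__":
    predictions = [report_is_pe_positive(text) for text, _ in TOY_REPORTS]
    labels = [label for _, label in TOY_REPORTS]
    print(accuracy(predictions, labels))
```

The last toy report is misclassified on purpose: it shows why a manual validation step of the kind described above is needed before NLP output is treated as ground truth.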

Our study points to several possible explanations for the increased PE prevalence during the early COVID-19 period in many of our institutions. First, our findings may reflect a true increase in the prevalence of PE caused by COVID-19 infection, which is known to induce a prothrombotic state and increase the risk of embolism, both pulmonary and systemic7-10. Many case reports suggest that the prothrombotic state associated with COVID-19 is seen even in subclinical infections11, 12, so the presence of the SARS-CoV-2 virus in a population may be enough to raise the risk of PE even in relatively healthy outpatients. Another possible explanation is that the patients who sought medical care during the pandemic were, overall, in more severe clinical condition with a higher pre-test probability, so that a higher proportion of them had PE.

By tracking both large-scale utilization and clinical imaging results data, this study shows the MIDH approach can be used to establish surrogates for measuring important disease-related observational quantities over time. The increasing use of cloud-based healthcare data processing applications will continue to improve the quantity and quality of the data available and can allow radiology results to be used to directly measure outcomes that are important to public health.

Our retrospective multicenter study clearly documents an increase in the observed prevalence of PE on CTPA examinations during the early pandemic phase, despite an overall decrease in the number of acquired CTPA examinations. Had this system been actively monitored in real time, the rising PE positivity rate could have pointed to PE as a life-threatening complication of COVID-19 before it was documented in the medical literature. MIDH-based “real-time” longitudinal data monitoring may therefore prove useful for clinically meaningful decision-making when facing future healthcare challenges with significant uncertainties, such as future pandemics.



  1. Poissy J, Goutay J, Caplan M, et al. Pulmonary Embolism in Patients With COVID-19: Awareness of an Increased Prevalence. Circulation. 2020;142:184-186.
  2. Moores LK, Tritschler T, Brosnahan S, et al. Prevention, Diagnosis, and Treatment of VTE in Patients With Coronavirus Disease 2019: CHEST Guideline and Expert Panel Report. Chest. 2020;158:1143-1163.
  3. Weikert T, Winkel DJ, Bremerich J, et al. Automated detection of pulmonary embolism in CT pulmonary angiograms using an AI-powered algorithm. Eur Radiol. 2020;30:6545-6553.
  4. O’Neill TJ, Xi Y, Stehel E, et al. Active Reprioritization of the Reading Worklist Using Artificial Intelligence Has a Beneficial Effect on the Turnaround Time for Interpretation of Head CTs with Intracranial Hemorrhage. Radiology: Artificial Intelligence. 2020:e200024.
  5. Wismüller A, Stockmaster L. A prospective randomized clinical trial for measuring radiology study reporting time on Artificial Intelligence-based detection of intracranial hemorrhage in emergent care head CT. Medical Imaging 2020: Biomedical Applications in Molecular, Structural, and Functional Imaging. Vol 11317: International Society for Optics and Photonics; 2020:113170M.
  6. O’Neill TJ, Xi Y, Stehel E, et al. Active Reprioritization of the Reading Worklist Using Artificial Intelligence Has a Beneficial Effect on the Turnaround Time for Interpretation of Head CTs with Intracranial Hemorrhage. Radiology: Artificial Intelligence. 2020:e200024.
  7. Liao SC, Shao SC, Chen YT, Chen YC, Hung MJ. Incidence and mortality of pulmonary embolism in COVID-19: a systematic review and meta-analysis. Crit Care. 2020;24:464.
  8. Fontana P, Casini A, Robert-Ebadi H, Glauser F, Righini M, Blondon M. Venous thromboembolism in COVID-19: systematic review of reported risks and current guidelines. Swiss Med Wkly. 2020;150:w20301.
  9. Kaptein FHJ, Stals MAM, Grootenboers M, et al. Incidence of thrombotic complications and overall survival in hospitalized patients with COVID-19 in the second and first wave. Thromb Res. 2021;199:143-148.
  10. Jiménez D, García-Sanchez A, Rali P, et al. Incidence of VTE and Bleeding Among Hospitalized Patients With Coronavirus Disease 2019: A Systematic Review and Meta-analysis. Chest. 2021;159:1182-1196.
  11. Delcros Q, Rohmer J, Tcherakian C, Groh M. Extensive DVT and Pulmonary Embolism Leading to the Diagnosis of Coronavirus Disease 2019 in the Absence of Severe Acute Respiratory Syndrome Coronavirus 2 Pneumonia. Chest. 2020;158:e269-e271.
  12. Karolyi M, Pawelka E, Omid S, et al. Late onset pulmonary embolism in young male otherwise healthy COVID-19 patients. Eur J Clin Microbiol Infect Dis. 2021;40:633-635.
