How can we measure the evolution of health research? A data-driven approach across funding systems

We developed a scalable method to classify health research across 26,000+ projects and their publications, combining expert knowledge and machine learning. This allows us to track how funding priorities translate into research outputs across systems and over time.

Explore the Research

MIT Press

Evolution of public funding for collaborative health research towards higher-level patient-oriented research

Abstract. Public research funding agencies increasingly seek to steer health research toward higher levels of translation and societal relevance. Yet it remains unclear to what extent such policy shifts are effectively implemented and reflected in funded projects and scientific outputs. This study examines evolution and changes in the orientation of health research portfolios since 2008 within European funding (Framework Programmes FP7 and Horizon 2020 funding for collaborative health research, FP-HR, and ERC Life Sciences grants), in comparison to NIH funding for collaborative research (P01, U01, and UM1). Using large-scale text analysis and supervised classification, we analyze both project descriptions and the associated scientific publications. At the project level, the EU FP-HR show pronounced shifts toward population-level, diagnostic, and health systems-oriented research, whereas investigator-driven ERC life sciences, NIH P01 and U01, display greater stability with a predominance of basic biomedical research. Publication-level analyses reveal more moderate changes, with basic biomedical research remaining a central component including in EU FP-HR, indicating partial translation of funding priorities into outputs. By jointly analyzing projects and publications, this study identifies and distinguishes between changes in funder expectations and realized research trajectories, highlighting how strategic funding shapes research portfolios within enduring epistemic and institutional constraints.

How can we measure what health research actually is?

Debates on research funding often rely on broad categories such as “basic” or “applied” science. But these distinctions are rarely measured in a systematic and comparable way.

In our recent study [1], we developed a methodological framework to address this challenge. By combining large-scale text analysis with supervised machine learning, we analyzed more than 26,000 funded projects and their associated scientific publications across European and U.S. funding systems.

A conceptual framework for classifying research

At the core of our approach is a classification system grounded in two dimensions:
(1) the unit of analysis of research, from molecular and cellular mechanisms to population and health systems, and
(2) the orientation of research, from basic to applied.

This framework allows us to distinguish five levels of health research, ranging from basic biomedical science to health policy and management. Importantly, this is not just a keyword-based classification, but a conceptually grounded system aligned with how health research is understood in practice.

From expert knowledge to machine learning

To scale this classification to tens of thousands of projects, we used a supervised machine learning approach.

We first constructed a training set based on expert annotation. These manually classified examples were then used to train a Naïve Bayes classifier, implemented in KH Coder.
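To make the training step more concrete, here is a minimal sketch of a multinomial Naïve Bayes text classifier in plain Python. It is not the KH Coder implementation used in the study: the tokenizer, the add-one smoothing, the function names, and the four training sentences are all assumptions made for illustration, and only two of the five levels appear.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and strip common punctuation."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w]


def train_naive_bayes(labeled_texts):
    """Count documents per label and word occurrences per label."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_texts:
        doc_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return doc_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # log prior
        label_total = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            count = word_counts[label][word]
            score += math.log((count + 1) / (label_total + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical stand-ins for expert-annotated project descriptions.
training_set = [
    ("molecular signalling pathways in cardiac cells", "basic biomedical"),
    ("gene expression and protein function in mouse models", "basic biomedical"),
    ("cost effectiveness of hospital care delivery and health policy",
     "health policy and management"),
    ("management of health services and policy reform",
     "health policy and management"),
]

model = train_naive_bayes(training_set)
predicted = classify("protein signalling in cells", model)
print(predicted)
```

In practice the training set would contain many expert-labelled examples for each of the five levels, and the text would be the project description or the publication abstract.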

The model was iteratively refined and validated, achieving around 82% agreement with expert classifications for projects and up to 95% accuracy for publications.
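As a rough illustration of what such an agreement figure means, the sketch below computes the share of items on which the model and the experts assign the same level, overall and per level. The function names and the five example labels are hypothetical, and the exact validation protocol of the study may differ.

```python
from collections import Counter


def percent_agreement(expert_labels, model_labels):
    """Share of items where the model label equals the expert label."""
    if len(expert_labels) != len(model_labels):
        raise ValueError("label lists must have the same length")
    if not expert_labels:
        raise ValueError("label lists must not be empty")
    matches = sum(e == m for e, m in zip(expert_labels, model_labels))
    return matches / len(expert_labels)


def agreement_by_level(expert_labels, model_labels):
    """Agreement computed separately for each expert-assigned level."""
    totals = Counter(expert_labels)
    hits = Counter(e for e, m in zip(expert_labels, model_labels) if e == m)
    return {level: hits[level] / totals[level] for level in totals}


# Hypothetical validation sample: "A", "B", "C" stand in for research levels.
expert = ["A", "B", "A", "C", "B"]
model_output = ["A", "B", "C", "C", "B"]

overall = percent_agreement(expert, model_output)
per_level = agreement_by_level(expert, model_output)
print(overall, per_level)
```

A per-level breakdown of this kind helps show whether disagreements are concentrated in particular levels rather than spread evenly.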

This approach offers both scalability and interpretability, two key requirements for policy-relevant analysis.

Linking projects to publications

A central innovation of the study is the integration of funding data with scientific outputs.

We linked funded projects from CORDIS and NIH RePORTER to their resulting publications.

This required addressing important differences between systems. While European projects can often be directly linked to publications, NIH data required a time-window approach due to the cumulative nature of funding and publication processes.
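A time-window rule of this kind could look like the following sketch. The record fields, the two-year lag after the project end, and the identifiers are assumptions chosen for illustration; they are not the parameters used in the study, and they do not reflect the actual NIH RePORTER schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    project_id: str
    start_year: int
    end_year: int


@dataclass(frozen=True)
class Publication:
    pub_id: str
    acknowledged_project_id: str  # grant the paper acknowledges
    year: int


def link_by_time_window(projects, publications, lag_years=2):
    """Attach a publication to a project only if it acknowledges the project
    and appears between the start year and the end year plus a lag."""
    by_id = {p.project_id: p for p in projects}
    links = {p.project_id: [] for p in projects}
    for pub in publications:
        project = by_id.get(pub.acknowledged_project_id)
        if project is None:
            continue  # publication acknowledges a grant outside the sample
        if project.start_year <= pub.year <= project.end_year + lag_years:
            links[project.project_id].append(pub.pub_id)
    return links


# Hypothetical records; the identifiers are placeholders, not real grants.
projects = [Project("P01-EXAMPLE", 2010, 2014)]
publications = [
    Publication("pub-a", "P01-EXAMPLE", 2012),  # inside the funding period
    Publication("pub-b", "P01-EXAMPLE", 2016),  # within the assumed lag
    Publication("pub-c", "P01-EXAMPLE", 2019),  # too late, excluded
    Publication("pub-d", "OTHER-GRANT", 2012),  # unrelated grant, excluded
]

links = link_by_time_window(projects, publications)
print(links)
```

The lag parameter controls how long after the end of funding a paper is still attributed to a project, so it is worth testing several values.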

By combining both datasets, we were able to compare not only what funding agencies aim to support, but what research is actually produced.

A multi-layered analytical strategy

Our methodology combines three complementary components (figure):

  • Keyword-based content analysis
  • Supervised classification
  • Comparative analysis across funding mechanisms and time periods

The convergence of these approaches increases robustness and allows us to detect consistent patterns across different types of data.
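The comparative step can be pictured as computing, for each funding mechanism and period, the share of classified items at each research level. The sketch below assumes a simple list of (mechanism, period, level) records; the names "funder_x", "period_1", and "period_2" and the counts are placeholders, not results from the study.

```python
from collections import Counter, defaultdict


def level_shares(records):
    """Share of each research level within every (mechanism, period) group."""
    counts = defaultdict(Counter)
    for mechanism, period, level in records:
        counts[(mechanism, period)][level] += 1
    shares = {}
    for key, level_counts in counts.items():
        total = sum(level_counts.values())
        shares[key] = {level: n / total for level, n in level_counts.items()}
    return shares


# Placeholder records standing in for classified projects or publications.
records = [
    ("funder_x", "period_1", "basic biomedical"),
    ("funder_x", "period_1", "basic biomedical"),
    ("funder_x", "period_1", "basic biomedical"),
    ("funder_x", "period_1", "health policy and management"),
    ("funder_x", "period_2", "basic biomedical"),
    ("funder_x", "period_2", "health policy and management"),
]

shares = level_shares(records)
for (mechanism, period), dist in sorted(shares.items()):
    print(mechanism, period, dist)
```

Running the same computation on projects and on publications makes it possible to compare shifts in what is funded with shifts in what is published.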

Why this matters

This methodological framework moves beyond descriptive analyses of funding trends. It provides a way to empirically assess how policy priorities are translated into research activity and outputs.

More broadly, it opens new possibilities for studying how research systems evolve—and how funding shapes the direction of science (figure).

Figure. Conceptual and analytical workflow for the classification of health research.
The figure illustrates the integration of funding data and scientific publications through a common text-mining and supervised classification framework. Projects and publications are classified into five levels of research, enabling comparison between funding priorities and research outputs.

Reference

1. David Fajardo-Ortiz, Bart Thijs, Wolfgang Glänzel, Karin R. Sipido; Evolution of public funding for collaborative health research towards higher-level patient-oriented research. Quantitative Science Studies 2026; doi: https://doi.org/10.1162/QSS.a.472
