AI in Healthcare: Standardized Reporting for Reproducibility, Validity, and Clinical Impact

A Guide to Medical AI Reporting Guidelines

Artificial intelligence (AI) holds transformative potential in healthcare by enabling enhanced disease diagnosis, risk prediction, and treatment optimization. Transparent and comprehensive reporting of AI studies is critical to ensure scientific reproducibility, clinical validity, and public trust. The possible risks of flawed or incomplete reporting include:

  • Overstated performance due to technical shortcomings like data leakage or unrepresentative datasets
  • Inability to independently validate findings or assess clinical utility
  • Challenges developing and updating best practices as the field evolves
  • Lack of insight into ethical risks, bias, or limitations

The landscape of medical AI reporting guidelines

Our systematic review, published in Communications Medicine, indicates that there is currently no universal, high-quality reporting standard for studies applying AI to medical data and use cases. Instead, 12 of the 26 reporting guidelines included in our review address specific medical fields (e.g., the CLEAR guideline for medical imaging research), and 20 of the 26 target preclinical or translational research rather than studies prospectively reporting the clinical evaluation of AI-based models. Additionally, the rigor of the development processes ranged from comprehensive multi-stakeholder consensus approaches to expert-led efforts without clear stakeholder involvement.

Guidelines for early-phase preclinical work more often had a narrow subspecialty focus, and were more often developed without comprehensive consensus procedures, than the smaller number of clinical trial guidelines. The lack of universal, high-quality guidelines observed in this review may help explain earlier findings that only a fraction of published AI studies in medicine fully adhere to reporting best practices.

Figure 1: The landscape of medical AI reporting guidelines. Comprehensive guidelines are based on a structured, consensus-based, methodical development approach involving multiple experts and relevant stakeholders with details on the exact protocol. Collaborative guidelines are (presumably) developed using a formal consensus procedure involving multiple experts, but provide no details on the exact protocol or methodological structure. Expert-led guidelines are not developed through a consensus-based procedure, do not involve relevant stakeholders, or do not clearly describe the development procedure.

Universal Recommendations Across Guidelines 

Despite this heterogeneity, several guideline items were recommended by at least 50% of all guidelines, or by at least 50% of the guidelines developed through a specified systematic process. These "universal" components included details like:

  • Clearly describing the clinical prediction problem and rationale
  • Specifying the data sources, types, and preprocessing steps
  • Detailing the type of predictive model and its training/validation
  • Reporting model performance metrics and interpretation
  • Discussing limitations, clinical implications, and avenues for real-world translation

These universal items could serve as a baseline for responsible reporting of predictive clinical AI studies in cases where no high-quality guidelines exist for a specific use case.

Figure 2: Universal components of studies on predictive clinical AI models. Items recommended by at least 50% of all guidelines or 50% of guidelines with a specified systematic development process were considered universal components of studies on predictive clinical AI models.
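
The 50% rule described in the Figure 2 caption can be illustrated with a minimal Python sketch. The guideline names (`G1` to `G5`), the item labels, and the `universal_items` function are hypothetical and chosen only for illustration; they are not the data or the code used in the review.

```python
from typing import Dict, Set


def universal_items(
    recommendations: Dict[str, Set[str]],
    systematic: Set[str],
    threshold: float = 0.5,
) -> Set[str]:
    """Return items recommended by at least `threshold` of all guidelines,
    or by at least `threshold` of the systematically developed guidelines."""
    all_names = set(recommendations)
    sys_names = all_names & systematic
    # Every item that appears in at least one guideline.
    items = set().union(*recommendations.values())

    selected = set()
    for item in items:
        n_all = sum(item in recs for recs in recommendations.values())
        n_sys = sum(item in recommendations[name] for name in sys_names)
        share_all = n_all / len(all_names)
        share_sys = n_sys / len(sys_names) if sys_names else 0.0
        if share_all >= threshold or share_sys >= threshold:
            selected.add(item)
    return selected


# Hypothetical toy data: which items each guideline recommends.
example = {
    "G1": {"problem", "data", "protocol"},
    "G2": {"problem", "protocol"},
    "G3": {"problem", "data"},
    "G4": {"code"},
    "G5": {"data"},
}
# Hypothetical assumption: only G1 and G2 describe a systematic development process.
systematic_guidelines = {"G1", "G2"}

result = universal_items(example, systematic_guidelines)
print(sorted(result))
```

In this toy example, "protocol" is recommended by only 2 of 5 guidelines overall, but by both systematically developed ones, so it still counts as universal; "code" meets neither threshold and is excluded.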

Towards Robust, Adaptive Guidelines 

As the clinical AI ecosystem rapidly evolves, guidelines must remain dynamic to adequately cover new data types, model architectures, and use cases. Looking ahead, a challenge in future guideline development will be to balance the need to continuously update guidelines against the resources and time needed to conduct consensus procedures involving key stakeholders. Ensuring high reporting standards through appropriately rigorous, living guidelines will be crucial for maintaining scientific integrity and translating AI's potential into clinically validated tools that improve patient outcomes. Researchers, journals, medical societies, and regulatory bodies all have a role to play in aligning on and enforcing such standards as AI applications progress towards real-world implementation.
