Artificial intelligence (AI) holds transformative potential in healthcare by enabling enhanced disease diagnosis, risk prediction, and treatment optimization. Transparent and comprehensive reporting of AI studies is critical to ensure scientific reproducibility, clinical validity, and public trust. The possible risks of flawed or incomplete reporting include:
- Overstated performance due to technical shortcomings such as data leakage or unrepresentative datasets (see the sketch after this list)
- Inability to independently validate findings or assess clinical utility
- Challenges developing and updating best practices as the field evolves
- Lack of insight into ethical risks, bias, or limitations
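To make the data-leakage risk concrete, here is a minimal Python sketch (our own toy illustration with random data, not an analysis from the review): performing feature selection on the full dataset before cross-validation can yield an impressive AUROC on pure noise, while the same selection done correctly inside each training fold reveals chance-level performance.

```python
# Sketch of a classic data-leakage pitfall (hypothetical toy data):
# selecting features on the full dataset before cross-validation lets
# information about the test folds leak into training, inflating results.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5000))   # pure noise "biomarkers"
y = rng.integers(0, 2, size=200)   # random labels: true AUROC is 0.5

# Leaky: feature selection sees all labels, including future test folds.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky_auc = cross_val_score(LogisticRegression(), X_leaky, y,
                            cv=5, scoring="roc_auc").mean()

# Correct: selection is refit inside each training fold via a Pipeline.
pipe = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression())
honest_auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean()

print(f"leaky AUROC:  {leaky_auc:.2f}")   # inflated despite pure noise
print(f"honest AUROC: {honest_auc:.2f}")  # near 0.5, as it should be
```

Without transparent reporting of exactly when and on what data each preprocessing step was fitted, readers cannot distinguish the first result from the second.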
The Landscape of Medical AI Reporting Guidelines
Our systematic review published in Communications Medicine indicates that there is currently no universal, high-quality reporting standard for studies applying AI to medical data and use cases. Instead, 12 of the 26 reporting guidelines included in our review address specific medical fields (e.g., the CLEAR guideline for medical imaging research), and 20 of the 26 target preclinical or translational research rather than studies prospectively reporting the clinical evaluation of AI-based models. Additionally, the rigor of the development processes ranged from comprehensive multi-stakeholder consensus approaches to expert-led efforts without clear stakeholder involvement.
Compared with the smaller number of clinical trial guidelines, guidelines for early-phase preclinical work more often had a narrow subspecialty focus and were developed without comprehensive consensus procedures. The lack of universal, high-quality guidelines observed in this review may contribute to findings that only a fraction of published AI studies in medicine fully adhere to reporting best practices.
Universal Recommendations Across Guidelines
Despite this heterogeneity, several guideline items were consistently recommended, appearing in at least 50% of all guidelines or of those developed with rigorous consensus processes. These "universal" components include:
- Clearly describing the clinical prediction problem and rationale
- Specifying the data sources, types, and preprocessing steps
- Detailing the type of predictive model and its training/validation
- Reporting model performance metrics and their interpretation (illustrated in the sketch below)
- Discussing limitations, clinical implications, and avenues for real-world translation
These universal items could serve as a baseline for responsible reporting of predictive clinical AI studies in cases where no high-quality guidelines exist for a specific use case.
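As an illustration of the performance-reporting item above, here is a minimal Python sketch (a hypothetical example with made-up predictions, not code mandated by any guideline) of one common way to report a metric responsibly: an AUROC point estimate together with a percentile-bootstrap 95% confidence interval on a held-out test set.

```python
# Minimal sketch: report a performance metric with uncertainty, here an
# AUROC point estimate plus a percentile-bootstrap 95% confidence interval.
import numpy as np
from sklearn.metrics import roc_auc_score

def auroc_with_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate plus percentile-bootstrap CI for the AUROC."""
    rng = np.random.default_rng(seed)
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    point = roc_auc_score(y_true, y_score)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), size=len(y_true))  # resample with replacement
        if len(np.unique(y_true[idx])) < 2:  # skip one-class resamples
            continue
        stats.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return point, (lo, hi)

# Hypothetical held-out predictions:
y_true = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.5, 0.7, 0.6, 0.3]
auc, (lo, hi) = auroc_with_ci(y_true, y_score)
print(f"AUROC {auc:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Bootstrap intervals are only one option; the point relevant to reporting is that a bare performance number, without uncertainty or the evaluation protocol that produced it, is difficult for readers to interpret or reproduce.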
Towards Robust, Adaptive Guidelines
As the clinical AI ecosystem rapidly evolves, guidelines must remain dynamic to adequately cover new data types, model architectures, and use cases. Looking ahead, a key challenge in guideline development will be balancing the need to continuously update guidelines against the resources and time required for consensus procedures involving key stakeholders. Ensuring high reporting standards through appropriately rigorous, living guidelines will be crucial for maintaining scientific integrity and translating AI's potential into clinically validated tools that improve patient outcomes. Researchers, journals, medical societies, and regulatory bodies all have a role to play in aligning on and enforcing such standards as AI applications progress towards real-world implementation.