Behind the Paper

The Crying Wolf Effect in AI-Assisted Colonoscopy

Colorectal cancer (CRC) remains one of the leading causes of cancer-related deaths globally. Despite the effectiveness of colonoscopy as the gold standard for prevention, approximately one in four adenomas is missed, potentially leading to interval cancers. This underscores the need for advanced technologies, such as computer-aided detection (CADe) systems, to enhance the accuracy of colonoscopy.

In our recent study, we investigated the impact of false-positive (FP) rates on the clinical performance of two CADe systems in real-world settings. The study involved over 3,000 patients undergoing colonoscopy and compared the performance of System A (FP rate: 3.2%) and System B (FP rate: 0.6%). While both systems demonstrated high sensitivity, significant differences emerged in their effectiveness.

System B, with its lower FP rate, significantly improved key outcomes, including the adenoma detection rate (ADR) and adenomas per colonoscopy (APC). It achieved an ADR of 50.4%, compared with 44.3% for standard colonoscopy and 43.4% for System A. System B also maintained a lower rate of unnecessary resections, alleviating the concern about over-intervention that is often associated with FPs.
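To put the ADR figures side by side, the short Python sketch below computes the absolute difference (in percentage points) and the relative difference between arms. It uses only the three ADR values quoted above; the dictionary keys and function names are illustrative assumptions, and the output is plain arithmetic on the reported percentages, not a statistical analysis from the paper.

```python
# ADR values (percent) as quoted in the text above.
# The key names are illustrative labels, not terms from the paper.
adr = {"standard": 44.3, "system_a": 43.4, "system_b": 50.4}


def absolute_gain(new: float, baseline: float) -> float:
    """Difference in percentage points, rounded to one decimal."""
    return round(new - baseline, 1)


def relative_gain(new: float, baseline: float) -> float:
    """Difference as a percentage of the baseline, rounded to one decimal."""
    return round(100 * (new - baseline) / baseline, 1)


if __name__ == "__main__":
    for arm in ("system_a", "system_b"):
        abs_pp = absolute_gain(adr[arm], adr["standard"])
        rel_pct = relative_gain(adr[arm], adr["standard"])
        print(f"{arm} vs standard: {abs_pp:+.1f} points, {rel_pct:+.1f}% relative")
```

Separating percentage points from relative change keeps the two from being confused: a gap of a few points in ADR corresponds to a noticeably larger relative change.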

Interestingly, our findings highlighted the phenomenon of "alarm fatigue," in which frequent false alarms from high-FP systems like System A reduce attention to critical alerts. This "crying wolf effect" not only diminishes the efficiency of CADe systems but may also hinder their adoption in clinical practice. High-performing endoscopists showed no ADR improvement with System A, suggesting that excessive FPs could disrupt their established routines.

Our research emphasizes the delicate balance between sensitivity and precision in developing CADe systems. Reducing FP rates is crucial for optimizing clinical outcomes and ensuring the practical integration of AI in routine care. By addressing these challenges, we can move closer to a future where AI enhances, rather than hinders, medical practice.

This study provides critical insights for clinicians, researchers, and developers working to refine AI technologies for healthcare. We believe these findings will stimulate further innovations in CADe systems, paving the way for their widespread adoption and ultimately improving patient outcomes in CRC prevention.

For more details on our findings, see the full paper:

https://doi.org/10.1038/s41746-024-01334-y