Behind the Paper

Domain-Specific NLP Rivals 70-Billion-Parameter LLM Giants in Clinical AI

In our latest research, we set out to accurately classify Crohn’s disease (CD) in radiology reports using natural language processing (NLP).

📊 We evaluated a broad spectrum of NLP techniques, from rule-based models and rationale extraction to classic deep learning (CNN, Bi-LSTM) and LLMs.

🧠Meet IBDBERT 🤖: our custom 110M-parameter transformer model, fine-tuned on inflammatory bowel disease-specific textbooks and clinical guidelines from the American Gastroenterological Association (AGA), the American College of Gastroenterology (ACG), and the European Crohn's and Colitis Organisation (ECCO). Despite being more than 600x smaller than today’s behemoths like Meta’s LLaMA 3.3-70B or DeepSeek-R1, IBDBERT achieved performance near or better than the state of the art.
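
For readers who want to check the size comparison, here is a quick back-of-the-envelope sketch. It uses only the approximate parameter counts quoted above (110M for IBDBERT, 70B for LLaMA 3.3-70B); the variable names are illustrative.

```python
# Rough size comparison using the approximate parameter counts quoted in the post.
IBDBERT_PARAMS = 110e6     # 110M parameters
LLAMA_70B_PARAMS = 70e9    # 70B parameters

# How many times larger the 70B model is than IBDBERT.
ratio = LLAMA_70B_PARAMS / IBDBERT_PARAMS
print(f"LLaMA 3.3-70B is roughly {ratio:.0f}x larger than IBDBERT")
```

The ratio comes out a little above 600, which is where the "600x smaller" figure comes from.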

💥The takeaway?
🧠Bigger isn’t always better. Domain-specific, efficient models can match or outperform billion-parameter LLMs—especially in specialized clinical settings.
📱Lightweight models like IBDBERT are well-suited for mobile or resource-limited settings, offering practical and scalable solutions without compromising accuracy, privacy, or energy efficiency.
🔍 Error analysis revealed that all models, including LLMs, relied too heavily on prior-diagnosis sections rather than actual imaging findings, underscoring the need for explainable AI (XAI) in medicine.

📕Full paper
🔗https://www.nature.com/articles/s41746-025-01729-5

📖 Crohn’s Disease Background
🔗https://www.thelancet.com/article/S0140-6736(12)60026-9/fulltext
🔗https://doi.org/10.1007/978-3-319-33703-6

⚕️Crohn's Disease Management with Targeted Therapies
🔗https://www.nejm.org/doi/full/10.1056/NEJMra1907607