Using large language models to integrate international IBD guidelines: A retrieval-augmented generation approach

  • Aryan Salahi-Niri
  • Seyed Amir Ahmad Safavi-Naini
  • Jalpa Devi
  • Nariman Naderi
  • Shaji Sebastian
  • Michel Adamina
  • Girish Nadkarni
  • Ali Soroush
  • Alaa El-Hussuna

Research output: Contribution to journal › Article › peer-review

Abstract

Background and Aims: Clinical guidelines for inflammatory bowel disease (IBD) are essential for standardizing care, but synthesizing recommendations from multiple, often conflicting guidelines is a laborious task for clinicians. We developed and evaluated a proof-of-concept tool using a large language model (LLM) with retrieval-augmented generation (RAG) to help clinicians navigate this complexity by harmonizing guidelines, identifying consensus and controversy, and generating actionable statements.

Methods: An LLM-driven RAG pipeline (GPT-4o) was designed to segment guideline content and compare recommendations across four international guidelines (ACG, ECCO, BSG, ACPGBI). This tool was evaluated on eight common clinical questions in Crohn's disease and ulcerative colitis. Outputs were assessed against expert-generated references by four independent reviewers using five-point Likert scales for completeness, accuracy, relevance, coherence and conciseness.

Results: The tool reliably identified similarities and differences across guidelines, with mean scores of 4.34 (95% CI, 4.20–4.48) for consensus recognition and 4.61 (95% CI, 4.46–4.77) for disagreement detection. Completeness, accuracy and relevance consistently scored >4.0, while conciseness was lower (3.84, 95% CI, 3.50–4.19). Outline generation performance was moderate (3.25, 95% CI, 2.85–3.65). In 87.5% of cases, tool-generated recommendations aligned with expert conclusions.

Conclusions: This proof-of-concept study demonstrates that an LLM-RAG framework can systematically integrate IBD guidelines with high fidelity. This approach has the potential to improve guideline usability and support decision-making at the point of care, though further refinement is needed for conciseness and comprehensive outline generation.
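The results above are reported as mean Likert scores with 95% confidence intervals. The abstract does not state how those intervals were computed, so the following is only a minimal sketch of one common approach, a normal approximation around the sample mean. The function name `likert_mean_ci` and the example ratings are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def likert_mean_ci(ratings, confidence=0.95):
    """Return (mean, lower, upper) for a list of Likert ratings.

    Assumption: a normal approximation is used for the interval; the
    study's actual method is not described in the abstract.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    m = mean(ratings)
    # Standard error of the mean from the sample standard deviation.
    se = stdev(ratings) / sqrt(len(ratings))
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return m, m - z * se, m + z * se


# Hypothetical ratings on a five-point scale, not data from the study.
example_ratings = [4, 5, 4, 5, 4, 3, 5, 4]
m, lo, hi = likert_mean_ci(example_ratings)
print(f"{m:.2f} (95% CI, {lo:.2f}-{hi:.2f})")
```

With few ratings, a t-based interval would be wider than this approximation, and the bounds produced here are not clipped to the 1 to 5 range of the scale.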

Original language: English
Article number: e70436
Journal: Colorectal Disease
Volume: 28
Issue number: 4
State: Published - Apr 2026

Keywords

  • Crohn's disease
  • guidelines
  • inflammatory bowel disease
  • large language models
  • retrieval-augmented generation
  • ulcerative colitis
