Abstract
Background and Aims: Clinical guidelines for inflammatory bowel disease (IBD) are essential for standardizing care, but synthesizing recommendations from multiple, often conflicting guidelines is a laborious task for clinicians. We developed and evaluated a proof-of-concept tool using a large language model (LLM) with retrieval-augmented generation (RAG) to help clinicians navigate this complexity by harmonizing guidelines, identifying consensus and controversy, and generating actionable statements.

Methods: An LLM-driven RAG pipeline (GPT-4o) was designed to segment guideline content and compare recommendations across four international guidelines (ACG, ECCO, BSG, ACPGBI). The tool was evaluated on eight common clinical questions in Crohn's disease and ulcerative colitis. Outputs were assessed against expert-generated references by four independent reviewers using five-point Likert scales for completeness, accuracy, relevance, coherence and conciseness.

Results: The tool reliably identified similarities and differences across guidelines, with mean scores of 4.34 (95% CI, 4.20–4.48) for consensus recognition and 4.61 (95% CI, 4.46–4.77) for disagreement detection. Completeness, accuracy and relevance consistently scored >4.0, while conciseness was lower (3.84; 95% CI, 3.50–4.19). Outline generation performance was moderate (3.25; 95% CI, 2.85–3.65). In 87.5% of cases, tool-generated recommendations aligned with expert conclusions.

Conclusions: This proof-of-concept study demonstrates that an LLM-RAG framework can systematically integrate IBD guidelines with high fidelity. This approach has the potential to improve guideline usability and support decision-making at the point of care, though further refinement is needed for conciseness and comprehensive outline generation.
| Original language | English |
|---|---|
| Article number | e70436 |
| Journal | Colorectal Disease |
| Volume | 28 |
| Issue number | 4 |
| DOIs | |
| State | Published - Apr 2026 |
Keywords
- Crohn's disease
- guidelines
- inflammatory bowel disease
- large language models
- retrieval-augmented generation
- ulcerative colitis
Fingerprint
Dive into the research topics of 'Using large language models to integrate international IBD guidelines: A retrieval-augmented generation approach'. Together they form a unique fingerprint.