TY - JOUR
T1 - An active inference strategy for prompting reliable responses from large language models in medical practice
AU - Shusterman, Roma
AU - Waters, Allison C.
AU - O’Neill, Shannon
AU - Bangs, Marshall
AU - Luu, Phan
AU - Tucker, Don M.
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/12
Y1 - 2025/12
AB - Continuing advances in Large Language Models (LLMs) are transforming medical knowledge access across education, training, and treatment. Early literature cautions against their non-determinism, potential for harmful responses, and lack of quality control. To address these issues, we propose a domain-specific, validated dataset for LLM training and an actor–critic prompting protocol grounded in active inference. A Therapist agent generates initial responses to patient queries, while a Supervisor agent refines them. In a blind validation study, experienced cognitive behavior therapy for insomnia (CBT-I) therapists evaluated responses to 100 patient queries. For each query, they were given either the LLM’s response or one of two therapist-crafted responses (one appropriate and one deliberately inappropriate) and asked to rate the quality and accuracy of each reply. The LLM’s responses often received higher ratings than the appropriate therapist-crafted responses, indicating effective alignment with expert standards. This structured approach lays the foundation for safely integrating advanced LLM technology into medical applications.
UR - https://www.scopus.com/pages/publications/85218416748
U2 - 10.1038/s41746-025-01516-2
DO - 10.1038/s41746-025-01516-2
M3 - Article
AN - SCOPUS:85218416748
SN - 2398-6352
VL - 8
JO - npj Digital Medicine
JF - npj Digital Medicine
IS - 1
M1 - 119
ER -