TY - JOUR
T1 - ChatGPT and Bard exhibit spontaneous citation fabrication during psychiatry literature search
AU - McGowan, Alessia
AU - Gui, Yunlai
AU - Dobbs, Matthew
AU - Shuster, Sophia
AU - Cotter, Matthew
AU - Selloni, Alexandria
AU - Goodman, Marianne
AU - Srivastava, Agrima
AU - Cecchi, Guillermo A.
AU - Corcoran, Cheryl M.
N1 - Publisher Copyright:
© 2023 Elsevier B.V.
PY - 2023/8
Y1 - 2023/8
N2 - ChatGPT (Generative Pre-trained Transformer) is a large language model (LLM), which comprises a neural network that has learned information and patterns of language use from large amounts of text on the internet. ChatGPT, introduced by OpenAI, responds to human queries in a conversational manner. Here, we aimed to assess whether ChatGPT could reliably produce accurate references to supplement the literature search process. We describe our March 2023 exchange with ChatGPT, which generated thirty-five citations, two of which were real. Twelve citations were similar to actual manuscripts (e.g., titles with incorrect author lists, journals, or publication years), and the remaining 21, while plausible, were in fact a pastiche of multiple existing manuscripts. In June 2023, we re-tested ChatGPT's performance and compared it to that of Google's GPT counterpart, Bard 2.0. We investigated performance in English, as well as in Spanish and Italian. Fabrications made by LLMs, including erroneous citations, have been called “hallucinations”; we discuss reasons for which this is a misnomer. Furthermore, we describe potential explanations for citation fabrication by GPTs, as well as measures being taken to remedy this issue, including reinforcement learning. Our results underscore that output from conversational LLMs should be verified.
AB - ChatGPT (Generative Pre-trained Transformer) is a large language model (LLM), which comprises a neural network that has learned information and patterns of language use from large amounts of text on the internet. ChatGPT, introduced by OpenAI, responds to human queries in a conversational manner. Here, we aimed to assess whether ChatGPT could reliably produce accurate references to supplement the literature search process. We describe our March 2023 exchange with ChatGPT, which generated thirty-five citations, two of which were real. Twelve citations were similar to actual manuscripts (e.g., titles with incorrect author lists, journals, or publication years), and the remaining 21, while plausible, were in fact a pastiche of multiple existing manuscripts. In June 2023, we re-tested ChatGPT's performance and compared it to that of Google's GPT counterpart, Bard 2.0. We investigated performance in English, as well as in Spanish and Italian. Fabrications made by LLMs, including erroneous citations, have been called “hallucinations”; we discuss reasons for which this is a misnomer. Furthermore, we describe potential explanations for citation fabrication by GPTs, as well as measures being taken to remedy this issue, including reinforcement learning. Our results underscore that output from conversational LLMs should be verified.
KW - Artificial intelligence
KW - Bard
KW - ChatGPT
KW - Citations
KW - Fabrication
KW - Large language models
KW - Linguistic
KW - Literature search
KW - Natural language processing
KW - References
UR - http://www.scopus.com/inward/record.url?scp=85165713711&partnerID=8YFLogxK
U2 - 10.1016/j.psychres.2023.115334
DO - 10.1016/j.psychres.2023.115334
M3 - Article
AN - SCOPUS:85165713711
SN - 0165-1781
VL - 326
JO - Psychiatry Research
JF - Psychiatry Research
M1 - 115334
ER -