TY - JOUR
T1 - Extracting social determinants of health from electronic health records using natural language processing
T2 - A systematic review
AU - Patra, Braja G.
AU - Sharma, Mohit M.
AU - Vekaria, Veer
AU - Adekkanattu, Prakash
AU - Patterson, Olga V.
AU - Glicksberg, Benjamin
AU - Lepow, Lauren A.
AU - Ryu, Euijung
AU - Biernacka, Joanna M.
AU - Furmanchuk, Al'Ona
AU - George, Thomas J.
AU - Hogan, William
AU - Wu, Yonghui
AU - Yang, Xi
AU - Bian, Jiang
AU - Weissman, Myrna
AU - Wickramaratne, Priya
AU - Mann, J. John
AU - Olfson, Mark
AU - Campion, Thomas R.
AU - Weiner, Mark
AU - Pathak, Jyotishman
N1 - Publisher Copyright: © 2021 The Author(s).
PY - 2021/12/1
Y1 - 2021/12/1
AB - Objective: Social determinants of health (SDoH) are nonclinical dispositions that impact patient health risks and clinical outcomes. Leveraging SDoH in clinical decision-making can potentially improve diagnosis, treatment planning, and patient outcomes. Despite increased interest in capturing SDoH in electronic health records (EHRs), such information is typically locked in unstructured clinical notes. Natural language processing (NLP) is the key technology to extract SDoH information from clinical text and expand its utility in patient care and research. This article presents a systematic review of the state-of-the-art NLP approaches and tools that focus on identifying and extracting SDoH data from unstructured clinical text in EHRs. Materials and Methods: A broad literature search was conducted in February 2021 using 3 scholarly databases (ACL Anthology, PubMed, and Scopus) following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. A total of 6402 publications were initially identified, and after applying the study inclusion criteria, 82 publications were selected for the final review. Results: Smoking status (n = 27), substance use (n = 21), homelessness (n = 20), and alcohol use (n = 15) are the most frequently studied SDoH categories. Homelessness (n = 7) and other less-studied SDoH (eg, education, financial problems, social isolation and support, family problems) are mostly identified using rule-based approaches. In contrast, machine learning approaches are popular for identifying smoking status (n = 13), substance use (n = 9), and alcohol use (n = 9). Conclusion: NLP offers significant potential to extract SDoH data from narrative clinical notes, which in turn can aid in the development of screening tools, risk prediction models, and clinical decision support systems.
KW - Electronic health records
KW - Information extraction
KW - Machine learning
KW - Natural language processing
KW - Population health outcomes
KW - Social determinants of health
UR - http://www.scopus.com/inward/record.url?scp=85121184813&partnerID=8YFLogxK
U2 - 10.1093/jamia/ocab170
DO - 10.1093/jamia/ocab170
M3 - Review article
C2 - 34613399
AN - SCOPUS:85121184813
SN - 1067-5027
VL - 28
SP - 2716
EP - 2727
JO - Journal of the American Medical Informatics Association : JAMIA
JF - Journal of the American Medical Informatics Association : JAMIA
IS - 12
ER -