TY - JOUR
T1 - Prediction of remission in obsessive compulsive disorder using a novel machine learning strategy
AU - Askland, Kathleen D.
AU - Garnaat, Sarah
AU - Sibrava, Nicholas J.
AU - Boisseau, Christina L.
AU - Strong, David
AU - Mancebo, Maria
AU - Greenberg, Benjamin
AU - Rasmussen, Steve
AU - Eisen, Jane
N1 - Publisher Copyright: © 2015 John Wiley & Sons, Ltd.
PY - 2015/6/1
Y1 - 2015/6/1
N2 - The study objective was to apply machine learning methodologies to identify predictors of remission in a longitudinal sample of 296 adults with a primary diagnosis of obsessive compulsive disorder (OCD). Random Forests is an ensemble machine learning algorithm that has been successfully applied to large-scale data analysis across a wide range of biomedical disciplines, though rarely in psychiatric research or for application to longitudinal data. When provided with 795 raw and composite scores primarily from baseline measures, Random Forest regression prediction explained 50.8% (5000-run average, 95% bootstrap confidence interval [CI]: 50.3-51.3%) of the variance in proportion of time spent remitted. Machine performance improved when only the most predictive 24 items were used in a reduced analysis. Consistently high-ranked predictors of longitudinal remission included Yale-Brown Obsessive Compulsive Scale (Y-BOCS) items, NEO items and subscale scores, Y-BOCS symptom checklist cleaning/washing compulsion score, and several self-report items from social adjustment scales. Random Forest classification was able to distinguish participants according to binary remission outcomes with an error rate of 24.6% (95% bootstrap CI: 22.9-26.2%). Our results suggest that clinically useful prediction of remission may not require an extensive battery of measures. Rather, a small set of assessment items may efficiently distinguish high- and lower-risk patients and inform clinical decision-making.
AB - The study objective was to apply machine learning methodologies to identify predictors of remission in a longitudinal sample of 296 adults with a primary diagnosis of obsessive compulsive disorder (OCD). Random Forests is an ensemble machine learning algorithm that has been successfully applied to large-scale data analysis across a wide range of biomedical disciplines, though rarely in psychiatric research or for application to longitudinal data. When provided with 795 raw and composite scores primarily from baseline measures, Random Forest regression prediction explained 50.8% (5000-run average, 95% bootstrap confidence interval [CI]: 50.3-51.3%) of the variance in proportion of time spent remitted. Machine performance improved when only the most predictive 24 items were used in a reduced analysis. Consistently high-ranked predictors of longitudinal remission included Yale-Brown Obsessive Compulsive Scale (Y-BOCS) items, NEO items and subscale scores, Y-BOCS symptom checklist cleaning/washing compulsion score, and several self-report items from social adjustment scales. Random Forest classification was able to distinguish participants according to binary remission outcomes with an error rate of 24.6% (95% bootstrap CI: 22.9-26.2%). Our results suggest that clinically useful prediction of remission may not require an extensive battery of measures. Rather, a small set of assessment items may efficiently distinguish high- and lower-risk patients and inform clinical decision-making.
KW - Obsessive compulsive disorder
KW - Risk factors
KW - Statistics
UR - http://www.scopus.com/inward/record.url?scp=84931563566&partnerID=8YFLogxK
U2 - 10.1002/mpr.1463
DO - 10.1002/mpr.1463
M3 - Article
C2 - 25994109
AN - SCOPUS:84931563566
SN - 1049-8931
VL - 24
SP - 156
EP - 169
JO - International Journal of Methods in Psychiatric Research
JF - International Journal of Methods in Psychiatric Research
IS - 2
ER -