TY - JOUR
T1 - Multi-view cluster analysis with incomplete data to understand treatment effects
AU - Chao, Guoqing
AU - Sun, Jiangwen
AU - Lu, Jin
AU - Wang, An Li
AU - Langleben, Daniel D.
AU - Li, Chiang Shan
AU - Bi, Jinbo
N1 - Publisher Copyright: © 2019 Elsevier Inc.
PY - 2019/8
Y1 - 2019/8
AB - Multi-view cluster analysis, a popular granular computing method, aims to partition sample subjects into clusters that are consistent across the different views in which the subjects are characterized. Frequently, data entries are missing from some of the views. The latest multi-view co-clustering methods cannot deal effectively with incomplete data, especially when there are mixed patterns of missing values. We propose an enhanced formulation for a family of multi-view co-clustering methods that copes with missing data by introducing an indicator matrix whose elements indicate which data entries are observed, and by assessing cluster validity only on the observed entries. Compared with common methods that impute missing data in order to use regular multi-view analytics, our approach is less sensitive to imputation uncertainty. Compared with other state-of-the-art incomplete multi-view clustering methods, our approach applies both when individual entries in a view are missing and when an entire view is missing. We first validated the proposed strategy in simulations and then applied it to a treatment study of opioid dependence, an analysis that would have been impossible with previous methods because of a number of missing-data patterns. Patients in the treatment study were naturally assessed in different feature spaces, such as the pre-, during-, and post-treatment time windows. Our algorithm identified subgroups in which patients showed similarities in all three time windows, leading to the identification of pre-treatment (baseline) features predictive of post-treatment outcomes. We found that cue-induced heroin craving predicts adherence to XR-NTX therapy. This finding is consistent with the clinical literature and serves to validate our approach.
KW - Co-clustering
KW - Granular computing
KW - Missing value
KW - Multi-view data analysis
KW - Opioid addiction
UR - http://www.scopus.com/inward/record.url?scp=85065093563&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2019.04.039
DO - 10.1016/j.ins.2019.04.039
M3 - Article
AN - SCOPUS:85065093563
SN - 0020-0255
VL - 494
SP - 278
EP - 293
JO - Information Sciences
JF - Information Sciences
ER -