TY - GEN
T1 - On reducing classifier granularity in mining concept-drifting data streams
AU - Wang, Peng
AU - Wang, Haixun
AU - Wu, Xiaochen
AU - Wang, Wei
AU - Shi, Baile
PY - 2005
Y1 - 2005
N2 - Many applications use classification models on streaming data to detect actionable alerts. Due to concept drifts in the underlying data, keeping a model up to date has become one of the most challenging tasks in mining data streams. State-of-the-art approaches, including both incrementally updated classifiers and ensemble classifiers, have shown that model update is a very costly process. In this paper, we introduce the concept of model granularity. We show that reducing model granularity reduces model update cost. Indeed, models of fine granularity enable us to efficiently pinpoint the local components of the model that are affected by a concept drift. They also enable us to derive new components that integrate easily with the model to reflect the current data distribution, thus avoiding expensive updates on a global scale. Experiments on real and synthetic data show that our approach maintains good prediction accuracy at a fraction of the model-updating cost of state-of-the-art approaches.
AB - Many applications use classification models on streaming data to detect actionable alerts. Due to concept drifts in the underlying data, keeping a model up to date has become one of the most challenging tasks in mining data streams. State-of-the-art approaches, including both incrementally updated classifiers and ensemble classifiers, have shown that model update is a very costly process. In this paper, we introduce the concept of model granularity. We show that reducing model granularity reduces model update cost. Indeed, models of fine granularity enable us to efficiently pinpoint the local components of the model that are affected by a concept drift. They also enable us to derive new components that integrate easily with the model to reflect the current data distribution, thus avoiding expensive updates on a global scale. Experiments on real and synthetic data show that our approach maintains good prediction accuracy at a fraction of the model-updating cost of state-of-the-art approaches.
UR - http://www.scopus.com/inward/record.url?scp=34548554258&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2005.108
DO - 10.1109/ICDM.2005.108
M3 - Conference contribution
AN - SCOPUS:34548554258
SN - 0769522785
SN - 9780769522784
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 474
EP - 481
BT - Proceedings - Fifth IEEE International Conference on Data Mining, ICDM 2005
T2 - 5th IEEE International Conference on Data Mining, ICDM 2005
Y2 - 27 November 2005 through 30 November 2005
ER -