TY - JOUR
T1 - reval: A Python package to determine best clustering solutions with stability-based relative clustering validation
AU - Landi, Isotta
AU - Mandelli, Veronica
AU - Lombardo, Michael V.
N1 - Publisher Copyright: © 2021 The Authors
PY - 2021/4/9
Y1 - 2021/4/9
N2 - Determining the best partition for a dataset can be a challenging task because of the lack of a priori information within an unsupervised learning framework and the absence of a unique clustering validation approach to evaluate clustering solutions. Here we present reval: a Python package that leverages stability-based relative clustering validation methods to select best clustering solutions as the ones that replicate, via supervised learning, on unseen subsets of data. The implementation of relative validation methods can contribute to the theory of clustering by fostering new approaches for the investigation of clustering results in different situations and for different data distributions. This work aims at contributing to this effort by implementing a package that works with multiple clustering and classification algorithms, hence allowing both the automation of the labeling process and the assessment of the stability of different clustering mechanisms.
AB - Determining the best partition for a dataset can be a challenging task because of the lack of a priori information within an unsupervised learning framework and the absence of a unique clustering validation approach to evaluate clustering solutions. Here we present reval: a Python package that leverages stability-based relative clustering validation methods to select best clustering solutions as the ones that replicate, via supervised learning, on unseen subsets of data. The implementation of relative validation methods can contribute to the theory of clustering by fostering new approaches for the investigation of clustering results in different situations and for different data distributions. This work aims at contributing to this effort by implementing a package that works with multiple clustering and classification algorithms, hence allowing both the automation of the labeling process and the assessment of the stability of different clustering mechanisms.
KW - DSML 2: Proof-of-Concept: Data science output has been formulated, implemented, and tested for one domain/problem
KW - clustering
KW - clustering replicability
KW - stability-based relative validation
KW - unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=85104157026&partnerID=8YFLogxK
U2 - 10.1016/j.patter.2021.100228
DO - 10.1016/j.patter.2021.100228
M3 - Article
AN - SCOPUS:85104157026
SN - 2666-3899
VL - 2
JO - Patterns
JF - Patterns
IS - 4
M1 - 100228
ER -