Abstract
This Letter addresses the statistical significance of structures in random data: Given a set of vectors and a measure of mutual similarity, how likely is it that a subset of these vectors forms a cluster with enhanced similarity among its elements? The computation of this cluster p value for randomly distributed vectors is mapped onto a well-defined problem of statistical mechanics. We solve this problem analytically, establishing a connection between the physics of quenched disorder and multiple-testing statistics in clustering and related problems. In an application to gene expression data, we find a remarkable link between the statistical significance of a cluster and the functional relationships between its genes.
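The question posed above can be illustrated numerically. The sketch below is a brute-force Monte Carlo estimate of a cluster p value and is *not* the analytical statistical-mechanics solution described in the Letter. It rests on several illustrative assumptions: the random vectors are drawn uniformly on the unit sphere, similarity is the dot product, and a cluster's score is the mean pairwise similarity within a subset of fixed size `m`. The function names (`random_unit_vectors`, `best_cluster_score`, `cluster_p_value`) are hypothetical. Taking the maximum score over all subsets is what makes this a multiple-testing problem: the null distribution is that of the *best* cluster in random data, not that of a single pre-chosen subset.

```python
import itertools

import numpy as np


def random_unit_vectors(n, d, rng):
    """Draw n vectors uniformly on the unit sphere in d dimensions (assumed null model)."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def best_cluster_score(vectors, m):
    """Maximum over all size-m subsets of the mean pairwise dot product.

    Brute-force enumeration of all subsets, so only feasible for small n.
    """
    sim = vectors @ vectors.T
    best = -np.inf
    for subset in itertools.combinations(range(len(vectors)), m):
        idx = np.array(subset)
        block = sim[np.ix_(idx, idx)]
        # Average over the m*(m-1) off-diagonal entries (exclude self-similarity).
        score = (block.sum() - np.trace(block)) / (m * (m - 1))
        best = max(best, score)
    return float(best)


def cluster_p_value(observed_score, n, d, m, trials=200, seed=0):
    """Fraction of random datasets whose best cluster scores at least observed_score."""
    rng = np.random.default_rng(seed)
    null_scores = np.array([
        best_cluster_score(random_unit_vectors(n, d, rng), m)
        for _ in range(trials)
    ])
    # Add-one correction so a finite sample never reports a p value of exactly zero.
    return (1 + np.sum(null_scores >= observed_score)) / (trials + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = random_unit_vectors(10, 5, rng)
    data[1] = data[0]  # plant a tight cluster of three identical vectors
    data[2] = data[0]
    score = best_cluster_score(data, 3)
    print(score, cluster_p_value(score, n=10, d=5, m=3, trials=50, seed=2))
```

The cost of exhaustive enumeration grows combinatorially with `n` and `m`. That growth is why a closed-form treatment of the maximal-cluster statistic, like the one the Letter derives, is preferable to this sampling approach for realistic data sizes.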
| Original language | English |
|---|---|
| Article number | 220601 |
| Journal | Physical Review Letters |
| Volume | 105 |
| Issue number | 22 |
| DOIs | |
| State | Published - 23 Nov 2010 |
| Externally published | Yes |
| Title | Significance analysis and statistical mechanics: An application to clustering |