Publication | Closed Access
Standardized Mutual Information for Clustering Comparisons: One Step Further in Adjustment for Chance
Citations: 82
References: 17
Year: 2014
Unknown Venue
Mutual information is a very popular measure for comparing clusterings. Previous work has shown that it is beneficial to make an adjustment for chance to this measure, by subtracting an expected value and normalizing via an upper bound. This yields the constant baseline property that enhances intuitiveness. In this paper, we argue that a further type of statistical adjustment for the mutual information is also beneficial: an adjustment to correct selection bias. This type of adjustment is useful when carrying out many clustering comparisons, to select one or more preferred clusterings. It reduces the tendency for the mutual information to choose clustering solutions i) with more clusters, or ii) induced on fewer data points, when compared to a reference one. We term our new adjusted measure the standardized mutual information. It requires computation of the variance of mutual information under a hypergeometric model of randomness, which is technically challenging. We derive an analytical formula for this variance and analyze its complexity. We then experimentally assess how our new measure can address selection bias and also increase interpretability. We recommend using the standardized mutual information when making multiple clustering comparisons in situations where the number of records is small compared to the number of clusters considered.
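To make the idea of standardization concrete, the sketch below estimates a standardized score of the form (MI − E[MI]) / sqrt(Var[MI]) by Monte Carlo. It is not the analytical variance formula the abstract refers to. Instead it assumes that shuffling one labeling while keeping the other fixed samples the same null model, since both cluster-size profiles stay fixed. The function names, the number of permutations, and the choice to return NaN when the variance is zero are illustrative assumptions.

```python
import math
import random
from collections import Counter


def mutual_information(u, v):
    """Mutual information (in nats) between two labelings of the same records."""
    n = len(u)
    count_u = Counter(u)           # cluster sizes in the first clustering
    count_v = Counter(v)           # cluster sizes in the second clustering
    count_uv = Counter(zip(u, v))  # non-empty contingency-table cells
    mi = 0.0
    for (a, b), n_ab in count_uv.items():
        mi += (n_ab / n) * math.log(n * n_ab / (count_u[a] * count_v[b]))
    return mi


def standardized_mi_monte_carlo(u, v, n_permutations=2000, seed=0):
    """Approximate (MI - E[MI]) / sqrt(Var[MI]) by random permutations.

    Assumption: permuting v with u held fixed keeps both marginals fixed,
    which is used here as a stand-in for the hypergeometric null model.
    This is a sampling estimate, not the paper's closed-form variance.
    """
    rng = random.Random(seed)
    observed = mutual_information(u, v)
    shuffled = list(v)
    samples = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        samples.append(mutual_information(u, shuffled))
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
    if var == 0.0:
        # Degenerate case (e.g. a single cluster): the score is undefined.
        return float("nan")
    return (observed - mean) / math.sqrt(var)


if __name__ == "__main__":
    reference = [0] * 10 + [1] * 10 + [2] * 10
    candidate = list(reference)  # identical clustering
    print(standardized_mi_monte_carlo(reference, candidate))
```

Because the score is expressed in standard deviations above the chance-level mean, candidates with different numbers of clusters can be compared on one scale, which is the selection-bias motivation the abstract describes. The sampling estimate becomes noisy for very small variances, so the number of permutations may need to be raised.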