Influence of graph construction on graph-based clustering measures

Abstract

Graph clustering methods such as spectral clustering are defined for general\nweighted graphs. In machine learning, however, data often is not given in form\nof a graph, but in terms of similarity (or distance) values between points. In this\ncase, first a neighborhood graph is constructed using the similarities between the\npoints and then a graph clustering algorithm is applied to this graph. In this paper\nwe investigate the influence of the construction of the similarity graph on\nthe clustering results. We first study the convergence of graph clustering criteria\nsuch as the normalized cut (Ncut) as the sample size tends to infinity. We\nfind that the limit expressions are different for different types of graph, for example\nthe r-neighborhood graph or the k-nearest neighbor graph. In plain words:\nNcut on a kNN graph does something systematically different than Ncut on an\nr-neighborhood graph! This finding shows that graph clustering criteria cannot be\nstudied independently of the kind of graph they are applied to. We also provide\nexamples which show that these differences can be observed for toy and real data\nalready for rather small sample sizes.

References

Page 1

	Year	Citations

Page 1