The Curse of Highly Variable Functions for Local Kernel Machines

Abstract

We present a series of theoretical arguments supporting the claim that a large class of modern learning algorithms that rely solely on the smooth-ness prior – with similarity between examples expressed with a local kernel – are sensitive to the curse of dimensionality, or more precisely to the variability of the target. Our discussion covers supervised, semi-supervised and unsupervised learning algorithms. These algorithms are found to be local in the sense that crucial properties of the learned func-tion at x depend mostly on the neighbors of x in the training set. This makes them sensitive to the curse of dimensionality, well studied for classical non-parametric statistical learning. We show in the case of the Gaussian kernel that when the function to be learned has many variations, these algorithms require a number of training examples proportional to the number of variations, which could be large even though there may ex-ist short descriptions of the target function, i.e. their Kolmogorov com-plexity may be low. This suggests that there exist non-local learning algorithms that at least have the potential to learn about such structured but apparently complex functions (because locally they have many vari-ations), while not using very specific prior domain knowledge. 1

References

Page 1

	Year	Citations

Page 1