Learning Gaussian Mixtures with Generalised Linear Models: Precise Asymptotics in High-dimensions

Abstract

Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks. In this manuscript, we characterise the learning of a mixture of $K$ Gaussians with generic means and covariances via empirical risk minimisation (ERM) with any convex loss and regularisation. In particular, we prove exact asymptotics characterising the ERM estimator in high dimensions, extending several previous results about Gaussian mixture classification in the literature. We exemplify our result in two tasks of interest in statistical learning: a) classification for a mixture with sparse means, where we study the efficiency of the $\ell_1$ penalty with respect to $\ell_2$; b) max-margin multi-class classification, where we characterise the phase transition on the existence of the multi-class logistic maximum likelihood estimator for $K>2$. Finally, we discuss how our theory can be applied beyond the scope of synthetic data, showing that in different cases Gaussian mixtures closely capture the learning curve of classification tasks in real data sets.
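
To make the setting concrete, the following is a minimal numerical sketch of ERM on a Gaussian mixture, not the analysis carried out in the manuscript. It assumes an equiprobable mixture of $K=3$ Gaussians with identity covariances, the cross-entropy loss, an $\ell_2$ penalty, and plain gradient descent. All function names, the dimensions, the mean scaling, the regularisation strength `lam` and the step size `lr` are illustrative choices that do not come from the text.

```python
import numpy as np

def sample_mixture(n, means, rng):
    """Draw n points from an equiprobable Gaussian mixture.

    Assumption: identity covariances, chosen only to keep the sketch short.
    """
    K, d = means.shape
    labels = rng.integers(0, K, size=n)
    X = means[labels] + rng.standard_normal((n, d))
    return X, labels

def softmax(Z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def fit_erm(X, labels, K, lam=0.1, lr=0.1, steps=1000):
    """Minimise mean cross-entropy + (lam / 2) * ||W||_F^2 by gradient descent."""
    n, d = X.shape
    Y = np.eye(K)[labels]            # one-hot encoding of the labels
    W = np.zeros((d, K))
    for _ in range(steps):
        P = softmax(X @ W)           # predicted class probabilities
        grad = X.T @ (P - Y) / n + lam * W
        W -= lr * grad
    return W

def error_rate(W, X, labels):
    """Fraction of points whose arg-max prediction differs from the label."""
    return float(np.mean(np.argmax(X @ W, axis=1) != labels))

rng = np.random.default_rng(0)
K, d, n = 3, 50, 200                                    # illustrative sizes
means = 3.0 * rng.standard_normal((K, d)) / np.sqrt(d)  # illustrative means

X_train, y_train = sample_mixture(n, means, rng)
X_test, y_test = sample_mixture(5000, means, rng)

W = fit_erm(X_train, y_train, K)
train_err = error_rate(W, X_train, y_train)
test_err = error_rate(W, X_test, y_test)
print(f"train error: {train_err:.3f}, test error: {test_err:.3f}")
```

Repeating the experiment over a grid of sample sizes `n` at fixed `d` would trace an empirical learning curve of the kind the exact asymptotics are meant to describe. Swapping the $\ell_2$ term for an $\ell_1$ penalty would require a proximal or subgradient step in place of the plain gradient update used here.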