Publication | Closed Access
Nearly tight sample complexity bounds for learning mixtures of Gaussians via sample compression schemes
Citations: 33
References: 8
Year: 2018
Keywords: Total Variation Distance, Machine Learning, Engineering, Tight Sample Complexity, Unsupervised Machine Learning, Data Science, Pattern Recognition, Mixture Analysis, Sample Compression Schemes, Approximation Theory, Statistics, Density Estimation, Gaussian Analysis, Sample Compression, Mixture Distribution, Sparse Representation, Gaussian Process, Statistical Inference, Lower Bounds
We prove that Θ̃(k d^2 / ε^2) samples are necessary and sufficient for learning a mixture of k Gaussians in R^d, up to error ε in total variation distance. This improves both the known upper bounds and the known lower bounds for this problem. For mixtures of axis-aligned Gaussians, we show that Õ(k d / ε^2) samples suffice, matching a known lower bound. The upper bound relies on a novel technique for distribution learning built on a notion of sample compression. Any class of distributions that admits such a sample compression scheme can also be learned with few samples. Moreover, if a class of distributions has such a compression scheme, then so do the classes of products and mixtures of those distributions. The core of our main result is showing that the class of Gaussians in R^d has an efficient sample compression scheme.
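To get a feel for how the two rates in the abstract compare, the sketch below evaluates only their leading terms, k d^2 / ε^2 for general Gaussians and k d / ε^2 for axis-aligned ones. It is an illustration under stated assumptions, not a formula from the paper: the function name `sample_scale` is hypothetical, and constants and logarithmic factors are deliberately dropped, so the returned values indicate scaling rather than actual sample counts.

```python
def sample_scale(k: int, d: int, eps: float, axis_aligned: bool = False) -> float:
    """Leading-order term of the sample complexity for a mixture of k Gaussians in R^d.

    Hypothetical helper: constants and log factors are ignored, so the
    result is only meaningful for comparing how the bound scales.
    """
    if k < 1 or d < 1 or not (0.0 < eps < 1.0):
        raise ValueError("need k >= 1, d >= 1 and 0 < eps < 1")
    # A general Gaussian contributes a d^2 factor (full covariance matrix);
    # an axis-aligned Gaussian contributes only d (diagonal covariance).
    per_component = d if axis_aligned else d * d
    return k * per_component / eps ** 2


general = sample_scale(k=5, d=10, eps=0.1)
axis = sample_scale(k=5, d=10, eps=0.1, axis_aligned=True)
print(f"general: ~{general:.0f}, axis-aligned: ~{axis:.0f}")
```

Under this reading, the general case costs a factor of d more than the axis-aligned case, and halving ε multiplies either term by four.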