Sparse Linear Discriminant Analysis with Applications to High Dimensional Low Sample Size Data

Abstract

This paper develops a method for automatically incorporating variable selection in Fisher’s linear discriminant analysis (LDA). Utilizing the connection between Fisher’s LDA and a generalized eigenvalue problem, our approach applies regularization to obtain sparse linear discriminant vectors, where “sparse” means that the discriminant vectors have only a small number of nonzero components. Our sparse LDA procedure is especially effective in so-called high dimensional, low sample size (HDLSS) settings, where LDA possesses the “data piling” property: it maps all points from the same class in the training data to a common point, so that the data appear piled up when viewed along the LDA projection directions. Data piling indicates overfitting and usually results in poor out-of-sample classification. By incorporating variable selection, sparse LDA overcomes the data piling problem. The underlying assumption is that many of the variables are irrelevant or redundant for the purpose of classification. By using only the important variables, we essentially deal with a lower dimensional problem. Both synthetic and real data sets are used to illustrate the proposed method.
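
The data piling property described above can be illustrated with a small numerical sketch. This is not the sparse LDA procedure of the paper; it only constructs one direction with zero within-class scatter on the training data, which is possible whenever the number of variables exceeds the number of training points. The function name `data_piling_direction`, the sample sizes, and the use of pure-noise data are assumptions made for illustration.

```python
import numpy as np

def data_piling_direction(X, y):
    """Unit vector w along which each training class collapses to one point.

    Illustrative helper (hypothetical name); assumes two classes and more
    variables than training points.
    """
    classes = np.unique(y)
    means = {k: X[y == k].mean(axis=0) for k in classes}
    # Within-class centered data: each row minus its own class mean.
    Xc = np.vstack([X[y == k] - means[k] for k in classes])
    diff = means[classes[1]] - means[classes[0]]
    # Remove from the mean difference its component in the row space of Xc,
    # so that Xc @ w = 0 (zero within-class scatter along w).
    coef, *_ = np.linalg.lstsq(Xc.T, diff, rcond=None)
    w = diff - Xc.T @ coef
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
n_per_class, p = 10, 200                       # HDLSS: p much larger than n
X = rng.standard_normal((2 * n_per_class, p))  # pure noise, no class signal
y = np.repeat([0, 1], n_per_class)

w = data_piling_direction(X, y)
train_proj = X @ w

# Range of each class along w: essentially zero (the points are piled up).
spread_0 = train_proj[y == 0].max() - train_proj[y == 0].min()
spread_1 = train_proj[y == 1].max() - train_proj[y == 1].min()
# Distance between the two piles along w.
gap = abs(train_proj[y == 1].mean() - train_proj[y == 0].mean())

# Fresh points from the same distribution do not pile up along w.
X_new = rng.standard_normal((2000, p))
test_spread = (X_new @ w).std()

print(f"class spreads: {spread_0:.2e}, {spread_1:.2e}")
print(f"gap between piles: {gap:.2f}, spread of new data: {test_spread:.2f}")
```

Because the simulated classes share the same distribution, the apparent perfect separation of the training data is purely an artifact of the high dimension, which is the overfitting that the abstract attributes to data piling.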
