Concepedia

Publication | Closed Access

Maximum conditional mutual information projection for speech recognition

11

Citations

7

References

2003

Year

Abstract

Linear discriminant analysis (LDA) in its original model-free formulation is best suited to classification problems with equal-covariance classes. Heteroscedastic discriminant analysis (HDA) removes this equal-covariance constraint and is therefore more suitable for automatic speech recognition (ASR) systems. However, maximizing the HDA objective function does not correspond directly to minimizing the recognition error. In its original formulation, HDA solves a maximum likelihood estimation problem in the original feature space to calculate the HDA transformation matrix. Since the dimension of the original feature space in ASR problems is usually high, the estimation of the HDA transformation matrix becomes computationally expensive and requires a large amount of training data. This paper presents a generalization of LDA that solves these two problems. We first show that the calculation of the LDA projection matrix is a maximum mutual information estimation problem in the lower-dimensional space with some constraints on the model of the joint conditional and unconditional probability density functions (PDF) of the features. Then, by relaxing these constraints, we develop a dimensionality reduction approach that maximizes the conditional mutual information between the class identity and the feature vector in the lower-dimensional space given the recognizer model. Using this approach, we achieved a 1% improvement in phoneme recognition accuracy compared to the baseline system. Improvements in recognition accuracy over both the LDA and HDA approaches are also achieved.
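For context, the classical LDA baseline that the paper generalizes computes its projection from between-class and within-class scatter matrices. The sketch below shows that generic scatter-matrix formulation only; it is not the paper's conditional mutual information method, and the function name and data are illustrative assumptions.

```python
# Illustrative sketch of classical (model-free) LDA, the baseline the paper
# generalizes. Not the paper's conditional-MI projection.
import numpy as np

def lda_projection(X, y, p):
    """Return a d x p projection maximizing between-class vs. within-class scatter."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mu).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    # Generalized eigenproblem Sb v = lambda Sw v; keep the top-p directions.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs[:, order[:p]].real
```

Because this objective assumes equal class covariances, it is exactly the constraint that HDA, and in turn the conditional-MI approach described in the abstract, relaxes.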
