Publication | Closed Access
Neural Network Based Time-Frequency Masking and Steering Vector Estimation for Two-Channel MVDR Beamforming
Citations: 32
References: 27
Year: 2018
Venue: Unknown
Engineering, Sensor Array, Neural Network, Speech Enhancement, Steering Vector Estimation, Speech Recognition, Robust Speech Recognition, Health Sciences, Multi-channel Processing, Deep Learning, Deep Neural Network, Signal Processing, Speech Communication, Convolution Neural Network, Array Processing, Time-frequency Masking, Speech Separation, Speech Processing, Beamforming, Channel Estimation
We present a neural network based approach to two-channel beamforming. First, single- and cross-channel spectral features are extracted to form a feature map for each utterance. A large neural network, formed by concatenating a convolutional neural network (CNN), a long short-term memory recurrent neural network (LSTM-RNN) and a deep neural network (DNN), is then used to estimate frame-level speech and noise masks. Next, the predicted masks are used to compute cross-power spectral density (CPSD) matrices, from which the minimum variance distortionless response (MVDR) beamformer coefficients are estimated. Finally, a DNN is trained to optimize the phase of the estimated steering vectors, making them robust to reverberant conditions. We compare our method with two state-of-the-art two-channel speech enhancement systems: time-frequency masking and masking-based beamforming. Results show that the proposed method yields a 21% relative improvement in word error rate (WER) over these systems.
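The mask-to-beamformer step described in the abstract can be illustrated with a minimal NumPy sketch of generic mask-based MVDR. This is not the authors' implementation: the function names are hypothetical, the random STFT and random mask stand in for real two-channel audio and network outputs, the steering vector is taken as the principal eigenvector of the speech CPSD (a common baseline choice, not the paper's DNN-based phase refinement), and the diagonal loading term is an added assumption for numerical stability.

```python
import numpy as np


def masked_cpsd(stft, mask):
    """Mask-weighted CPSD matrix per frequency bin.

    stft: complex array, shape (channels, frames, freqs)
    mask: real array in [0, 1], shape (frames, freqs)
    returns: complex array, shape (freqs, channels, channels)
    """
    # Phi[f] = sum_t m(t, f) * y(t, f) y(t, f)^H
    phi = np.einsum("tf,ctf,dtf->fcd", mask, stft, stft.conj())
    # Normalize by the total mask weight in each frequency bin
    return phi / (mask.sum(axis=0)[:, None, None] + 1e-10)


def steering_vectors(phi_speech):
    """Principal eigenvector of the speech CPSD, normalized to channel 0."""
    # eigh returns eigenvalues in ascending order, so the last column is principal
    _, vecs = np.linalg.eigh(phi_speech)
    d = vecs[..., -1]  # shape (freqs, channels)
    # Scale so the reference channel has unit response (relative transfer function)
    return d / d[:, :1]


def mvdr_weights(phi_noise, d, diag_load=1e-6):
    """MVDR weights w = Phi_n^{-1} d / (d^H Phi_n^{-1} d) per frequency bin."""
    n_ch = phi_noise.shape[-1]
    # Assumed diagonal loading, scaled by the average noise power per channel
    power = np.trace(phi_noise, axis1=-2, axis2=-1).real / n_ch
    phi_n = phi_noise + diag_load * power[:, None, None] * np.eye(n_ch)
    # Solve Phi_n x = d per bin instead of forming an explicit inverse
    num = np.linalg.solve(phi_n, d[..., None])[..., 0]  # (freqs, channels)
    den = np.einsum("fc,fc->f", d.conj(), num)  # d^H Phi_n^{-1} d
    return num / den[:, None]


def apply_beamformer(w, stft):
    """Enhanced STFT: s_hat(t, f) = w(f)^H y(t, f), shape (frames, freqs)."""
    return np.einsum("fc,ctf->tf", w.conj(), stft)


# Hypothetical inputs: a random two-channel STFT and a random speech mask
rng = np.random.default_rng(0)
n_ch, n_frames, n_freqs = 2, 100, 8
shape = (n_ch, n_frames, n_freqs)
Y = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
speech_mask = rng.uniform(size=(n_frames, n_freqs))
noise_mask = 1.0 - speech_mask

phi_s = masked_cpsd(Y, speech_mask)
phi_n = masked_cpsd(Y, noise_mask)
d = steering_vectors(phi_s)
w = mvdr_weights(phi_n, d)
enhanced = apply_beamformer(w, Y)
```

Because the weights are normalized by d^H Phi_n^{-1} d, the beamformer satisfies the distortionless constraint w^H d = 1 in every frequency bin while minimizing output noise power. In the approach summarized above, the phase of `d` would additionally be refined by a trained DNN before computing `w`; that step is not reproduced here.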