Concepedia

TLDR

Multi-condition training data can significantly improve the environmental robustness of DNN acoustic models, and because collecting such data is costly, the desired conditions are often simulated instead. The paper details a data augmentation approach for far-field ASR that uses simulated room impulse responses (RIRs) and point-source noises, and evaluates it on several LVCSR tasks representing distant- and close-talking scenarios. Adding point-source noises eliminates the performance gap between simulated and real RIRs, and the trained models perform well in the distant-talking scenario while also giving better results in the close-talking scenario.

Abstract

The environmental robustness of DNN-based acoustic models can be significantly improved by using multi-condition training data. However, as data collection is a costly proposition, simulation of the desired conditions is a frequently adopted strategy. In this paper we detail a data augmentation approach for far-field ASR. We examine the impact of using simulated room impulse responses (RIRs), as real RIRs can be difficult to acquire, and also the effect of adding point-source noises. We find that the performance gap between using simulated and real RIRs can be eliminated when point-source noises are added. Further we show that the trained acoustic models not only perform well in the distant-talking scenario but also provide better results in the close-talking scenario. We evaluate our approach on several LVCSR tasks which can adequately represent both scenarios.
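
The augmentation described above, reverberating speech with an RIR and adding a point-source noise, can be illustrated with a minimal sketch. It is not the paper's implementation. The function names `toy_rir` and `simulate_far_field`, the SNR-based mixing rule, and the exponentially decaying noise used as a stand-in for a simulated RIR are all assumptions made for illustration; a real pipeline would likely use a proper room simulator and recorded noises.

```python
import numpy as np


def toy_rir(rng, sample_rate, rt60):
    """Hypothetical stand-in for a simulated RIR: white noise under an
    exponential envelope that falls by 60 dB after rt60 seconds."""
    n = int(sample_rate * rt60)
    t = np.arange(n) / sample_rate
    envelope = 10.0 ** (-3.0 * t / rt60)  # amplitude is -60 dB at t = rt60
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # direct-path impulse
    return rir / np.max(np.abs(rir))


def simulate_far_field(speech, speech_rir, noise, noise_rir, snr_db):
    """Reverberate speech and add one reverberated point-source noise,
    scaled so the speech-to-noise power ratio equals snr_db (assumed rule)."""
    # Reverberate the clean speech and trim back to the input length.
    reverb_speech = np.convolve(speech, speech_rir)[: len(speech)]
    # A point source sits somewhere in the room, so it gets its own RIR.
    reverb_noise = np.convolve(noise, noise_rir)
    # Repeat or truncate the noise so it covers the whole utterance.
    reverb_noise = np.resize(reverb_noise, len(speech))
    speech_power = np.mean(reverb_speech ** 2)
    noise_power = max(np.mean(reverb_noise ** 2), 1e-12)  # avoid divide-by-zero
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return reverb_speech + gain * reverb_noise


# Usage with synthetic signals standing in for real audio.
rng = np.random.default_rng(0)
sample_rate = 16000
speech = rng.standard_normal(sample_rate)       # 1 s placeholder "speech"
noise = rng.standard_normal(sample_rate // 4)   # short placeholder noise
speech_rir = toy_rir(rng, sample_rate, rt60=0.3)
noise_rir = toy_rir(rng, sample_rate, rt60=0.3)
augmented = simulate_far_field(speech, speech_rir, noise, noise_rir, snr_db=10.0)
```

Drawing a separate RIR for the noise reflects the idea that a point source occupies its own position in the room. Whether the paper samples positions, SNRs, or room sizes in this way is not stated in the abstract.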
