Publication | Closed Access
LETR: A Lightweight and Efficient Transformer for Keyword Spotting
21 Citations | 21 References | Year: 2022
Convolutional Neural Network, Engineering, Speech Recognition, Natural Language Processing, Image Analysis, Data Science, Pattern Recognition, Robust Speech Recognition, Video Transformer, Keyword Spotting, Real-time Language, Machine Translation, Machine Vision, Computer Engineering, Computer Science, Deep Learning, Computer Vision, Speech Processing, Speech Input, Transformer Architectures
Transformers have recently achieved impressive success in a number of domains, including machine translation, image recognition, and speech recognition. Most previous work on keyword spotting (KWS), however, is built on convolutional or recurrent neural networks. In this paper, we explore a family of Transformer architectures for KWS, optimizing the trade-off between accuracy and efficiency in a high-speed regime. We also study how effective the key components of vision Transformers are when applied to KWS, namely patch embedding, position encoding, the attention mechanism, and the class token, and summarize principles for using them. Building on these findings, we propose LeTR, a lightweight and highly efficient Transformer for KWS. We consider several efficiency measures on different edge devices to best reflect a wide range of application scenarios. Experimental results on two common benchmarks show that LeTR achieves state-of-the-art speed/accuracy trade-offs compared with competing methods.
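To make the vision-Transformer components named in the abstract concrete, here is a minimal NumPy sketch of a generic ViT-style front end for KWS. A spectrogram is cut into time patches (patch embedding), a class-token parameter is prepended, sinusoidal position encodings are added, one single-head self-attention layer with a residual connection mixes the tokens, and the class-token output is mapped to keyword logits. This is not LeTR's architecture. The time-only patch shape, sinusoidal (rather than learned) encodings, the 40 mel bins x 98 frames input, the 12 output classes, and the random weights are all illustrative assumptions.

```python
import numpy as np


def patch_embed(spec, patch_t, proj):
    """Cut a (n_mels, n_frames) spectrogram into non-overlapping patches of
    patch_t frames spanning all mel bins, flatten them, and project linearly.
    Time-only patching is an assumption, not necessarily LeTR's choice."""
    n_mels, n_frames = spec.shape
    n_patches = n_frames // patch_t                  # trailing frames are dropped
    spec = spec[:, : n_patches * patch_t]
    patches = spec.reshape(n_mels, n_patches, patch_t).transpose(1, 0, 2)
    patches = patches.reshape(n_patches, n_mels * patch_t)
    return patches @ proj                            # (n_patches, d_model)


def sinusoidal_positions(n_tokens, d_model):
    """Fixed sine/cosine position encodings (d_model must be even)."""
    pos = np.arange(n_tokens)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((n_tokens, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ v


def encode(spec, params, patch_t=4):
    """Patch embedding -> class token -> position encoding -> attention -> logits."""
    tokens = patch_embed(spec, patch_t, params["proj"])
    tokens = np.vstack([params["cls"], tokens])      # prepend the class token
    tokens = tokens + sinusoidal_positions(*tokens.shape)
    out = tokens + self_attention(tokens, params["wq"], params["wk"], params["wv"])
    return out[0] @ params["head"]                   # classify from the class token


# Illustrative sizes (assumptions): 40 mel bins x 98 frames, 12 keyword classes.
rng = np.random.default_rng(0)
n_mels, n_frames, d_model, n_classes, patch_t = 40, 98, 64, 12, 4
params = {
    "proj": rng.normal(0.0, 0.02, (n_mels * patch_t, d_model)),
    "cls": np.zeros((1, d_model)),                   # learned in practice; zeros here
    "wq": rng.normal(0.0, 0.02, (d_model, d_model)),
    "wk": rng.normal(0.0, 0.02, (d_model, d_model)),
    "wv": rng.normal(0.0, 0.02, (d_model, d_model)),
    "head": rng.normal(0.0, 0.02, (d_model, n_classes)),
}
spec = rng.normal(size=(n_mels, n_frames))           # stand-in for a log-mel spectrogram
logits = encode(spec, params, patch_t)
print(logits.shape)                                  # (12,)
```

With these sizes, 98 frames yield 24 patches of 4 frames, so the attention layer sees 25 tokens including the class token. Slicing along time only, with each patch spanning every mel bin, is just one option (square time-frequency patches are another); the abstract says the effectiveness of patch embedding was studied but does not state which variant LeTR adopts.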