Self-Knowledge Distillation via Feature Enhancement for Speaker Verification

Abstract

Deep speaker embedding learning has recently become the predominant technique for speaker verification. Very large neural networks such as ECAPA-TDNN and ResNet achieve state-of-the-art performance. However, large models are computationally expensive, requiring massive storage and computation resources. Model compression has therefore become an active research topic, but existing approaches have drawbacks: parameter quantization usually results in significant performance degradation, and knowledge distillation requires a pretrained, complex teacher model. In this paper, we introduce a novel self-knowledge distillation method, Self-Knowledge Distillation via Feature Enhancement (SKDFE). It uses an auxiliary self-teacher network to distill its own refined knowledge, without the need for a pretrained teacher network. Additionally, we apply self-knowledge distillation at two levels: the label level and the feature level. Experiments on the VoxCeleb dataset show that the proposed method enables small models to achieve performance comparable to, or even better than, that of large ones. Large models can also be further improved by applying our method.
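
The abstract does not give the loss functions, so the following is only a minimal sketch of how a two-level self-distillation objective might be assembled. It assumes a temperature-softened KL divergence for the label level and a mean squared error for the feature level, both common choices in knowledge distillation but not confirmed here as the choices made in SKDFE. The function names, the weights `alpha` and `beta`, and the temperature are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Standard speaker-classification loss for one utterance."""
    return -math.log(softmax(logits)[target])


def label_level_loss(student_logits, teacher_logits, temperature=2.0):
    """ASSUMPTION: KL(teacher || student) on softened speaker posteriors."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def feature_level_loss(student_feat, teacher_feat):
    """ASSUMPTION: mean squared error between student and self-teacher features."""
    squared = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(squared) / len(squared)


def total_loss(student_logits, teacher_logits, student_feat, teacher_feat,
               target, alpha=1.0, beta=1.0):
    """Hypothetical combined objective: both branches are trained jointly
    (no pretrained teacher), and the student additionally matches the
    self-teacher at the label and feature levels."""
    ce = cross_entropy(student_logits, target) + cross_entropy(teacher_logits, target)
    kd_label = label_level_loss(student_logits, teacher_logits)
    kd_feat = feature_level_loss(student_feat, teacher_feat)
    return ce + alpha * kd_label + beta * kd_feat
```

In an actual training loop the self-teacher outputs would typically be treated as constants in the two distillation terms, so that gradients from those terms update only the student branch; that detail is likewise an assumption rather than something stated in the abstract.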
