Towards Learning a Universal Non-Semantic Representation of Speech

Abstract

The ultimate goal of transfer learning is to reduce labeled data requirements\nby exploiting a pre-existing embedding model trained for different datasets or\ntasks. The visual and language communities have established benchmarks to\ncompare embeddings, but the speech community has yet to do so. This paper\nproposes a benchmark for comparing speech representations on non-semantic\ntasks, and proposes a representation based on an unsupervised triplet-loss\nobjective. The proposed representation outperforms other representations on the\nbenchmark, and even exceeds state-of-the-art performance on a number of\ntransfer learning tasks. The embedding is trained on a publicly available\ndataset, and it is tested on a variety of low-resource downstream tasks,\nincluding personalization tasks and medical domain. The benchmark, models, and\nevaluation code are publicly released.\n

References

Page 1

	Year	Citations

Page 1