Estonian Speech Recognition and Transcription Editing Service

Abstract

This paper describes the latest iteration of our Estonian speech recognition system and the publicly available transcription editing service. The system is now based on an end-to-end wav2vec2.0 model. It achieves a word error rate of 6.9% on a test set of broadcast conversations. Besides recognition it performs speaker diarization, speaker identification, Estonian language detection, and punctuation restoration. The service consists of a speech processing pipeline, web server and a web-based user interface for end-users, offering transcript editing and speaker annotation functionality. The core components of the service have been made open-source and deployed internally by multiple public and private institutions.

References

Page 1

	Year	Citations

Page 1