Simultaneous Speech Translation for Live Subtitling: from Delay to Display

Abstract

With the increased audiovisualisation of communication, the need for live subtitles in multilingual events is more relevant than ever. In an attempt to automatise the process, we aim at exploring the feasibility of simultaneous speech translation (SimulST) for live subtitling. However, the word-for-word rate of generation of SimulST systems is not optimal for displaying the subtitles in a comprehensible and readable way. In this work, we adapt SimulST systems to predict subtitle breaks along with the translation. We then propose a display mode that exploits the predicted break structure by presenting the subtitles in scrolling lines. We compare our proposed mode with a display 1) word-for-word and 2) in blocks, in terms of reading speed and delay. Experiments on three language pairs (en$\rightarrow$it, de, fr) show that scrolling lines is the only mode achieving an acceptable reading speed while keeping delay close to a 4-second threshold. We argue that simultaneous translation for readable live subtitles still faces challenges, the main one being poor translation quality, and propose directions for steering future research.