Publication | Open Access
Are we using enough listeners? No! An empirically-supported critique of Interspeech 2014 TTS evaluations
Year: 2015 | Citations: 37 | References: 13 | Venue: Unknown
Speech Corpus, Spoken Language Processing, Communication, Voice Evaluation, Corpus Linguistics, TTS Evaluations, Blizzard Challenge 2013, Speech Recognition, Natural Language Processing, Interspeech 2014, Phonetics, Computational Linguistics, Language Testing, Speech Interface, Subjective Evaluations, Conversation Analysis, Language Studies, Blizzard 2013, Speech Synthesis, Arts, Enough Listeners, Speech Communication, Speech Technology, Speech Analysis, Speech Acoustics, Human-computer Interaction, Speech Processing, Speech Perception, Voice Technology, Linguistics, Voice Interaction
Tallying the number of listeners who took part in subjective evaluations of synthetic speech at Interspeech 2014 showed that, in more than 60% of papers, conclusions were based on listening tests with fewer than 20 listeners. Our analysis of Blizzard Challenge 2013 data shows that, for a mean opinion score (MOS) test measuring naturalness, a stable level of significance is only reached when more than 30 listeners are used. In this paper, we set out a list of guidelines, in the form of a checklist, for carrying out meaningful subjective evaluations. We further illustrate the importance of sentence coverage and of the number of listeners by re-analysing data from the Blizzard Challenge 2013 and presenting the resulting changes in the rank order of systems and in the number of significantly different system pairs.
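The abstract does not spell out the statistics behind the re-analysis, so the following is only a minimal Python sketch, under stated assumptions, of that kind of procedure: keep a random subset of k listeners, re-run every pairwise significance test, and see how the number of significantly different system pairs and the MOS rank order change with k. It assumes a two-sided Wilcoxon signed-rank test (normal approximation, tie-corrected variance) with a Bonferroni correction at alpha = 0.01, and it assumes ratings pair up by listener and sentence slot. It runs on simulated 1 to 5 ratings, not on Blizzard 2013 data; all function names, system labels and means are hypothetical.

```python
import math
import random
from itertools import combinations
from statistics import mean


def wilcoxon_signed_rank_p(x, y):
    """Two-sided Wilcoxon signed-rank p-value for paired samples x, y.

    Assumption: normal approximation with tie-corrected variance; zero
    differences are dropped. Reasonable for the hundreds of paired ratings
    a MOS test yields, not for very small samples.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of equal |d| values and give them their average rank.
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mu = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    if var <= 0:
        return 1.0
    z = (w_plus - mu) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def pooled(scores, system, listeners):
    """All ratings of one system from the chosen listeners, in a fixed order,
    so that rating i of two systems comes from the same listener and sentence slot."""
    return [r for lst in listeners for r in scores[system][lst]]


def count_significant_pairs(scores, listeners, alpha=0.01):
    """Number of system pairs whose p-value is below the Bonferroni-corrected alpha."""
    pairs = list(combinations(sorted(scores), 2))
    threshold = alpha / len(pairs)  # Bonferroni correction over all pairs
    return sum(
        wilcoxon_signed_rank_p(pooled(scores, a, listeners),
                               pooled(scores, b, listeners)) < threshold
        for a, b in pairs
    )


def rank_by_mos(scores, listeners):
    """Systems sorted by mean opinion score, best first."""
    mos = {s: mean(pooled(scores, s, listeners)) for s in scores}
    return sorted(mos, key=mos.get, reverse=True)


def significant_pairs_vs_listeners(scores, sizes, repeats=20, alpha=0.01, seed=1):
    """Average number of significant pairs when only k random listeners are kept."""
    rng = random.Random(seed)
    everyone = sorted(next(iter(scores.values())))
    return {
        k: mean(count_significant_pairs(scores, rng.sample(everyone, k), alpha)
                for _ in range(repeats))
        for k in sizes
    }


def simulate_scores(true_means, n_listeners, n_sentences, seed=0):
    """Hypothetical ratings: scores[system][listener] -> one 1-5 rating per sentence."""
    rng = random.Random(seed)
    return {
        s: {
            lst: [min(5, max(1, round(rng.gauss(m, 1.0)))) for _ in range(n_sentences)]
            for lst in range(n_listeners)
        }
        for s, m in true_means.items()
    }


if __name__ == "__main__":
    sim = simulate_scores({"A": 4.5, "B": 3.0, "C": 2.8, "D": 2.0},
                          n_listeners=60, n_sentences=5)
    print(significant_pairs_vs_listeners(sim, sizes=[10, 20, 30, 60], repeats=10))
    print(rank_by_mos(sim, range(60)))
```

Because the ratings are simulated, whatever this prints only illustrates the procedure; it is not a reproduction of the paper's Blizzard 2013 findings. Averaging over several random listener subsets, rather than taking a single subset, keeps one lucky or unlucky draw from dominating the curve.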