Publication | Closed Access
Performance tradeoffs in dynamic time warping algorithms for isolated word recognition
Citations: 564
References: 14
Year: 1980
Engineering, Machine Learning, Spoken Language Processing, Corpus Linguistics, Speech Recognition, Natural Language Processing, Data Science, Data Mining, Pattern Recognition, Phonetics, Robust Speech Recognition, Dynamic Time, Performance Tradeoffs, Character Recognition, Health Sciences, Temporal Pattern Recognition, Computer Science, Distant Speech Recognition, Signal Processing, Speech Analysis, Speech Technology, Test Pattern, Isolated Word Recognition, Time Registration, Language Recognition, Dynamic Programming, Speech Processing, Speech Perception, Linguistics, Pattern Recognition Application
Dynamic time warping, a dynamic programming technique for aligning time patterns, is widely used in isolated word recognition, with recent variants differing in global path constraints, local continuity, and distance weighting. This investigation examines how these variations affect performance on a realistic speech database. Performance was evaluated in terms of speed, memory usage, and recognition accuracy. The results show that axis orientation and relative pattern length influence accuracy, and a new approach that linearly warps both reference and test patterns to a fixed length before applying simplified DTW achieves performance comparable to or better than all other studied algorithms.
The technique of dynamic programming for the time registration of a reference and a test pattern has found widespread use in the area of isolated word recognition. Recently, a number of variations on the basic time warping algorithm have been proposed by Sakoe and Chiba, and Rabiner, Rosenberg, and Levinson. These algorithms all assume that the test input is the time pattern of a feature vector from an isolated word whose endpoints are known (at least approximately). The major differences in the methods are the global path constraints (i.e., the region of possible warping paths), the local continuity constraints on the path, and the distance weighting and normalization used to give the overall minimum distance. The purpose of this investigation is to study the effects of such variations on the performance of different dynamic time warping algorithms for a realistic speech database. The performance measures that were used include: speed of operation, memory requirements, and recognition accuracy. The results show that both axis orientation and relative length of the reference and the test patterns are important factors in recognition accuracy. Our results suggest a new approach to dynamic time warping for isolated words in which both the reference and test patterns are linearly warped to a fixed length, and then a simplified dynamic time warping algorithm is used to handle the nonlinear component of the time alignment. Results with this new algorithm show performance comparable to or better than that of all other dynamic time warping algorithms that were studied.
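The two-stage idea described in the abstract (linear warping of both patterns to a fixed length, followed by a simplified dynamic time warp for the remaining nonlinear alignment) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the fixed length of 32 frames, the band width of 4, the Euclidean frame distance, the three-way local step rule, and the normalization by the fixed length are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math


def linear_warp(frames, target_len):
    """Linearly resample a list of feature vectors to target_len frames.

    Hypothetical helper: uses linear interpolation between neighboring frames.
    """
    n = len(frames)
    if n == 0 or target_len < 1:
        raise ValueError("pattern must be non-empty and target_len >= 1")
    if n == 1 or target_len == 1:
        return [list(frames[0]) for _ in range(target_len)]
    out = []
    for k in range(target_len):
        pos = k * (n - 1) / (target_len - 1)  # position on the original time axis
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append([(1 - frac) * a + frac * b
                    for a, b in zip(frames[lo], frames[hi])])
    return out


def frame_distance(a, b):
    """Euclidean distance between two feature vectors (an assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def banded_dtw(ref, test, band):
    """DTW over two equal-length patterns, restricted to |i - j| <= band.

    The band is an assumed global path constraint; the local steps are
    horizontal, vertical, and diagonal with equal weights.
    """
    n = len(ref)
    if len(test) != n:
        raise ValueError("patterns must have equal length after linear warping")
    inf = math.inf
    acc = [[inf] * n for _ in range(n)]
    for i in range(n):
        for j in range(max(0, i - band), min(n, i + band + 1)):
            d = frame_distance(ref[i], test[j])
            if i == 0 and j == 0:
                acc[i][j] = d
                continue
            best = inf
            if i > 0:
                best = min(best, acc[i - 1][j])
            if j > 0:
                best = min(best, acc[i][j - 1])
            if i > 0 and j > 0:
                best = min(best, acc[i - 1][j - 1])
            acc[i][j] = d + best
    return acc[n - 1][n - 1] / n  # assumed normalization by the fixed length


def warped_dtw_distance(ref, test, fixed_len=32, band=4):
    """Linearly warp both patterns to fixed_len, then apply banded DTW."""
    return banded_dtw(linear_warp(ref, fixed_len),
                      linear_warp(test, fixed_len),
                      band)
```

In a recognizer built along these lines, the test pattern would be scored against each stored reference with `warped_dtw_distance` and the word with the smallest distance chosen. Because both patterns share one fixed length, the accumulated-distance grid is square and a narrow band around the diagonal suffices, which is one plausible reading of why the abstract calls the second stage "simplified".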