Connectionist temporal classification

TLDR

Many sequence learning tasks, such as speech recognition, require predicting label sequences from noisy, unsegmented input. Recurrent neural networks (RNNs) have seen limited use on these tasks because they require pre-segmented training data and post-processing to turn their outputs into label sequences. The authors propose a training method that lets RNNs label unsegmented sequences directly, removing both requirements. An experiment on the TIMIT speech corpus shows its advantages over both a baseline HMM and a hybrid HMM-RNN.

Abstract

Many real-world sequence learning tasks require the prediction of sequences of labels from noisy, unsegmented input data. In speech recognition, for example, an acoustic signal is transcribed into words or sub-word units. Recurrent neural networks (RNNs) are powerful sequence learners that would seem well suited to such tasks. However, because they require pre-segmented training data, and post-processing to transform their outputs into label sequences, their applicability has so far been limited. This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems. An experiment on the TIMIT speech corpus demonstrates its advantages over both a baseline HMM and a hybrid HMM-RNN.
