Phoneme segmentation of continuous speech using multi-layer perceptron

Abstract

We propose a new method of phoneme segmentation using a multi-layer perceptron (MLP). The proposed segmenter consists of three parts: a preprocessor, an MLP-based phoneme segmenter, and a postprocessor. The preprocessor represents each frame of speech by 44 feature parameters chosen on the basis of acoustic-phonetic knowledge. The MLP has one hidden layer and an output layer. Four consecutive inter-frame feature vectors (4 × 44 = 176 parameters) serve as its input. The output value indicates whether the current frame is a phoneme boundary. In postprocessing, we determine the positions of phoneme boundaries from the MLP output. In an open test, we obtained 84% 5 msec accuracy and 87% 15 msec accuracy with an insertion rate of 9%. Adjusting the threshold applied to the MLP output yielded higher accuracy: when we decreased the threshold by 0.4, we obtained a 5 msec accuracy of 92% with an insertion rate of 3.4%, counting only insertions more than 15 msec away from a phoneme boundary.
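
The three-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. Only the 44 parameters per frame, the four-vector context (176 inputs), the single hidden layer, and the thresholded output come from the abstract. The hidden-layer size, the random untrained weights, the alignment of the context window with the current frame, and the local-maximum rule in the postprocessor are assumptions made purely for illustration, and all function names are hypothetical.

```python
import math
import random

FRAME_DIM = 44                    # feature parameters per frame (from the abstract)
CONTEXT = 4                       # consecutive feature vectors per MLP input (from the abstract)
INPUT_DIM = FRAME_DIM * CONTEXT   # 176 input parameters
HIDDEN_DIM = 16                   # assumption: the abstract does not give the hidden-layer size


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_mlp(seed=0):
    """Random, untrained weights; a real segmenter would learn these from labeled speech."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(INPUT_DIM)] for _ in range(HIDDEN_DIM)]
    b1 = [0.0] * HIDDEN_DIM
    w2 = [rng.uniform(-0.1, 0.1) for _ in range(HIDDEN_DIM)]
    b2 = 0.0
    return w1, b1, w2, b2


def mlp_forward(params, x):
    """One hidden layer followed by a single sigmoid output in (0, 1)."""
    w1, b1, w2, b2 = params
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(w1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)


def stack_context(frames, t):
    """Concatenate CONTEXT consecutive feature vectors starting at index t.

    Assumption: the abstract does not say how the window is aligned
    with the "current" frame; starting at t is an arbitrary choice.
    """
    return [v for frame in frames[t:t + CONTEXT] for v in frame]


def boundary_scores(params, frames):
    """Slide the context window over the utterance and score each position."""
    return [mlp_forward(params, stack_context(frames, t))
            for t in range(len(frames) - CONTEXT + 1)]


def pick_boundaries(scores, threshold=0.5):
    """Hypothetical postprocessor: keep local maxima at or above the threshold."""
    boundaries = []
    for t, s in enumerate(scores):
        left = scores[t - 1] if t > 0 else float("-inf")
        right = scores[t + 1] if t + 1 < len(scores) else float("-inf")
        if s >= threshold and s > left and s >= right:
            boundaries.append(t)
    return boundaries
```

In this sketch, lowering `threshold` lets more score peaks through as boundary candidates, which is the kind of adjustment the abstract reports when trading accuracy against insertions.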
