Publication | Closed Access
Speech analysis/Synthesis based on a sinusoidal representation
Citations: 1.6K
References: 9
Year: 1986
Music, Sinusoidal Model, Engineering, Health Sciences, Phonetics, Audio Signal Processing, Speech Synthesis, Speech Output, Speech Processing, Speech Analysis/synthesis, Synthetic Waveform, Sound Synthesis, Speech Perception, Speech Waveform, Speech Communication, Speech Recognition
The system provides a foundation for new speech transformation techniques, including time-scale and pitch-scale modification, and for mid-rate speech coding. The authors aim to develop a sinusoidal analysis/synthesis technique that represents speech using the amplitudes, frequencies, and phases of its component sine waves. They estimate these parameters from the short-time Fourier transform via peak-picking, track rapid spectral changes using the "birth" and "death" of sine waves, unwrap and interpolate the phase of each frequency track with a cubic function, and synthesize speech by amplitude-modulating and summing the sine waves. The synthesized waveform preserves the general shape of the original and is essentially perceptually indistinguishable from it. With noisy input, the perceptual characteristics of both the speech and the noise are maintained. The representation also generalizes to overlapping speech, music, speech in musical backgrounds, and certain marine biological sounds.
A sinusoidal model for the speech waveform is used to develop a new analysis/synthesis technique that is characterized by the amplitudes, frequencies, and phases of the component sine waves. These parameters are estimated from the short-time Fourier transform using a simple peak-picking algorithm. Rapid changes in the highly resolved spectral components are tracked using the concept of "birth" and "death" of the underlying sine waves. For a given frequency track a cubic function is used to unwrap and interpolate the phase such that the phase track is maximally smooth. This phase function is applied to a sine-wave generator, which is amplitude modulated and added to the other sine waves to give the final speech output. The resulting synthetic waveform preserves the general waveform shape and is essentially perceptually indistinguishable from the original speech. Furthermore, in the presence of noise the perceptual characteristics of the speech as well as the noise are maintained. In addition, it was found that the representation was sufficiently general that high-quality reproduction was obtained for a larger class of inputs including: two overlapping, superposed speech waveforms; music waveforms; speech in musical backgrounds; and certain marine biologic sounds. Finally, the analysis/synthesis system forms the basis for new approaches to the problems of speech transformations including time-scale and pitch-scale modification, and midrate speech coding [8], [9].
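The peak-picking and cubic phase-interpolation steps described in the abstract can be illustrated with a minimal Python sketch. The abstract gives no equations, so the closed form below is the commonly cited solution for a cubic phase track that matches the measured phase and frequency at both frame boundaries. All function names, parameter names, and the linear amplitude interpolation are assumptions made for illustration, not the authors' implementation.

```python
import math

def pick_peaks(magnitudes):
    """Return bin indices of local maxima in a magnitude spectrum (simple peak picking)."""
    return [k for k in range(1, len(magnitudes) - 1)
            if magnitudes[k - 1] < magnitudes[k] >= magnitudes[k + 1]]

def cubic_phase_coeffs(theta0, omega0, theta1, omega1, T):
    """Coefficients of theta(t) = theta0 + omega0*t + alpha*t**2 + beta*t**3.

    theta0, omega0: phase (rad) and frequency (rad/sample) at the frame start.
    theta1, omega1: the same quantities T samples later.
    Returns (alpha, beta, M), where M is the integer number of 2*pi wraps
    chosen so that the interpolated track is as smooth as possible.
    """
    # Phase unwrapping: pick the integer M closest to the "smoothest" real value.
    x = ((theta0 + omega0 * T - theta1) + (omega1 - omega0) * T / 2.0) / (2.0 * math.pi)
    M = round(x)
    d = theta1 + 2.0 * math.pi * M - theta0 - omega0 * T  # residual phase to absorb
    dw = omega1 - omega0                                   # frequency change over the frame
    alpha = 3.0 * d / T**2 - dw / T
    beta = -2.0 * d / T**3 + dw / T**2
    return alpha, beta, M

def synthesize_track(a0, a1, theta0, omega0, theta1, omega1, T):
    """Synthesize T samples of one sine-wave track between two frames.

    Amplitude is interpolated linearly from a0 to a1 (an assumption of this sketch).
    """
    alpha, beta, _ = cubic_phase_coeffs(theta0, omega0, theta1, omega1, T)
    samples = []
    for n in range(T):
        amp = a0 + (a1 - a0) * n / T
        phase = theta0 + omega0 * n + alpha * n**2 + beta * n**3
        samples.append(amp * math.cos(phase))
    return samples
```

In a full system, one such track would be synthesized for every matched pair of peaks across adjacent frames, and the tracks would be summed to form the output waveform. Tracks that start or end within a frame (the "birth" and "death" cases) need extra handling that this sketch omits.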