Speech analysis/synthesis and modification using an analysis-by-synthesis/overlap-add sinusoidal model

TLDR

Sinusoidal modeling has been successfully applied to a broad range of speech processing problems and offers advantages over linear predictive modeling and the short-time Fourier transform for speech analysis, synthesis, and modification. This paper introduces a novel speech analysis/synthesis system that combines an overlap-add sinusoidal model with an analysis-by-synthesis technique for determining the model parameters. The authors detail the analysis procedure, present an equivalent frequency-domain algorithm that exploits the FFT, and derive a refined overlap-add sinusoidal model that enables shape-invariant speech modification, along with a pitch-scale modification algorithm that preserves speech bandwidth and eliminates noise migration. Analysis-by-synthesis yields very high synthetic speech quality by accurately estimating component frequencies, removing sidelobe interference, and handling nonstationary speech events. The refined overlap-add model modifies speech without objectionable artifacts, and the ABS/OLA system supports both fixed and time-varying modifications, with FFT-based computational shortcuts that make it feasible on currently available hardware.

Abstract

Sinusoidal modeling has been successfully applied to a broad range of speech processing problems, and offers advantages over linear predictive modeling and the short-time Fourier transform for speech analysis/synthesis and modification. This paper presents a novel speech analysis/synthesis system based on the combination of an overlap-add sinusoidal model with an analysis-by-synthesis technique to determine the model parameters. It describes this analysis procedure in detail, and introduces an equivalent frequency-domain algorithm that takes advantage of the computational efficiency of the fast Fourier transform (FFT). In addition, a refined overlap-add sinusoidal model capable of shape-invariant speech modification is derived, and a pitch-scale modification algorithm is defined that preserves speech bandwidth and eliminates noise migration effects. Analysis-by-synthesis achieves very high synthetic speech quality by accurately estimating the component frequencies, eliminating sidelobe interference effects, and effectively dealing with nonstationary speech events. The refined overlap-add synthesis model correlates well with analysis-by-synthesis, and modifies speech without objectionable artifacts by explicitly controlling shape invariance and phase coherence. The proposed analysis-by-synthesis/overlap-add (ABS/OLA) system allows for both fixed and time-varying time-, frequency-, and pitch-scale modifications, and computational shortcuts using the FFT algorithm make its implementation feasible using currently available hardware.
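The abstract pairs two ideas: an analysis-by-synthesis loop that estimates sinusoidal parameters frame by frame, and an overlap-add model that resynthesizes the signal from those parameters. The Python/NumPy sketch below illustrates only that general idea and is not the authors' algorithm. Several details are assumptions made for illustration: the frequency search (taking the largest FFT peak of the windowed residual instead of an exhaustive error-minimizing search), the periodic Hann analysis and synthesis windows, the 50% frame overlap, and the hypothetical helper names hann, abs_frame, and ola_synthesize.

```python
import numpy as np


def hann(n_len):
    """Periodic Hann window; copies shifted by n_len // 2 sum to 1."""
    n = np.arange(n_len)
    return 0.5 - 0.5 * np.cos(2 * np.pi * n / n_len)


def abs_frame(frame, window, n_components, n_fft=4096):
    """Simplified analysis-by-synthesis for one frame (illustrative only).

    Each pass picks a frequency from the residual, fits amplitude and phase by
    window-weighted least squares, synthesizes that sinusoid, and subtracts it,
    so the next pass analyzes only what the model has not yet explained.
    """
    n = np.arange(len(frame))
    residual = np.asarray(frame, dtype=float).copy()
    params = []  # (amplitude, radian frequency, phase) per component
    for _ in range(n_components):
        # Assumption: the largest FFT peak of the windowed residual stands in
        # for an exhaustive search over candidate frequencies.
        spec = np.fft.rfft(window * residual, n_fft)
        k = int(np.argmax(np.abs(spec)))
        omega = 2 * np.pi * k / n_fft
        # Weighted least-squares fit of a*cos(omega n) + b*sin(omega n).
        basis = np.stack([np.cos(omega * n), np.sin(omega * n)], axis=1)
        coef, *_ = np.linalg.lstsq(window[:, None] * basis, window * residual,
                                   rcond=None)
        a, b = coef
        # Rewrite as amp*cos(omega n + phase): a = amp*cos(phase), b = -amp*sin(phase).
        amp, phase = np.hypot(a, b), np.arctan2(-b, a)
        residual -= amp * np.cos(omega * n + phase)  # synthesize, then subtract
        params.append((amp, omega, phase))
    return params, residual


def ola_synthesize(frame_params, frame_len, hop):
    """Overlap-add synthesis: each frame is a sum of constant-parameter
    sinusoids, tapered by a window whose hop-shifted copies sum to 1."""
    n = np.arange(frame_len)
    win = hann(frame_len)
    out = np.zeros(hop * (len(frame_params) - 1) + frame_len)
    for i, params in enumerate(frame_params):
        frame = np.zeros(frame_len)
        for amp, omega, phase in params:
            frame += amp * np.cos(omega * n + phase)
        out[i * hop:i * hop + frame_len] += win * frame
    return out


# Usage: two sinusoids placed exactly on the FFT search grid, 50% frame overlap.
frame_len, hop, n_fft, n_frames = 256, 128, 4096, 10
total = hop * (n_frames - 1) + frame_len
t = np.arange(total)
w1, w2 = 2 * np.pi * 205 / n_fft, 2 * np.pi * 532 / n_fft
x = 0.8 * np.cos(w1 * t + 0.3) + 0.3 * np.cos(w2 * t - 1.0)

analysis_win = hann(frame_len)
all_params = [abs_frame(x[i * hop:i * hop + frame_len], analysis_win, 2, n_fft)[0]
              for i in range(n_frames)]
y = ola_synthesize(all_params, frame_len, hop)

# Outside the first and last half-frames the synthesis windows sum to 1.
interior = slice(hop, total - hop)
print("max interior error:", np.max(np.abs(y[interior] - x[interior])))
```

In this sketch, subtracting each synthesized component before the next FFT is what makes the loop analysis-by-synthesis: a strong component and its sidelobes are removed from the residual before weaker components are searched for, which is one way to picture the sidelobe-interference benefit the abstract mentions. The test tones sit exactly on the FFT grid, so the interior reconstruction error should be small; off-grid frequencies would need a finer frequency search.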
