Realtime voice activity and pitch modulation for laryngectomy transducers using head and facial gestures
Year: 2015
Keywords: Engineering; Biometrics; Wearable Technology; Voice Surgery; Electrolarynx Control Interface; Speech Recognition; Kinesiology; Video Feature Extraction; Phonetics; Facial Gestures; Voice Recognition; Pitch Modulation; Health Sciences; Speech Synthesis; Larynx; Ultrasound; Speech Communication; Speech Technology; Electrolarynx Designs; Realtime Voice Activity; Voice; Eye Tracking; Speech Processing; Speech Input; Speech Perception; Speech Interface
Individuals who have undergone laryngectomy often rely on handheld transducers (i.e., the electrolarynx) to excite the vocal tract and produce speech. Widely used electrolarynx designs are limited in that they require manual control of voice activity and pitch modulation. It would be advantageous to have an interface that requires less training, perhaps using the remaining, intact speech production system as a scaffold. Strong evidence exists that aspects of head motion and facial gestures are highly correlated with gestures of voicing and pitch. Therefore, the goal of project MANATEE is to develop an electrolarynx control interface that takes advantage of those correlations. The focus of the current study is to determine the feasibility of using head and facial features to accurately and efficiently modulate the pitch of a speaker's electrolarynx in real time on a mobile platform using the built-in video camera. A prototype interface, capable of running on desktop machines and compatible Android devices, is implemented using OpenCV for video feature extraction and statistical prediction of the electrolarynx control signal. Initial performance evaluation is promising, showing pitch prediction accuracies at double the chance-level baseline and prediction delays well below the perceptually relevant threshold of ~50 ms.
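The abstract does not say which statistical model or which features the prototype uses, so the following is only an illustrative sketch of the general idea: predict a discrete pitch level from per-frame head and facial features, compare the result against a chance-level baseline, and check the per-frame prediction delay against the ~50 ms budget. The nearest-centroid classifier, the two feature names (`head_tilt_deg`, `eyebrow_raise_px`), the three pitch classes, and all numeric values are assumptions made for the example, not details from the study. In a real pipeline the feature vectors would come from a video tracker rather than hard-coded tuples.

```python
import time
from statistics import mean

# Hypothetical per-frame features: (head_tilt_deg, eyebrow_raise_px).
# The values are invented for illustration only.
train_x = [(-10.0, 0.0), (-8.0, 1.0), (0.0, 3.0), (1.0, 4.0), (9.0, 7.0), (11.0, 8.0)]
train_y = ["low", "low", "mid", "mid", "high", "high"]

DELAY_BUDGET_S = 0.050  # the ~50 ms threshold mentioned in the abstract


def fit_centroids(features, labels):
    """Average the feature vectors belonging to each pitch class."""
    grouped = {}
    for vec, label in zip(features, labels):
        grouped.setdefault(label, []).append(vec)
    return {
        label: tuple(mean(dim) for dim in zip(*vecs))
        for label, vecs in grouped.items()
    }


def predict(centroids, vec):
    """Return the pitch class whose centroid is closest to vec."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(centroids[label], vec)),
    )


def chance_level(labels):
    """Accuracy obtained by always guessing the most frequent class."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return max(counts.values()) / len(labels)


centroids = fit_centroids(train_x, train_y)

# Held-out frames (also invented) with their true pitch classes.
test_x = [(-7.0, 0.5), (0.5, 3.0), (8.0, 6.0)]
test_y = ["low", "mid", "high"]

predictions = []
delays = []
for vec in test_x:
    start = time.perf_counter()
    predictions.append(predict(centroids, vec))
    delays.append(time.perf_counter() - start)

accuracy = mean(p == t for p, t in zip(predictions, test_y))
baseline = chance_level(train_y)
within_budget = max(delays) < DELAY_BUDGET_S

print(f"accuracy={accuracy:.2f} baseline={baseline:.2f} within_budget={within_budget}")
```

The delay measured here covers only the prediction step on toy data. An end-to-end figure would also have to include camera capture and feature extraction, which this sketch leaves out.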