Publication | Closed Access
Improving Multiparty Interactions with a Robot Using Large Language Models
Citations: 20
References: 34
Year: 2023
Unknown Venue
Artificial Intelligence, Human-robot Collaborative Assembly, Engineering, Robotic Agent, Intelligent Robotics, Cognitive Robotics, Spoken Language Processing, Corpus Linguistics, Speech Recognition, Natural Language Processing, Computational Linguistics, Speaker Diarization, Human-robot Collaboration, Conversation Analysis, Voice Recognition, Robot Learning, Language Studies, Multiparty Interactions, Diarization Tool, Linguistics, Computer Science, Speech Communication, Speech Processing, Speech Input, Speech Perception, Robotics, Speech Interface, Diarization Pipeline
Speaker diarization is a key component of systems that support multiparty interactions among co-located users, such as meeting facilitation robots. The goal is to identify who said what, often so that the robot can provide feedback, moderate participation, and personalize its responses. Current systems combine acoustic features (e.g., pitch differences) and visual features (e.g., gaze) to perform diarization, but they require additional sensors or extra signal-processing effort. Automatic speech recognition (ASR), however, is already a necessary step in the diarization pipeline, and using the transcribed text to identify speaker labels directly can avoid these challenges. With that motivation, we leverage large language models (LLMs) to identify speaker labels from transcribed text and observe an exact match of 77% and a word-level accuracy of 90%. We discuss our findings and the potential use of LLMs as a diarization tool in future systems.
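The abstract reports two metrics without defining them. As a minimal sketch, assuming that "exact match" means every word in a conversation received the correct speaker label and that "word-level accuracy" is the fraction of words labeled correctly across all conversations, the two could be computed as follows. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def exact_match_rate(
    predictions: Sequence[Sequence[str]],
    references: Sequence[Sequence[str]],
) -> float:
    """Fraction of conversations whose predicted labels all match the reference.

    Assumed definition; the paper may define exact match differently.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(list(p) == list(r) for p, r in zip(predictions, references))
    return matches / len(references)


def pooled_word_accuracy(
    predictions: Sequence[Sequence[str]],
    references: Sequence[Sequence[str]],
) -> float:
    """Fraction of words, pooled over all conversations, with the correct label.

    Assumes each prediction is aligned word-for-word with its reference.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    correct = 0
    total = 0
    for pred, ref in zip(predictions, references):
        if len(pred) != len(ref):
            raise ValueError("each prediction must align with its reference")
        correct += sum(p == r for p, r in zip(pred, ref))
        total += len(ref)
    return correct / total if total else 0.0


# Toy data: one speaker label per transcribed word (hypothetical).
references = [["A", "A", "B", "B"], ["A", "B", "B"]]
predictions = [["A", "A", "B", "B"], ["A", "A", "B"]]

em = exact_match_rate(predictions, references)
acc = pooled_word_accuracy(predictions, references)
print(f"exact match: {em:.2f}, word-level accuracy: {acc:.2f}")
```

Under these assumed definitions, a single mislabeled word makes a whole conversation count as a miss for exact match while barely lowering word-level accuracy, which is consistent with the reported exact match (77%) being lower than the reported word-level accuracy (90%).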