Different measurements metrics to evaluate a chatbot system

TLDR

Chatbots are software systems that interact with users in natural language; they have traditionally been evaluated in contests such as the Loebner Prize. The study investigates training and adapting chatbots to a specific user's language or application using a user-supplied training corpus. The authors advocate open-ended trials with real users, illustrated by an Afrikaans chatbot that is evaluated with glass-box dialogue efficiency metrics, black-box dialogue quality metrics, and user satisfaction feedback; the Qur'an and FAQchat prototypes are presented as further examples. They conclude that evaluation should be adapted to the application and to user needs.

Abstract

A chatbot is a software system that can interact or "chat" with a human user in a natural language such as English. For the annual Loebner Prize contest, rival chatbots have been assessed in terms of their ability to fool a judge in a restricted chat session. We are investigating methods to train and adapt a chatbot to a specific user's language use or application, via a user-supplied training corpus. We advocate open-ended trials by real users, such as an example Afrikaans chatbot for Afrikaans-speaking researchers and students in South Africa. This chatbot is evaluated in terms of "glass box" dialogue efficiency metrics, "black box" dialogue quality metrics, and user satisfaction feedback. The other examples presented in this paper are the Qur'an and FAQchat prototypes. Our general conclusion is that evaluation should be adapted to the application and to user needs.
