Publication | Closed Access
Preference-learning based inverse reinforcement learning for dialog control
Citations: 26
References: 9
Year: 2012
Unknown Venue
Artificial Intelligence, Natural Language Processing, Inverse Reinforcement Learning, Cognitive Science, Engineering, Machine Learning, Dialogue Management, Dialog Systems, Conversational Recommender System, Computer Science, Intelligent Systems, Robot Learning, Spoken Dialog System, Dialog Control, Sequential Decision Making, Decision Theory
Dialog systems that realize dialog control with reinforcement learning have recently been proposed. However, reinforcement learning has an open problem: it requires a reward function that is difficult to set appropriately. To set an appropriate reward function automatically, we propose preference-learning based inverse reinforcement learning (PIRL), which estimates a reward function from dialog sequences and their pairwise preferences, which are calculated from annotated ratings of the sequences. Inverse reinforcement learning (IRL) finds a reward function with which a system generates sequences similar to the training ones. This means that current IRL assumes the training sequences are equally appropriate for a given task; thus, it cannot utilize the ratings. In contrast, PIRL can utilize pairwise preferences derived from the ratings to estimate the reward function. We examine the advantages of PIRL through comparisons with competing algorithms that have been widely used to realize dialog control. Our experiments show that PIRL outperforms the other algorithms and has the potential to serve as an evaluation simulator for dialog control.
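The abstract does not state the objective PIRL optimizes, so the following is only a minimal sketch of the general idea: turn annotated ratings into pairwise preferences, then fit a reward function that ranks the preferred dialog higher. It assumes a linear reward over per-dialog feature totals and a pairwise logistic (Bradley-Terry style) likelihood. The function names, the two features, and the toy ratings are all hypothetical and not taken from the paper.

```python
import math
from itertools import combinations


def pairwise_preferences(ratings):
    """Return (preferred, other) index pairs; ties yield no preference."""
    pairs = []
    for i, j in combinations(range(len(ratings)), 2):
        if ratings[i] > ratings[j]:
            pairs.append((i, j))
        elif ratings[j] > ratings[i]:
            pairs.append((j, i))
    return pairs


def sequence_return(weights, features):
    """Return of a dialog under an assumed linear reward on feature totals."""
    return sum(w * f for w, f in zip(weights, features))


def fit_reward(features, ratings, lr=0.5, epochs=1000):
    """Gradient ascent on the mean pairwise logistic log-likelihood."""
    pairs = pairwise_preferences(ratings)
    dim = len(features[0])
    weights = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for hi, lo in pairs:
            diff = [a - b for a, b in zip(features[hi], features[lo])]
            margin = sequence_return(weights, diff)
            # d/dw log(sigmoid(margin)) = (1 - sigmoid(margin)) * diff
            coeff = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            for k in range(dim):
                grad[k] += coeff * diff[k]
        for k in range(dim):
            weights[k] += lr * grad[k] / max(len(pairs), 1)
    return weights


# Hypothetical toy data: each dialog is summarized as
# [task_success_count, reprompt_count], with an annotated rating from 1 to 5.
features = [[1.0, 0.0], [1.0, 2.0], [0.0, 3.0], [0.0, 1.0]]
ratings = [5, 4, 1, 2]

weights = fit_reward(features, ratings)
returns = [sequence_return(weights, f) for f in features]
```

On this toy data the learned weights reward task success and penalize re-prompts, and sorting the dialogs by `returns` reproduces the order given by the ratings. An IRL method that treats all training dialogs as equally appropriate would have no use for the `ratings` list.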