Publication | Closed Access
Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression
Citations: 75
References: 42
Year: 2021
Unknown Venue
Artificial Intelligence, Language Grounding, Engineering, REVERIE Benchmark, Cognition, Semantics, Attention, Embodied Agent, Natural Language Processing, Multimodal LLM, Visual Grounding, Computational Linguistics, Visual Question Answering, Robot Learning, Language Studies, Referred Remote Object, Knowledge Representation, Cognitive Science, Semantic Interpretation, Vision Language Model, Visual Reasoning, REVERIE-success Rate, Linguistics
Remote Embodied Referring Expression (REVERIE) is a recently proposed task that requires an agent to navigate to and localise a referred remote object according to a high-level language instruction. Unlike related VLN tasks, the key to REVERIE is goal-oriented exploration rather than strict instruction-following, because step-by-step navigation guidance is not provided. In this paper, we propose a novel Cross-modality Knowledge Reasoning (CKR) model to address the unique challenges of this task. The CKR model, based on a transformer architecture, learns to generate scene memory tokens and utilise these informative history clues for exploration. In particular, a Room-and-Object Aware Attention (ROAA) mechanism is devised to explicitly perceive room-type and object-type information from both linguistic and visual observations. Moreover, by incorporating commonsense knowledge, we propose a Knowledge-enabled Entity Relationship Reasoning (KERR) module that learns the internal-external correlations among room and object entities so that the agent can take a proper action at each viewpoint. Evaluation on the REVERIE benchmark demonstrates the superiority of the CKR model, which significantly boosts SPL and REVERIE-success rate by 64.67% and 46.05%, respectively. Code is available at: https://github.com/alloldman/CKR.
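The abstract reports gains in SPL without defining it. As a rough illustration, the sketch below computes SPL under its commonly used definition, Success weighted by Path Length: the mean over episodes of S_i * l_i / max(p_i, l_i), where S_i is the success flag, l_i the shortest-path length to the goal, and p_i the length of the path the agent actually took. It assumes the paper uses this standard definition; the function name `spl`, its parameters, and the sample numbers are hypothetical and do not come from the CKR code base.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    Hypothetical helper for illustration only, not taken from the CKR repository.
    """
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same number of episodes")
    if len(successes) == 0:
        return 0.0

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        denom = max(taken, shortest)
        # Failed episodes contribute 0; successful ones are discounted
        # by how much longer the taken path is than the shortest path.
        if success and denom > 0:
            total += shortest / denom
    return total / len(successes)


# Made-up episodes: one optimal success, one success with a 2x detour, one failure.
score = spl([True, True, False], [10.0, 10.0, 5.0], [10.0, 20.0, 5.0])
print(f"SPL = {score:.2f}")
```

Because the ratio is capped at 1 and failures score 0, SPL rewards agents that both reach the goal and do so efficiently, which suits a task built around goal-oriented exploration.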