Adversarial Evaluation of Multimodal Machine Translation

TLDR

Combining vision and language in multimodal machine translation promises better translations, but inconsistent results create uncertainty about whether images actually improve quality. The authors present an adversarial evaluation method to directly examine the utility of image data in multimodal translation. The evaluation compares system performance using a congruent image versus a random incongruent image alongside the source sentence. Two of three publicly available systems are sensitive to image perturbation, so the authors recommend that all systems pass this evaluation.

Abstract

The promise of combining vision and language in multimodal machine translation is that systems will produce better translations by leveraging the image data. However, inconsistent results have led to uncertainty about whether the images actually improve translation quality. We present an adversarial evaluation method to directly examine the utility of the image data in this task. Our evaluation measures whether multimodal translation systems perform better when given the congruent image than when given a random incongruent image, in addition to the correct source language sentence. We find that two out of three publicly available systems are sensitive to this perturbation of the data, and recommend that all systems pass this evaluation in the future.
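
The comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `translate` and `score` are hypothetical callables standing in for a multimodal translation system and a sentence-level quality metric, and the abstract does not specify which metric or how many random pairings were used. The sketch pairs each source sentence once with its own image and once with an image drawn from a different example, then reports the difference in mean score.

```python
import random
from typing import Callable, List, Sequence


def incongruent_images(images: Sequence[str], rng: random.Random) -> List[str]:
    """Reorder images so that no image stays at its original position."""
    n = len(images)
    if n < 2:
        raise ValueError("need at least two images to build incongruent pairs")
    # Sattolo's algorithm yields a single-cycle permutation, which has no
    # fixed points, so every sentence receives some other example's image.
    idx = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        idx[i], idx[j] = idx[j], idx[i]
    return [images[k] for k in idx]


def image_awareness(
    translate: Callable[[str, str], str],   # hypothetical: (source, image) -> hypothesis
    score: Callable[[str, str], float],     # hypothetical: (hypothesis, reference) -> quality
    sources: Sequence[str],
    images: Sequence[str],
    references: Sequence[str],
    seed: int = 0,
) -> float:
    """Mean score with congruent images minus mean score with incongruent ones."""
    shuffled = incongruent_images(images, random.Random(seed))
    congruent = [
        score(translate(s, i), r) for s, i, r in zip(sources, images, references)
    ]
    incongruent = [
        score(translate(s, i), r) for s, i, r in zip(sources, shuffled, references)
    ]
    return sum(congruent) / len(congruent) - sum(incongruent) / len(incongruent)
```

A clearly positive difference suggests the system's output depends on the image; a difference near zero suggests the image is being ignored. In practice one would likely repeat the shuffle over several seeds and test the difference for significance, which this sketch leaves out.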
