Evaluating machine translation output with automatic sentence segmentation.

TLDR

This paper proposes an automatic sentence segmentation method to evaluate machine translation output that may contain erroneous sentence boundaries. The method processes translation hypotheses with mismatched or unsegmented boundaries using an edit‑distance‑based algorithm that handles multiple reference translations. Experiments show that the automatically segmented hypotheses enable evaluation measures that correlate with human judgments as well as or better than those based on manual sentence boundaries, making the method especially useful for spoken‑language translations.

Abstract

This paper presents a novel automatic sentence segmentation method for evaluating machine translation output with possibly erroneous sentence boundaries. The algorithm can process translation hypotheses whose segment boundaries do not correspond to the reference segment boundaries, as well as a completely unsegmented text stream. The method is therefore especially useful for evaluating translations of spoken language. The evaluation procedure builds on the edit distance algorithm and can handle multiple reference translations. It efficiently produces an optimal automatic segmentation of the hypotheses and thus allows existing well-established evaluation measures to be applied. Experiments show that evaluation measures based on the automatically produced segmentation correlate with human judgement at least as well as measures based on manual sentence boundaries.
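
The idea of re-segmenting a hypothesis through edit distance can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a single reference translation (the paper handles multiple references), whitespace-tokenized input, and unit costs for substitution, insertion, and deletion. The function name `resegment` and its arguments are hypothetical. The sketch aligns the concatenated reference words with the hypothesis word stream and cuts the hypothesis wherever the optimal alignment path crosses a reference segment boundary.

```python
def resegment(ref_segments, hyp_tokens):
    """Split hyp_tokens into len(ref_segments) segments using a Levenshtein alignment.

    Illustrative simplification: one reference only, unit edit costs.
    """
    ref = [w for seg in ref_segments for w in seg]
    n, m = len(ref), len(hyp_tokens)

    # dist[i][j] = edit distance between ref[:i] and hyp_tokens[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dist[i - 1][j - 1] + (ref[i - 1] != hyp_tokens[j - 1])
            dist[i][j] = min(sub, dist[i - 1][j] + 1, dist[i][j - 1] + 1)

    # Backtrace one optimal path. cut[i] is the largest hypothesis position
    # that the path pairs with reference position i.
    cut = {n: m}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (ref[i - 1] != hyp_tokens[j - 1]):
            i, j = i - 1, j - 1      # match or substitution
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            i -= 1                   # reference word with no hypothesis counterpart
        else:
            j -= 1                   # extra hypothesis word
        cut.setdefault(i, j)

    # Cut the hypothesis at the positions aligned to reference segment ends.
    segments, start, ref_end = [], 0, 0
    for seg in ref_segments:
        ref_end += len(seg)
        end = cut[ref_end]
        segments.append(hyp_tokens[start:end])
        start = end
    return segments


reference = [["the", "cat", "sat"], ["on", "the", "mat"]]
hypothesis = "the cat sat down on a mat".split()
print(resegment(reference, hypothesis))
```

For this toy input the sketch returns `[['the', 'cat', 'sat', 'down'], ['on', 'a', 'mat']]`. Once the hypothesis has the same number of segments as the reference, segment-level evaluation measures can be computed as usual. When several alignment paths are equally cheap, the cut positions depend on the tie-breaking order in the backtrace, which is an arbitrary choice in this sketch.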
