Text-translation alignment

Abstract

We present an algorithm for aligning texts with their translations that is based only on internal evidence. The relaxation process rests on a notion of which word in one text corresponds to which word in the other text that is essentially based on the similarity of their distributions. It exploits a partial alignment of the word level to induce a maximum likelihood alignment of the sentence level, which is in turn used, in the next iteration, to refine the word level estimate. The algorithm appears to converge to the correct sentence alignment in only a few iterations. 1. The Problem To align a text with a translation of it in another language is, in the terminology of this paper, to show which of its parts are translated by what parts of the second text. The result takes the form of a list of pairs of items--words, sentences, paragraphs, or whatever--from the two texts. A pair (a ~ b&gt; is on the list if a is translated, in whole or in part, by b. If (a, b&gt; and (a, c) are on the list, it is because a is translated partly by b, and partly by c. We say that the alignment is partial if only some of the items of the chosen kind from one or other of the texts are represented in the pairs. Otherwise, it is complete.

References

Page 1

	Year	Citations

Page 1