Publication | Open Access
Aligning sentences in bilingual corpora using lexical information
1993 · Unknown Venue · 230 citations · 9 references
Engineering, Multilingualism, Corpus Linguistics, Text Mining, Applied Linguistics, Natural Language Processing, Syntax, Computational Linguistics, Language Studies, Machine Translation, Computer-assisted Translation, Bilingual Corpus, Sentence Length, Linguistics, Translation Model, Neural Machine Translation, Bilingual Corpora, Language Corpus, Speech Translation
In this paper, we describe a fast algorithm for aligning sentences with their translations in a bilingual corpus. Existing efficient algorithms ignore word identities and only consider sentence length (Brown et al., 1991b; Gale and Church, 1991). Our algorithm constructs a simple statistical word-to-word translation model on the fly during alignment. We find the alignment that maximizes the probability of generating the corpus with this translation model. We have achieved an error rate of approximately 0.4% on Canadian Hansard data, which is a significant improvement over previous results. The algorithm is language independent.
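The general idea in the abstract can be illustrated with a small dynamic-programming aligner. In this aligner, each candidate "bead" (a group of source and target sentences aligned together) is scored by a lexical translation model rather than by sentence length alone. This is a minimal sketch under stated assumptions, not the model or search procedure from the paper. The lexical score is an IBM Model 1-style likelihood swapped in for illustration. The bead types and their prior probabilities (`BEAD_PRIORS`), the smoothing constant `FLOOR`, the seed table `seed_t`, and the crude re-estimation step are all hypothetical choices. The re-estimation step stands in for building the translation model "on the fly".

```python
import math
from collections import defaultdict

# Hypothetical bead types (source sentences, target sentences) and prior
# probabilities; illustrative placeholders, not values from the paper.
BEAD_PRIORS = {(1, 1): 0.89, (1, 0): 0.01, (0, 1): 0.01, (2, 1): 0.045, (1, 2): 0.045}
FLOOR = 1e-6  # assumed smoothing probability for unseen word pairs


def bead_cost(src_words, tgt_words, t, prior):
    """-log P(bead): bead prior times a Model 1-style score of the target words
    given the source words. None stands for the empty (NULL) source word."""
    cost = -math.log(prior)
    candidates = src_words + [None]
    for f in tgt_words:
        p = sum(t.get((e, f), FLOOR) for e in candidates) / len(candidates)
        cost -= math.log(p)
    return cost


def align(src, tgt, t):
    """Minimum-cost bead sequence; src/tgt are lists of tokenized sentences.
    Returns a list of (source indices, target indices) pairs."""
    n, m = len(src), len(tgt)
    INF = float("inf")
    best = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    # Row-major order guarantees every predecessor cell is final before use.
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == INF:
                continue
            for (di, dj), prior in BEAD_PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_words = [w for s in src[i:ni] for w in s]
                t_words = [w for s in tgt[j:nj] for w in s]
                c = best[i][j] + bead_cost(s_words, t_words, t, prior)
                if c < best[ni][nj]:
                    best[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Trace back from the bottom-right cell to recover the beads.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]


def reestimate(src, tgt, beads):
    """Crude t(f|e) estimate: one Model 1 EM step from a uniform table,
    using only the 1-1 beads of the current alignment (an assumption)."""
    counts, totals = defaultdict(float), defaultdict(float)
    for s_idx, t_idx in beads:
        if len(s_idx) == 1 and len(t_idx) == 1:
            e_words = src[s_idx[0]] + [None]
            for f in tgt[t_idx[0]]:
                for e in e_words:
                    counts[(e, f)] += 1.0 / len(e_words)
                    totals[e] += 1.0 / len(e_words)
    return {(e, f): c / totals[e] for (e, f), c in counts.items()}


def align_with_reestimation(src, tgt, seed_t, rounds=3):
    """Alternate between aligning and re-estimating the translation table."""
    t = dict(seed_t)
    beads = align(src, tgt, t)
    for _ in range(rounds):
        t = reestimate(src, tgt, beads)
        beads = align(src, tgt, t)
    return beads, t
```

For example, with `src = [["the", "cat"], ["a", "dog"]]`, `tgt = [["le", "chat"], ["un", "chien"]]`, and a seed table pairing each English word with its French counterpart, `align_with_reestimation(src, tgt, seed_t)` pairs sentence 0 with 0 and 1 with 1. The sketch fills the full O(nm) table and scores only target words given source words. An aligner for corpora the size of the Hansards would likely need to restrict the search (for example, to a band around the diagonal) and to score both sides of each bead.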