Publication | Open Access
Automatic alignment in parallel corpora
Citations: 26
References: 6
Year: 1994
Venue: unknown
Keywords: Engineering; Dynamic Programming Framework; Sequence Alignment; Corpus Linguistics; Text Mining; Natural Language Processing; Applied Linguistics; Syntax; Language Documentation; Data Science; Computational Linguistics; Language Studies; Machine Translation; Computer-assisted Translation; Generic Alignment Scheme; Linguistics; Knowledge Discovery; Cross-language Retrieval; Automatic Alignment; Neural Machine Translation; Alignment Issue; Language Corpus; Speech Translation
This paper addresses the alignment problem in the context of exploiting large bilingual and multilingual corpora for translation purposes. A generic alignment scheme is proposed that can meet the varying requirements of different applications. Depending on the level at which alignment is sought, appropriate surface linguistic information is invoked, coupled with information about possible unit delimiters. Each text unit (sentence, clause or phrase) is represented by the sum of its content tags. These representations are then fed into a dynamic programming framework that computes the optimum alignment of units. The proposed scheme has been tested at sentence level on parallel corpora of the CELEX database, where the success rate exceeded 99%. The next steps of the work concern testing the scheme's efficiency at lower levels, once it is supplied with the necessary bilingual information about potential delimiters.
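To make the dynamic programming step concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not specify the distance measure, the permitted alignment types, or any penalties, so everything below is an assumption made for illustration. Each unit is reduced to a single integer standing in for "the sum of its content tags", the distance between two groups of units is assumed to be the absolute difference of those sums, and the bead types in the hypothetical `BEADS` table (1-1, 1-0, 0-1, 2-1, 1-2) carry arbitrary illustrative penalties.

```python
from typing import List, Optional, Tuple

# Hypothetical bead types (source units consumed, target units consumed)
# with illustrative penalties. These values are assumptions, not the paper's.
BEADS = {(1, 1): 0.0, (1, 0): 3.0, (0, 1): 3.0, (2, 1): 1.0, (1, 2): 1.0}

Span = Tuple[int, int]  # half-open index range [start, end)


def align(src: List[int], tgt: List[int]) -> List[Tuple[Span, Span]]:
    """Align two sequences of per-unit content-tag counts.

    Returns a list of beads, each a pair of half-open index ranges
    (source span, target span), in document order.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    # cost[i][j]: cheapest alignment of src[:i] with tgt[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [
        [None] * (m + 1) for _ in range(n + 1)
    ]
    cost[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                # Assumed distance: absolute difference of summed tag counts.
                dist = abs(sum(src[i:ni]) - sum(tgt[j:nj]))
                candidate = cost[i][j] + dist + penalty
                if candidate < cost[ni][nj]:
                    cost[ni][nj] = candidate
                    back[ni][nj] = (i, j)

    # Trace the optimum path back from the end of both texts.
    beads: List[Tuple[Span, Span]] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append(((pi, i), (pj, j)))
        i, j = pi, pj
    beads.reverse()
    return beads


if __name__ == "__main__":
    # Four source units against three target units: source units 1 and 2
    # together match target unit 1 (3 + 4 == 7).
    print(align([5, 3, 4, 6], [5, 7, 6]))
    # → [((0, 1), (0, 1)), ((1, 3), (1, 2)), ((3, 4), (2, 3))]
```

The same recurrence would apply at clause or phrase level by changing what counts as a unit. A realistic system would likely replace the assumed absolute-difference distance with a measure tuned on bilingual data.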