Publication | Closed Access
Building Bilingual Parallel Corpora Based on Wikipedia
Citations: 50
References: 8
Year: 2010
Unknown Venue
Engineering, Multilingualism, Bilingual Parallel Corpora, Sentence-level Alignment, Corpus Linguistics, Text Mining, Natural Language Processing, Language Documentation, Parallel Corpora, Computational Linguistics, Corpus-based Machine Translation, Language Studies, Machine Translation, Computer-assisted Translation, Linguistics, Cross-language Retrieval, Neural Machine Translation, Language Corpus, Speech Translation
Aligned parallel corpora are an important resource for a wide range of multilingual research, particularly corpus-based machine translation. In this paper we present a Persian-English sentence-aligned parallel corpus built by mining Wikipedia. We propose a method for extracting sentence-level alignments using an extended link-based bilingual lexicon. Experimental results show that our method increases precision while reducing the total number of generated candidate pairs.