Concepedia

Publication | Closed Access

Building Bilingual Parallel Corpora Based on Wikipedia

Citations: 50

References: 8

Year: 2010

Abstract

Aligned parallel corpora are an important resource for a wide range of multilingual research, particularly corpus-based machine translation. In this paper we present a Persian-English sentence-aligned parallel corpus built by mining Wikipedia. We propose a method for extracting sentence-level alignments using an extended link-based bilingual lexicon. Experimental results show that our method increases precision while reducing the total number of generated candidate pairs.
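The abstract does not spell out the alignment procedure, so the following is only a minimal sketch of the general link-based idea, not the authors' extended method. It assumes each sentence is reduced to the set of Wikipedia article titles it links to, that a bilingual lexicon is available as a title-to-title mapping (for example, one derived from interlanguage links), and that candidate pairs are scored by Jaccard overlap of the mapped links. The names `LEXICON`, `translate_links`, `link_overlap`, `align`, and the `threshold` parameter are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon: English article title -> Persian article title.
# In practice such a mapping could come from Wikipedia interlanguage links.
LEXICON: Dict[str, str] = {
    "Tehran": "تهران",
    "Iran": "ایران",
    "Persian language": "زبان فارسی",
}


def translate_links(links: Set[str], lexicon: Dict[str, str]) -> Set[str]:
    """Map English link targets to Persian titles, dropping unknown ones."""
    return {lexicon[title] for title in links if title in lexicon}


def link_overlap(en_links: Set[str], fa_links: Set[str],
                 lexicon: Dict[str, str]) -> float:
    """Jaccard overlap between translated English links and Persian links."""
    mapped = translate_links(en_links, lexicon)
    union = mapped | fa_links
    if not union:
        return 0.0
    return len(mapped & fa_links) / len(union)


def align(en_sents: List[Set[str]], fa_sents: List[Set[str]],
          lexicon: Dict[str, str],
          threshold: float = 0.5) -> List[Tuple[int, int, float]]:
    """For each English sentence, keep its best-scoring Persian sentence
    if the score reaches the threshold (an assumed cut-off)."""
    pairs: List[Tuple[int, int, float]] = []
    for i, en_links in enumerate(en_sents):
        best_j, best_score = -1, 0.0
        for j, fa_links in enumerate(fa_sents):
            score = link_overlap(en_links, fa_links, lexicon)
            if score > best_score:
                best_j, best_score = j, score
        if best_j >= 0 and best_score >= threshold:
            pairs.append((i, best_j, best_score))
    return pairs
```

Raising `threshold` in this sketch keeps fewer candidate pairs and favors precision, which mirrors the trade-off the abstract reports, though the paper's actual scoring and filtering may differ.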
