Publication | Closed Access
Quantifying the utility of parallel corpora
Citations: 15
References: 3
Year: 2001
Unknown Venue
Engineering, Multilingual Pretraining, Language Learning, Corpus Linguistics, Text Mining, Applied Linguistics, Natural Language Processing, Language Documentation, Information Retrieval, Data Science, Parallel Corpora, Computational Linguistics, Language Engineering, Language Studies, Machine Translation, Language Technology, Cross-language Retrieval, Distributional Semantics, Corpus Size, Query Words, Language Corpus, Parallel Programming, Lexical Complexity Prediction, Linguistics
Our English-Chinese cross-language IR system is trained from parallel corpora; we investigate its performance as a function of training corpus size for three different training corpora. We find that the performance of the system as trained on the three parallel corpora can be related by a simple measure, namely the out-of-vocabulary rate of query words.
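The out-of-vocabulary (OOV) rate named in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' exact definition, so this sketch assumes a token-level rate: the fraction of query words that never occur in the training corpus. The function names, the whitespace tokenization, and the toy sentences and queries are all hypothetical and only stand in for the query-language side of a parallel corpus.

```python
from typing import Iterable, List, Set


def build_vocabulary(sentences: Iterable[str]) -> Set[str]:
    """Collect the lowercased, whitespace-separated words seen in training text."""
    vocabulary: Set[str] = set()
    for sentence in sentences:
        vocabulary.update(sentence.lower().split())
    return vocabulary


def oov_rate(query_words: List[str], vocabulary: Set[str]) -> float:
    """Fraction of query word tokens that are absent from the vocabulary.

    Assumption: the rate is computed over tokens; an empty query list
    is given a rate of 0.0 by convention.
    """
    if not query_words:
        return 0.0
    unseen = sum(1 for word in query_words if word.lower() not in vocabulary)
    return unseen / len(query_words)


# Hypothetical toy data, not taken from the paper.
corpus = [
    "the bank raised interest rates",
    "trade talks resumed in the capital",
    "the court approved the merger",
]
queries = ["interest", "rates", "merger", "tariffs"]

# Measure the OOV rate as the training corpus grows, one sentence at a time.
for size in range(1, len(corpus) + 1):
    vocabulary = build_vocabulary(corpus[:size])
    print(size, oov_rate(queries, vocabulary))
```

Computing the rate over growing prefixes of a corpus mirrors the abstract's setup of measuring performance as a function of training corpus size. A real English-Chinese system would need proper word segmentation in place of the whitespace split assumed here.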