Publication | Open Access
Efficient Word Alignment with Markov Chain Monte Carlo
Citations: 107
References: 27
Year: 2016
Engineering, Markov Chain Monte Carlo, Large Language Model, Efficient Word Alignment, Corpus Linguistics, Text Mining, Natural Language Processing, Data Science, Computational Linguistics, Language Engineering, Language Studies, Machine Translation, Computer-assisted Translation, New System, Monte Carlo Sampling, Distributional Semantics, Neural Machine Translation, SMT Practitioner, Linguistics
Abstract
We present EFMARAL, a new system for efficient and accurate word alignment using a Bayesian model with Markov Chain Monte Carlo (MCMC) inference. Through careful selection of data structures and model architecture, we are able to surpass the fast_align system, commonly used for performance-critical word alignment, in both computational efficiency and alignment accuracy. Our evaluation shows that a phrase-based statistical machine translation (SMT) system produces translations of higher quality when using word alignments from EFMARAL than from fast_align, and that translation quality is on par with what is obtained using GIZA++, a tool requiring orders of magnitude more processing time. More generally, we hope to convince the reader that Monte Carlo sampling, rather than being viewed as a slow method of last resort, should actually be the method of choice for the SMT practitioner and others interested in word alignment.
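To make the idea of Bayesian word alignment with MCMC inference concrete, here is a minimal sketch of a collapsed Gibbs sampler for a much simpler model than the one the abstract describes: an IBM Model 1 style lexical model with a symmetric Dirichlet prior and a uniform alignment prior. It is not EFMARAL's implementation and does not reflect its data structures or model architecture. The function name `gibbs_align`, the parameter `alpha`, and the toy sentences are all assumptions made for illustration, and the sketch omits a NULL source word.

```python
import random
from collections import defaultdict


def gibbs_align(bitext, iterations=50, alpha=0.01, seed=0):
    """Collapsed Gibbs sampling for a simplified Bayesian IBM Model 1.

    bitext: list of (source_tokens, target_tokens) pairs, sources non-empty.
    Returns one list per sentence pair: for each target position j,
    the index of the source token it is currently aligned to.
    """
    rng = random.Random(seed)
    # Target vocabulary size, used in the Dirichlet-multinomial denominator.
    vocab_size = len({f for _, tgt in bitext for f in tgt})

    pair_count = defaultdict(int)  # how often source word e is linked to target word f
    src_count = defaultdict(int)   # how many links source word e has in total
    alignments = []

    # Start from a random alignment and collect the corresponding counts.
    for src, tgt in bitext:
        a = [rng.randrange(len(src)) for _ in tgt]
        for j, i in enumerate(a):
            pair_count[(src[i], tgt[j])] += 1
            src_count[src[i]] += 1
        alignments.append(a)

    for _ in range(iterations):
        for (src, tgt), a in zip(bitext, alignments):
            for j, f in enumerate(tgt):
                # Remove the current link from the counts.
                e_old = src[a[j]]
                pair_count[(e_old, f)] -= 1
                src_count[e_old] -= 1

                # Posterior predictive probability of f under each candidate source word,
                # up to a constant (the alignment prior is uniform).
                weights = [
                    (pair_count[(e, f)] + alpha) / (src_count[e] + alpha * vocab_size)
                    for e in src
                ]

                # Resample the link and add it back to the counts.
                a[j] = rng.choices(range(len(src)), weights=weights)[0]
                e_new = src[a[j]]
                pair_count[(e_new, f)] += 1
                src_count[e_new] += 1

    return alignments


if __name__ == "__main__":
    toy = [
        ("the house".split(), "das haus".split()),
        ("the book".split(), "das buch".split()),
        ("a book".split(), "ein buch".split()),
    ]
    for (src, tgt), a in zip(toy, gibbs_align(toy)):
        print([(tgt[j], src[i]) for j, i in enumerate(a)])
```

Each sampling step only touches two counters per candidate source word, which gives some intuition for why sampling-based alignment can be fast in practice. The result is a single sample from the chain, so the output depends on the seed and the number of iterations.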