Publication | Open Access
Parallel Sentence Mining by Constrained Decoding
Citations: 19
References: 22
Year: 2020
Venue: Unknown
Topics: Syntactic Parsing, Engineering, Cross-lingual Representation, Monolingual Corpora, Multilingual Pretraining, Corpus Linguistics, Text Mining, Natural Language Processing, Parallel Sentence Mining, Computational Linguistics, Grammar, Machine Translation, NLP Task, Linguistics, Knowledge Discovery, Computer Science, Semantic Parsing, Parallel Sentences, Neural Machine Translation, Arts, Speech Translation
We present a novel method to extract parallel sentences from two monolingual corpora, using neural machine translation. Our method relies on translating sentences in one corpus, but constraining the decoding by a prefix tree built on the other corpus. We argue that a neural machine translation system by itself can be a sentence similarity scorer and it efficiently approximates pairwise comparison with a modified beam search. When benchmarked on the BUCC shared task, our method achieves results comparable to other submissions.
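The idea of constraining decoding with a prefix tree can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `build_trie`, `constrained_beam_search`, and `toy_scorer` are hypothetical, the scorer is a toy stand-in for a real neural machine translation decoder, and the paper's modified beam search may differ in its details. The sketch only shows how a trie built over the target-side corpus restricts each decoding step to tokens that continue some existing sentence, so that every finished hypothesis is a sentence from that corpus, ranked by the model's score.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

EOS = "</s>"  # assumed end-of-sentence marker (hypothetical choice)


def build_trie(sentences: Sequence[Sequence[str]]) -> Dict:
    """Build a nested-dict prefix tree over tokenized target-side sentences."""
    root: Dict = {}
    for tokens in sentences:
        node = root
        for tok in list(tokens) + [EOS]:
            node = node.setdefault(tok, {})
    return root


def constrained_beam_search(
    score_next: Callable[[List[str], str], float],
    trie: Dict,
    beam_size: int = 2,
    max_len: int = 50,
) -> List[Tuple[float, List[str]]]:
    """Beam search that may only expand along branches of the trie.

    score_next(prefix, token) stands in for the translation model's
    log-probability of `token` given the source sentence and `prefix`.
    Returns finished hypotheses as (log-probability, tokens), best first.
    """
    beam = [(0.0, [], trie)]  # (log-prob, tokens so far, current trie node)
    finished: List[Tuple[float, List[str]]] = []
    for _ in range(max_len):
        candidates = []
        for logp, toks, node in beam:
            # Only tokens that continue some corpus sentence are considered.
            for tok, child in node.items():
                new_logp = logp + score_next(toks, tok)
                if tok == EOS:
                    finished.append((new_logp, toks))
                else:
                    candidates.append((new_logp, toks + [tok], child))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]
    finished.sort(key=lambda f: f[0], reverse=True)
    return finished


def toy_scorer(reference: Sequence[str]) -> Callable[[List[str], str], float]:
    """Toy stand-in for an NMT model: favours tokens of a fixed 'translation'."""
    ref = list(reference) + [EOS]

    def score_next(prefix: List[str], tok: str) -> float:
        pos = len(prefix)
        expected = ref[pos] if pos < len(ref) else EOS
        return math.log(0.9) if tok == expected else math.log(0.05)

    return score_next


# Target-side monolingual corpus (made-up example sentences).
corpus = [
    ["the", "cat", "sleeps"],
    ["the", "dog", "barks"],
    ["a", "cat", "sleeps"],
]
trie = build_trie(corpus)
# Pretend the model would translate the source sentence as "the cat sleeps".
results = constrained_beam_search(toy_scorer(["the", "cat", "sleeps"]), trie)
best_score, best_tokens = results[0]
print(best_tokens, best_score)
```

In this sketch the score of the best constrained hypothesis doubles as a similarity score between the source sentence and the retrieved target sentence, which mirrors the abstract's argument that the translation system itself can act as a sentence similarity scorer. Because the beam keeps only a few branches per step, the search avoids scoring the source against every target sentence individually.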