Rule based machine translation combined with statistical post editor for Japanese to English patent translation

Abstract

Because sentences in patent texts are long, they are difficult to translate by machine. Although statistical machine translation is one of the major streams of the field, long patent sentences are difficult to translate without syntactic analysis. We propose combining a rule-based method with a statistical method: a rule-based machine translation system (RMT) followed by a statistical post-editor (SPE). Evaluation by NIST score shows that RMT+SPE is more accurate than RMT alone. Manual checks, however, show that the outputs of RMT+SPE often contain unnatural expressions in the target language. We therefore propose a new evaluation measure, NMG (normalized mean grams). Although NMG is n-gram based, it counts the number of words in the longest word sequences that match between the test sentence and a target-language reference corpus. We use two reference corpora: one consists of the reference translations only, and the other is a large-scale target-language corpus. With the former, RMT+SPE scores higher; with the latter, RMT scores higher.
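The abstract does not give the exact NMG formula, so the following Python sketch shows only one possible reading of "longest word sequence matches." It assumes whitespace-tokenized sentences, caps matches at `max_n` words, and normalizes each position's longest match by the longest match possible at that position, so that a sentence found verbatim in the corpus scores 1.0. The function names (`build_ngram_index`, `longest_match_lengths`, `nmg`) and the normalization are hypothetical, not the authors' definition.

```python
from typing import List, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def build_ngram_index(corpus: Sequence[Sequence[str]], max_n: int) -> Set[NGram]:
    """Collect every n-gram (1 <= n <= max_n) occurring in the tokenized reference corpus."""
    index: Set[NGram] = set()
    for sent in corpus:
        for i in range(len(sent)):
            for n in range(1, min(max_n, len(sent) - i) + 1):
                index.add(tuple(sent[i:i + n]))
    return index


def longest_match_lengths(test: Sequence[str], index: Set[NGram], max_n: int) -> List[int]:
    """For each start position in `test`, the word count of the longest indexed n-gram starting there."""
    lengths: List[int] = []
    for i in range(len(test)):
        best = 0
        for n in range(1, min(max_n, len(test) - i) + 1):
            if tuple(test[i:i + n]) in index:
                best = n
            else:
                # The index holds every prefix of every stored n-gram, so once a
                # prefix is missing, no longer n-gram from position i can match.
                break
        lengths.append(best)
    return lengths


def nmg(test: Sequence[str], index: Set[NGram], max_n: int = 4) -> float:
    """Hypothetical NMG: mean of per-position longest-match lengths, each divided by
    the longest match possible at that position (assumed normalization)."""
    if not test:
        return 0.0
    lengths = longest_match_lengths(test, index, max_n)
    ratios = [m / min(max_n, len(test) - i) for i, m in enumerate(lengths)]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    reference = [["the", "device", "comprises", "a", "sensor"]]
    idx = build_ngram_index(reference, max_n=4)
    hyp = ["the", "device", "has", "a", "sensor"]
    print(longest_match_lengths(hyp, idx, 4))  # [2, 1, 0, 2, 1]
    print(round(nmg(hyp, idx), 2))             # 0.55
```

Passing either the reference translations alone or a large target-language corpus to `build_ngram_index` mirrors the two reference settings compared above; the same `max_n` should be used for indexing and scoring.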
