Abstract

A similarity model was proposed to measure the similarity between sentences. Sentence similarity depends on morphological similarity and word-order similarity, and the former plays a more important role than the latter. When a clause or phrase of a sentence is moved over a long distance, the proposed model ensures that the generated sentence remains closely similar to the source sentence. To improve on the efficiency of searching for the most similar sentence by traversal, a search algorithm based on an inverted index and a sentence-length index was proposed. The algorithm is highly efficient, and its average search time is only slightly affected by the size of the corpus. The proposed sentence similarity model and search algorithm can be used in large-scale example-based machine translation systems.
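The abstract does not give the formulas or the index layout, so the following Python sketch is only one possible reading of the idea, not the authors' method. Everything specific in it is an assumption made for illustration: the Dice coefficient over word multisets as the morphological measure, an adjacency-based word-order measure (so that moving one clause breaks only a few adjacent pairs), the weights `W_MORPH = 0.8` and `W_ORDER = 0.2`, the `length_slack` parameter, and all function and class names.

```python
from collections import Counter, defaultdict

# Assumed weights: the abstract only says the morphological part matters more.
W_MORPH = 0.8
W_ORDER = 0.2


def morphological_similarity(a, b):
    """Dice coefficient over the word multisets of two token lists (assumed measure)."""
    if not a or not b:
        return 0.0
    shared = sum((Counter(a) & Counter(b)).values())
    return 2.0 * shared / (len(a) + len(b))


def word_order_similarity(a, b):
    """Fraction of adjacent shared-word pairs in `a` whose order is kept in `b`.

    Only words occurring exactly once in both sentences are compared, so a
    clause moved over a long distance breaks just the pairs at its boundaries.
    """
    count_a, count_b = Counter(a), Counter(b)
    pos_in_b = {w: i for i, w in enumerate(b)}
    mapped = [pos_in_b[w] for w in a if count_a[w] == 1 and count_b[w] == 1]
    if not mapped:
        return 0.0
    if len(mapped) == 1:
        return 1.0
    kept = sum(1 for x, y in zip(mapped, mapped[1:]) if x < y)
    return kept / (len(mapped) - 1)


def similarity(a, b):
    """Weighted combination of the two hypothetical measures above."""
    return W_MORPH * morphological_similarity(a, b) + W_ORDER * word_order_similarity(a, b)


class SentenceIndex:
    """Inverted index plus sentence-length index over a small example corpus."""

    def __init__(self, sentences):
        self.sentences = list(sentences)
        self.tokens = [s.split() for s in self.sentences]
        self.inverted = defaultdict(set)   # word -> ids of sentences containing it
        self.by_length = defaultdict(set)  # token count -> ids of sentences
        for sid, toks in enumerate(self.tokens):
            self.by_length[len(toks)].add(sid)
            for w in toks:
                self.inverted[w].add(sid)

    def most_similar(self, query, length_slack=2):
        """Return (sentence, score) for the best candidate, or None if there is none."""
        toks = query.split()
        # Candidates must share at least one word with the query.
        candidates = set()
        for w in set(toks):
            candidates |= self.inverted.get(w, set())
        # Keep only candidates whose length is within `length_slack` of the query.
        allowed = set()
        for n in range(max(1, len(toks) - length_slack), len(toks) + length_slack + 1):
            allowed |= self.by_length.get(n, set())
        candidates &= allowed
        best_id, best_score = None, -1.0
        for sid in sorted(candidates):
            score = similarity(toks, self.tokens[sid])
            if score > best_score:
                best_id, best_score = sid, score
        if best_id is None:
            return None
        return self.sentences[best_id], best_score
```

In this sketch, moving "yesterday" from the front of "yesterday i went to the market" to the end leaves the morphological score at 1.0 and breaks only one of five adjacent pairs, so the combined score stays close to 1. Because only sentences that share a word with the query and have a comparable length are scored, the work per query depends on the size of the candidate set rather than on a full traversal of the corpus.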