Publication | Open Access

Identifying cognates by phonetic and semantic similarity

Citations: 87

References: 10

Year: 2001

Grzegorz Kondrak

Unknown Venue

Abstract

I present a method of identifying cognates in the vocabularies of related languages. I show that a measure of phonetic similarity based on multivalued features performs better than "orthographic" measures, such as the Longest Common Subsequence Ratio (LCSR) or Dice's coefficient. I introduce a procedure for estimating semantic similarity of glosses that employs keyword selection and WordNet. Tests performed on vocabularies of four Algonquian languages indicate that the method is capable of discovering on average nearly 75% of cognates at 50% precision.
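
For readers unfamiliar with the two "orthographic" baselines named in the abstract, the following is a minimal Python sketch under their commonly used definitions: LCSR as the length of the longest common subsequence divided by the length of the longer word, and Dice's coefficient as twice the number of shared character bigrams divided by the total number of bigrams in both words. The function names (lcs_length, lcsr, dice) and the English example words are illustrative assumptions, not taken from the paper, and the exact variants evaluated there may differ.

```python
from collections import Counter


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """LCS length divided by the length of the longer word (assumed definition)."""
    if not a and not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def dice(a: str, b: str) -> float:
    """Dice's coefficient over character bigrams (assumed definition)."""
    bigrams_a = Counter(a[i:i + 2] for i in range(len(a) - 1))
    bigrams_b = Counter(b[i:i + 2] for i in range(len(b) - 1))
    total = sum(bigrams_a.values()) + sum(bigrams_b.values())
    if total == 0:
        return 0.0
    # Multiset intersection counts each shared bigram at its minimum frequency.
    shared = sum((bigrams_a & bigrams_b).values())
    return 2 * shared / total


if __name__ == "__main__":
    # Illustrative word pair only; not data from the paper.
    print(round(lcsr("colour", "color"), 3))  # 0.833
    print(round(dice("colour", "color"), 3))  # 0.667
```

Both measures compare spellings character by character and treat every mismatch alike, which is the limitation that a feature-based phonetic measure is meant to address.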
