Publication | Open Access
Using longest common subsequence and character models to predict word forms
Citations: 15 | References: 8 | Year: 2016 | Venue: Unknown
Topics: Engineering, Spoken Language Processing, Phonology, Corpus Linguistics, Text Mining, Speech Recognition, Natural Language Processing, Word Embeddings, Syntax, Computational Linguistics, Phonetics, Language Engineering, Grammar, Vowel Harmony, Character Ngram Models, Language Studies, Machine Translation, Sequence Modelling, Word Forms, Language Technology, Computer Science, Distributional Semantics, Character Models, Longest Common Subsequence, Inflected Word Forms, Language Recognition, Speech Processing, Lexical Complexity Prediction, Linguistics
This paper presents an algorithm for the automatic inflection of word forms. We use the longest common subsequence (LCS) method to extract abstract paradigms from given pairs of base and inflected word forms, and we use suffix and prefix features to predict the paradigm automatically. We extend this algorithm with a combination of affix feature-based and character n-gram models, which substantially improves performance, especially for languages with nonlocal phenomena such as vowel harmony. Our system took part in the SIGMORPHON 2016 Shared Task, placing 3rd in 17 of the 30 subtasks and 4th in 7 subtasks among 7 participants.
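The LCS-based paradigm extraction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the paradigm notation, so the `1+ar` style (numbers for variables, literal strings for constants, joined by `+`) and the names `lcs_pairs`, `abstract_paradigm`, `apply_paradigm` and `affix_features` are hypothetical, as is the `max_len=3` affix cutoff. The sketch treats each run of the LCS that is contiguous in both words as a variable and everything else as constant material. It omits the classifier that would choose a paradigm from the affix features and the character n-gram model.

```python
import re
from typing import Dict, List, Optional, Tuple


def lcs_pairs(a: str, b: str) -> List[Tuple[int, int]]:
    """Index pairs (i, j) with a[i] == b[j] forming one longest common subsequence."""
    n, m = len(a), len(b)
    # dp[i][j] = length of the LCS of the suffixes a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the front, taking matches greedily (ties skip a char of a).
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def _pattern(word: str, var_at: Dict[int, int]) -> str:
    """Render a word as '+'-separated variable numbers and constant strings."""
    parts: List[str] = []
    prev: Optional[int] = None
    for pos, ch in enumerate(word):
        v = var_at.get(pos)
        if v is None:
            if parts and prev is None:
                parts[-1] += ch  # extend the current constant segment
            else:
                parts.append(ch)  # start a new constant segment
        elif v != prev:
            parts.append(str(v))  # first character of a variable's run
        prev = v
    return "+".join(parts)


def abstract_paradigm(lemma: str, form: str) -> Tuple[str, str]:
    """Turn a (lemma, inflected form) pair into a pair of abstract patterns."""
    runs: List[List[Tuple[int, int]]] = []
    for i, j in lcs_pairs(lemma, form):
        # A run continues only if the match is contiguous in BOTH words.
        if runs and (i, j) == (runs[-1][-1][0] + 1, runs[-1][-1][1] + 1):
            runs[-1].append((i, j))
        else:
            runs.append([(i, j)])
    lemma_var: Dict[int, int] = {}
    form_var: Dict[int, int] = {}
    for k, run in enumerate(runs, start=1):  # each run becomes variable k
        for i, j in run:
            lemma_var[i], form_var[j] = k, k
    return _pattern(lemma, lemma_var), _pattern(form, form_var)


def apply_paradigm(paradigm: Tuple[str, str], lemma: str) -> Optional[str]:
    """Inflect a new lemma with a paradigm; None if the lemma does not fit it."""
    src, dst = paradigm
    segs = src.split("+")
    regex = "".join("(.+)" if s.isdigit() else re.escape(s) for s in segs)
    match = re.fullmatch(regex, lemma)
    if match is None:
        return None
    values = dict(zip((int(s) for s in segs if s.isdigit()), match.groups()))
    return "".join(values[int(s)] if s.isdigit() else s for s in dst.split("+"))


def affix_features(word: str, max_len: int = 3) -> List[str]:
    """Prefix/suffix features of the kind a paradigm classifier could use."""
    feats: List[str] = []
    for k in range(1, min(max_len, len(word)) + 1):
        feats += ["pre=" + word[:k], "suf=" + word[-k:]]
    return feats
```

For example, `abstract_paradigm("sing", "sang")` gives `("1+i+2", "1+a+2")`, and applying that paradigm to `drink` yields `drank`. `abstract_paradigm("Haus", "Häuser")` gives `("1+a+2", "1+ä+2+er")`. Because a paradigm only says how to rewrite lemmas that fit its pattern, a model still has to pick which paradigm to use for an unseen lemma. That is the role the abstract assigns to the affix features and, for nonlocal effects such as vowel harmony, to the character n-gram model.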