Publication | Closed Access
A comparative study on Thai word segmentation approaches
Citations: 116
References: 9
Year: 2008
Unknown Venue
Engineering, Thai Word Segmentation, Corpus Linguistics, Text Mining, Speech Recognition, Natural Language Processing, Data Science, Pattern Recognition, Text Recognition, Computational Linguistics, Text Segmentation, Document Classification, Language Studies, Machine Translation, Knowledge Discovery, Morphology, Terminology Extraction, Comparative Study, DCB Approach, Word Segmentation Approaches, Keyword Extraction, Text Processing, Linguistics
In this paper, we analyze and compare various approaches to Thai word segmentation. Word segmentation approaches can be classified into two distinct types: dictionary-based (DCB) and machine-learning-based (MLB). The DCB approach relies on a set of terms for parsing and segmenting input texts, whereas the MLB approach relies on a model trained from a corpus using machine learning techniques. We compare two algorithms from the DCB approach, longest matching and maximal matching, and four algorithms from the MLB approach: Naive Bayes (NB), decision tree, support vector machine (SVM), and conditional random field (CRF). In the experimental results, the DCB approach yielded better performance than the NB, decision tree, and SVM algorithms from the MLB approach. However, the best performance was obtained from the CRF algorithm, with precision and recall of 95.79% and 94.98%, respectively.
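To make the two DCB algorithms concrete, the following is a minimal sketch, not the implementation evaluated in the paper. It assumes the usual textbook definitions: longest matching greedily takes the longest dictionary word at each position, while maximal matching picks the segmentation with the fewest tokens. The toy lexicon, the sample text, the function names, and the fallback of treating an unmatched character as a one-character token are all assumptions made for illustration.

```python
# Toy lexicon and input (hypothetical, for illustration only).
LEXICON = {"ไป", "หา", "หาม", "เห", "สี", "มเหสี"}
TEXT = "ไป" + "หา" + "มเหสี"  # intended reading: "go / find / queen"


def longest_matching(text, lexicon):
    """Greedy left-to-right: take the longest lexicon word at each position."""
    max_len = max(map(len, lexicon))
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking one character at a time.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                break
        else:
            j = i + 1  # assumption: unknown character becomes its own token
        tokens.append(text[i:j])
        i = j
    return tokens


def maximal_matching(text, lexicon):
    """Dynamic programming: choose the segmentation with the fewest tokens."""
    max_len = max(map(len, lexicon))
    n = len(text)
    best = [0] + [n + 1] * n  # best[i] = fewest tokens covering text[:i]
    back = [0] * (n + 1)      # back[i] = start index of the last token
    for i in range(1, n + 1):
        for k in range(max(0, i - max_len), i):
            known = text[k:i] in lexicon
            single = i - k == 1  # assumption: single-character fallback
            if (known or single) and best[k] + 1 < best[i]:
                best[i] = best[k] + 1
                back[i] = k
    tokens, i = [], n
    while i > 0:
        tokens.append(text[back[i]:i])
        i = back[i]
    return tokens[::-1]


greedy_tokens = longest_matching(TEXT, LEXICON)
fewest_tokens = maximal_matching(TEXT, LEXICON)
print(greedy_tokens)
print(fewest_tokens)
```

On this toy input the greedy pass commits to the longer word "หาม" and ends up with four tokens, while the fewest-token search recovers the three-word reading. This is the kind of ambiguity that separates the two dictionary-based algorithms; it says nothing about their relative accuracy in the paper's experiments.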