Publication | Closed Access
Research on the Construction and Filter Method of Stop-word List in Text Preprocessing
Citations: 57 | References: 9 | Year: 2011
Topics: Stop-word List, Engineering, Part-of-speech Tagging, Corpus Linguistics, Text Mining, Text Preprocessing, Natural Language Processing, Syntax, Information Retrieval, Data Mining, Text Segmentation, Computational Linguistics, Word Segmentation (Natural Language Processing), Grammar, Language Studies, Word Segmentation (Phonological Awareness), Knowledge Discovery, Terminology Extraction, Information Extraction, Stop-word Filter, Keyword Extraction, Text Processing, Linguistics, Filter Method
In the text preprocessing stage of text mining, a stop-word list is constructed to filter the word-segmentation results of text documents, primarily in order to reduce the dimensionality of the text feature space. This paper summarizes the definition of stop words and the principles and methods for extracting them, and constructs a customized Chinese-English stop-word list by adapting a classical stop-word list to the domain of the text documents. Three different filtering algorithms were designed and implemented for the stop-word filtering step, with particular emphasis on comparing their efficiency. The experiments indicated that the hash-based filter method was the fastest.
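The abstract names only the hash filter among the three algorithms, so the following is a minimal sketch under stated assumptions, not the paper's implementation. It assumes the customized list is built by merging a classical list with hypothetical domain-specific terms, assumes the two other filters are a linear scan of a list and a binary search over a sorted list, and assumes Chinese text has already been word-segmented into tokens. All function names and word lists here are hypothetical.

```python
import bisect
import timeit

# Hypothetical tiny "classical" stop-word list with English and Chinese
# entries; a real list would contain hundreds of words.
CLASSICAL_STOPWORDS = ["the", "a", "of", "and", "is", "的", "了", "是", "在"]


def build_stopword_list(classical, domain_extra):
    """Merge a classical list with domain-specific stop words (assumed approach).

    The result is deduplicated and sorted so it can also be binary-searched.
    """
    return sorted(set(classical) | set(domain_extra))


def filter_linear(tokens, stopwords):
    # Assumed alternative 1: linear scan; each `in` test on a list is O(n).
    return [t for t in tokens if t not in stopwords]


def filter_binary(tokens, sorted_stopwords):
    # Assumed alternative 2: binary search over the sorted list, O(log n) per token.
    kept = []
    for t in tokens:
        i = bisect.bisect_left(sorted_stopwords, t)
        if i == len(sorted_stopwords) or sorted_stopwords[i] != t:
            kept.append(t)
    return kept


def filter_hash(tokens, stopword_set):
    # Hash filter: membership test in a set is O(1) on average.
    return [t for t in tokens if t not in stopword_set]


if __name__ == "__main__":
    stopwords = build_stopword_list(CLASSICAL_STOPWORDS, ["paper", "method"])
    stopword_set = set(stopwords)
    # Pre-segmented tokens (segmentation itself is out of scope here).
    tokens = ["the", "method", "of", "text", "mining", "是", "文本", "挖掘"]
    print(filter_hash(tokens, stopword_set))  # ['text', 'mining', '文本', '挖掘']

    # Rough timing comparison on a repeated token stream.
    big_tokens = tokens * 2000
    for name, fn, sw in [
        ("linear", filter_linear, stopwords),
        ("binary", filter_binary, stopwords),
        ("hash", filter_hash, stopword_set),
    ]:
        secs = timeit.timeit(lambda: fn(big_tokens, sw), number=5)
        print(f"{name}: {secs:.4f}s")
```

All three filters return the same tokens; they differ only in lookup cost. With a list this small the gap is modest, and it widens as the stop-word list grows, since the linear scan's per-token cost scales with list length while the hash lookup's does not.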