Publication | Open Access
Text segmentation using reiteration and collocation
Citations: 23
References: 15
Year: 1998
Venue: Unknown
Topics: Engineering, Semantics, Corpus Linguistics, Text Mining, Natural Language Processing, Applied Linguistics, Information Retrieval, Word Repetition, Text Segmentation, Computational Linguistics, Language Studies, Subtopic Areas, Machine Translation, Document Clustering, Computational Lexicology, Terminology Extraction, Information Extraction, Lexical Cohesion Relations, Text Processing, Linguistics
A method is presented for segmenting text into subtopic areas. The proportion of related pairwise words is calculated between adjacent windows of text to determine their lexical similarity. The lexical cohesion relations of reiteration and collocation are used to identify related words. These relations are automatically located using a combination of three linguistic features: word repetition, collocation and relation weights. This method is shown to successfully detect known subject changes in text and corresponds well to the segmentations placed by test subjects.
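The windowed comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the normalization (related pairs divided by all cross-window pairs), and the fixed window size are assumptions made for illustration. Only exact word repetition is used as the relatedness test here, because collocation and relation weights would need external lexical resources that the abstract does not specify.

```python
from typing import Callable, List

# Assumption: relatedness is reduced to case-insensitive word repetition.
# The paper also uses collocation and relation weights, which are not modeled here.
def related_by_repetition(a: str, b: str) -> bool:
    return a.lower() == b.lower()

def window_similarity(
    left: List[str],
    right: List[str],
    related: Callable[[str, str], bool] = related_by_repetition,
) -> float:
    """Proportion of cross-window word pairs that are related (assumed normalization)."""
    if not left or not right:
        return 0.0
    related_pairs = sum(1 for a in left for b in right if related(a, b))
    return related_pairs / (len(left) * len(right))

def adjacent_window_scores(
    tokens: List[str],
    window_size: int,
    related: Callable[[str, str], bool] = related_by_repetition,
) -> List[float]:
    """Score every gap that has a full window of tokens on both sides."""
    scores = []
    for gap in range(window_size, len(tokens) - window_size + 1):
        left = tokens[gap - window_size:gap]
        right = tokens[gap:gap + window_size]
        scores.append(window_similarity(left, right, related))
    return scores

tokens = "cat dog cat dog sun moon sun moon".split()
scores = adjacent_window_scores(tokens, window_size=2)
# The lowest score marks the most likely subtopic boundary in this toy input.
boundary_gap = scores.index(min(scores)) + 2  # offset by window_size
```

In this toy input the lowest similarity falls at the gap between "dog" and "sun", which is where the vocabulary changes. A different relatedness test can be passed in through the `related` parameter without changing the windowing logic.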