Publication | Open Access
Segmenting email message text into zones
Citations: 24
References: 8
Year: 2009
Unknown Venue
Engineering; Corpus Linguistics; Text Mining; Natural Language Processing; Pattern Recognition; Text Recognition; Computational Linguistics; Text Segmentation; Language Studies; Character Recognition; Content Analysis; Body Text; Segment Email Messages; Linguistics; Computer Science; Information Extraction; Reply Content; Text Processing; Email Message Text; Document Processing
In the early days of email, widely-used conventions for indicating quoted reply content and email signatures made it easy to segment email messages into their functional parts. Today, the explosion of different email formats and styles, coupled with the ad hoc ways in which people vary the structure and layout of their messages, means that simple techniques for identifying quoted replies that used to yield 95% accuracy now find less than 10% of such content. In this paper, we describe Zebra, an SVM-based system for segmenting the body text of email messages into nine zone types based on graphic, orthographic and lexical cues. Zebra performs this task with an accuracy of 87.01%; when the number of zones is abstracted to two or three zone classes, this increases to 93.60% and 91.53% respectively.
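To make the "simple techniques" mentioned in the abstract concrete, the following is a minimal sketch of a convention-based line labeller. It is not Zebra and not the authors' method: it assumes only the classic conventions of a leading `>` for quoted reply content and a line consisting of `-- ` for the start of a signature. The function name `label_lines` and the three labels are hypothetical, chosen for illustration.

```python
def label_lines(body: str) -> list[tuple[str, str]]:
    """Label each line of an email body as 'quoted', 'signature', or 'author'.

    Illustrative heuristic only (an assumption, not the paper's system):
    - a line whose first non-space character is '>' is quoted reply content;
    - a line that is exactly '-- ' starts the signature, and every later
      non-quoted line is treated as part of it;
    - everything else is attributed to the message author.
    """
    labelled = []
    in_signature = False
    for line in body.splitlines():
        if line.lstrip().startswith(">"):
            labelled.append(("quoted", line))
        elif line == "-- " or in_signature:
            in_signature = True
            labelled.append(("signature", line))
        else:
            labelled.append(("author", line))
    return labelled


if __name__ == "__main__":
    example = "Sounds good.\n> Can we meet at 3?\n-- \nAnn"
    for label, text in label_lines(example):
        print(f"{label:10} | {text}")
```

A heuristic like this depends entirely on senders following those conventions. The abstract reports that such techniques now find less than 10% of quoted content, which is the motivation for a learned classifier over graphic, orthographic and lexical cues.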