Publication | Closed Access
Learning to Extract Signature and Reply Lines from Email.
Citations: 81
References: 14
Year: 2004
Keywords: Engineering; Machine Learning; Email Message; Reply Lines; Text Mining; Speech Recognition; Natural Language Processing; Spam Filtering; Data Science; Pattern Recognition; Text Recognition; Computational Linguistics; Machine Translation; Sequence Modelling; Automatic Classification; Knowledge Discovery; Computer Science; Information Extraction; Signature Block; Data Extraction; Text Processing
We describe methods for automatically identifying signature blocks and reply lines in plain-text email messages. This analysis has many potential applications, such as preprocessing email for text-to-speech systems, anonymizing email corpora, improving automatic content-based mail classifiers, and threading email. Our method applies machine learning to a sequential representation of an email message, in which each email is represented as a sequence of lines and each line is represented as a set of features. We compare several state-of-the-art sequential and non-sequential machine learning algorithms on different feature sets, and present experimental results showing that the presence of a signature block in a message can be detected with accuracy higher than 97%; that signature block lines can be identified with accuracy higher than 99%; and that signature block and reply lines can be simultaneously identified with accuracy higher than 98%.
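The sequential representation described in the abstract (a message as a sequence of lines, each line as a set of features) can be illustrated with a minimal Python sketch. The abstract does not list the features the authors used, so the feature names, the regular expressions, and the helpers `line_features` and `message_to_sequence` below are all hypothetical choices made for illustration only. No learning algorithm is shown; the output is merely the kind of input a sequential or non-sequential classifier might consume.

```python
import re

# Hypothetical patterns: the abstract does not specify the actual features.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def line_features(line: str) -> set:
    """Map one line of an email to a set of illustrative binary features."""
    feats = set()
    stripped = line.strip()
    if not stripped:
        feats.add("blank")
    if stripped.startswith(">"):
        # A leading '>' is a common (assumed) marker of quoted reply text.
        feats.add("starts_with_quote_marker")
    if stripped == "--":
        # '--' is a conventional (assumed) signature separator.
        feats.add("signature_separator")
    if EMAIL_RE.search(line):
        feats.add("contains_email")
    if PHONE_RE.search(line):
        feats.add("contains_phone")
    return feats


def message_to_sequence(message: str) -> list:
    """Represent a plain-text message as a sequence of per-line feature sets."""
    return [line_features(line) for line in message.splitlines()]


if __name__ == "__main__":
    example = "Hi Bob,\n> old text\n\n--\nAlice\nalice@example.com\n555-123-4567"
    for features in message_to_sequence(example):
        print(sorted(features))
```

Each element of the returned list corresponds to one line of the message, so a per-line label such as "signature", "reply", or "other" could be attached position by position when training a classifier.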