Publication | Closed Access
Structural Clustering of Machine-Generated Mail
Citations: 12
References: 38
Year: 2016
Venue: Unknown
Cluster Computing, Engineering, Corpus Linguistics, Structural Clustering, Text Mining, Natural Language Processing, Spam Filtering, Information Retrieval, Data Science, Data Mining, Mail Extraction, Computational Linguistics, Document Clustering, Knowledge Discovery, Computer Science, Information Extraction, HTML Structure, Structure Discovery, Structure Mining, Text Processing, Mail Search
Several recent studies have presented approaches for clustering and classifying machine-generated mail based on email headers. We propose to extend these approaches by also considering email message bodies. We argue that doing so can increase coverage and precision in several tasks and is especially critical for mail extraction, which supports a variety of mail mining applications such as ad re-targeting, mail search, and mail summarization. We introduce new structural clustering methods that leverage the HTML structure common to messages generated by the same mass-sender script. We discuss how such structural clustering can be conducted at different levels of granularity, using either strict or flexible matching constraints, depending on the use case.
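The abstract does not specify the clustering algorithms, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes that a message's "structure" can be approximated by the sequence of root-to-element tag paths in its HTML body, that strict matching means identical sequences, and that flexible matching means a Jaccard similarity over the sets of paths above a threshold. The names `tag_paths`, `strict_clusters`, `flexible_clusters`, and the 0.7 threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagPathCollector(HTMLParser):
    """Records the root-to-element tag path of every element, ignoring text."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = []

    def handle_starttag(self, tag, attrs):
        self.paths.append("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; unmatched closes are ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def tag_paths(html):
    """Return the ordered tuple of tag paths for one message body (assumed signature)."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return tuple(collector.paths)


def strict_clusters(messages):
    """Strict matching (assumption): group messages whose path sequences are identical."""
    clusters = defaultdict(list)
    for msg_id, html in messages.items():
        clusters[tag_paths(html)].append(msg_id)
    return list(clusters.values())


def jaccard(a, b):
    """Jaccard similarity between the sets of paths in two signatures."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def flexible_clusters(messages, threshold=0.7):
    """Flexible matching (assumption): greedily attach each message to the first
    cluster whose representative has Jaccard similarity >= threshold."""
    clusters = []  # list of (representative_paths, [message ids])
    for msg_id, html in messages.items():
        paths = tag_paths(html)
        for rep, ids in clusters:
            if jaccard(rep, paths) >= threshold:
                ids.append(msg_id)
                break
        else:
            clusters.append((paths, [msg_id]))
    return [ids for _, ids in clusters]
```

Under these assumptions, two order confirmations that differ only in the number of table rows fall into different strict clusters but the same flexible cluster, which illustrates the granularity trade-off the abstract mentions. Message text is deliberately ignored, since only the template structure is assumed to identify the sending script.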