Publication | Closed Access
Annotating Subsets of the Enron Email Corpus
Year: 2006
Citations: 20
References: 2
Venue: Unknown
We present an annotation project for two subsets of the Enron email corpus. The first is a subset of the UC Berkeley Enron Email Analysis Project, and the second consists of a portion of the emails from the Voice Transcripts Email Correlated Corpora. Parts of the Automatic Content Extraction (ACE) annotation guidelines, extended for the email domain, are used for annotation. We also categorize the emails by email speech act, mark whether the text contains discussions of meetings or conversations, and determine the degree of correlation between the subject line and the text body.

1. CORPUS CREATION

The purpose of this project was to create an annotated corpus that could be used for further email research. In 2003, as a result of its investigation of Enron's energy trading practices [3], the Federal Energy Regulatory Commission (FERC) made the Enron email corpus available to the public. We chose to use two subsets of this corpus. The first is a subset of the UC Berkeley Enron Email Analysis Project (BEEAP)