Publication | Closed Access
The Enron Email Dataset Database Schema and Brief Statistical Report
Citations: 307
References: 0
Year: 2004
Venue: unknown
Email logs are valuable for link, social network, and textual analysis, yet most studies rely on synthetic data; the Enron dataset serves as a real‑world benchmark similar to fraud and counter‑terrorism data. This report presents a MySQL database schema for the Enron dataset and evaluates its suitability for research. The authors constructed the database and extracted a 151‑person social network by linking employees who exchanged a threshold number of emails.
Email logs are a useful resource for research in fields such as link analysis, social network analysis, and textual analysis. Most experiments in these fields are performed on synthetic data because no adequate real-life benchmark has been available. The Enron email dataset is a touchstone for such research. It closely resembles the kind of data collected for fraud detection and counter-terrorism, which makes it a well-suited test bed for evaluating the effectiveness of techniques used in those areas. In this report we describe the MySQL database prepared for the dataset and statistically analyze its appropriateness for research. We further derive a social network consisting of 151 employees from the email logs, defining a social contact as someone with whom an individual has exchanged at least a predetermined threshold number of emails.
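The threshold rule for social contacts described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `build_social_network`, the `(sender, recipient)` input format, and the choice to sum emails in both directions are assumptions made for illustration; the abstract does not state the threshold value or whether each direction must meet it separately.

```python
from collections import Counter


def build_social_network(messages, threshold):
    """Return the undirected edges between people who exchanged enough emails.

    messages  -- iterable of (sender, recipient) pairs, one per email
                 (hypothetical input format, not the report's schema)
    threshold -- minimum number of emails exchanged, counted in both
                 directions together (an assumption of this sketch)
    """
    pair_counts = Counter()
    for sender, recipient in messages:
        if sender == recipient:
            continue  # a self-addressed email is not a social contact
        # frozenset makes (a, b) and (b, a) count toward the same pair
        pair_counts[frozenset((sender, recipient))] += 1

    # Keep only the pairs that reach the threshold, as sorted tuples
    return {
        tuple(sorted(pair))
        for pair, count in pair_counts.items()
        if count >= threshold
    }


# Toy data with made-up names, not taken from the Enron dataset
messages = [("a", "b"), ("b", "a"), ("a", "b"), ("a", "c"), ("c", "c")]
network = build_social_network(messages, threshold=3)
```

With the toy data, only the pair `a` and `b` reaches a threshold of 3, so the resulting network has a single edge. Raising or lowering the threshold trades network density against the strength of the contacts it retains.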