Deep Learning for Unsupervised Insider Threat Detection in Structured Cybersecurity Data Streams

TLDR

Analysis of an organization’s computer network activity is essential for early detection of insider threat, but raw system logs produce streaming data at a volume that can exceed human cognitive capacity. We present an online unsupervised deep learning approach to detect anomalous network activity from system logs in real time. Our models decompose anomaly scores into the contributions of individual user behavior features to provide interpretability for analysts reviewing potential insider threat cases. Evaluated on the CERT Insider Threat Dataset v6.2 with recall as the metric, our deep and recurrent neural network models outperform PCA, SVM, and Isolation Forest baselines. Under the best model, events labeled as insider threat activity receive an average anomaly score at the 95.53 percentile, indicating strong potential to reduce analyst workloads.

Abstract

Analysis of an organization's computer network activity is a key component of early detection and mitigation of insider threat, a growing concern for many organizations. Raw system logs are a prototypical example of streaming data that can quickly scale beyond the cognitive capacity of a human analyst. As a prospective filter for the human analyst, we present an online unsupervised deep learning approach to detect anomalous network activity from system logs in real time. Our models decompose anomaly scores into the contributions of individual user behavior features, which increases interpretability and aids analysts reviewing potential cases of insider threat. Using the CERT Insider Threat Dataset v6.2 and threat detection recall as our performance metric, our novel deep and recurrent neural network models outperform anomaly detection baselines based on Principal Component Analysis, Support Vector Machines, and Isolation Forests. For our best model, the events labeled as insider threat activity in our dataset had an average anomaly score at the 95.53 percentile, demonstrating our approach's potential to greatly reduce analyst workloads.
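
To make the idea of a decomposable anomaly score concrete, the following is a minimal illustrative sketch, not the models evaluated here. It assumes that some model has already predicted a mean and variance for each user behavior feature, and it scores an observation by its negative log-likelihood under independent Gaussians, so that the total score is a sum of per-feature terms. The Gaussian assumption, the function names, the feature names, and the percentile convention (share of scores less than or equal to the given score) are all hypothetical choices made for illustration.

```python
import math


def feature_contributions(observed, predicted_mean, predicted_var):
    """Per-feature negative log-likelihood under independent Gaussians.

    Assumption for illustration: each feature is scored independently,
    so the contributions add up to the total anomaly score.
    """
    contributions = []
    for x, mu, var in zip(observed, predicted_mean, predicted_var):
        nll = 0.5 * math.log(2 * math.pi * var) + (x - mu) ** 2 / (2 * var)
        contributions.append(nll)
    return contributions


def anomaly_score(contributions):
    """Total anomaly score is the sum of the per-feature contributions."""
    return sum(contributions)


def percentile_rank(score, all_scores):
    """Percentage of scores in all_scores that are <= score."""
    at_or_below = sum(1 for s in all_scores if s <= score)
    return 100.0 * at_or_below / len(all_scores)


# Hypothetical daily feature counts for one user (names are illustrative).
feature_names = ["logons", "emails_sent", "usb_connects"]
observed = [3.0, 40.0, 9.0]
predicted_mean = [3.5, 38.0, 0.5]
predicted_var = [1.0, 25.0, 0.25]

contribs = feature_contributions(observed, predicted_mean, predicted_var)
total = anomaly_score(contribs)

# Rank features by how much each one contributes to the total score.
ranked = sorted(zip(feature_names, contribs), key=lambda p: p[1], reverse=True)
for name, value in ranked:
    print(f"{name}: {value:.3f}")
print(f"total anomaly score: {total:.3f}")
```

In this sketch, an analyst would read the ranked list to see which feature drives a high score, and `percentile_rank` could be applied to a flagged user's score against all user scores for the same day to obtain a percentile of the kind reported above.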