ERNN: Error-Resilient RNN for Encrypted Traffic Detection towards Network-Induced Phenomena

Abstract

Traffic detection systems based on machine learning have been proposed to defend against cybersecurity threats such as intrusion attacks and malware. However, existing systems do not account for network-induced phenomena such as packet loss, retransmission, and out-of-order delivery. In real-world deployments, these phenomena introduce additional misclassifications. In this paper, we present ERNN, a robust, end-to-end RNN model designed specifically to withstand network-induced phenomena. At its core, ERNN has a novel gating unit, named the session gate, that includes: (i) four types of actions that simulate common network-induced phenomena during model training; and (ii) a Mealy machine that updates the states of the session gate, which in turn adjusts the probability distribution of network-induced phenomena. Taken together, ERNN advances the state of the art by making the model robust to network-induced phenomena in an error-resilient manner. We implement ERNN and evaluate it extensively on both intrusion detection and malware detection systems.
In a practical evaluation with dynamic bandwidth utilization and different network topologies, we demonstrate that ERNN still identifies 98.63% of encrypted intrusion traffic when about 16% of packet sequences are abnormal on a 10 Gbps dataplane. Similarly, ERNN robustly identifies more than 97% of encrypted malware traffic in multi-user concurrency scenarios. ERNN achieves about 4% higher accuracy than state-of-the-art methods. Using the Integrated Gradients method, we interpret the gating mechanism as reducing the model's dependence on local packets (termed dependency dispersion). Moreover, we demonstrate that ERNN possesses superior stability and scalability with respect to parameter settings and feature selection.
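
To make the session-gate idea more concrete, the following is a minimal illustrative sketch, not the authors' implementation. The abstract names three phenomena (packet loss, retransmission, out-of-order delivery); the fourth action here (`KEEP`, pass the packet through unchanged) is an assumption, as are the two state names, the transition rule, and every probability value. It only shows how a state-dependent action distribution could perturb a packet sequence during training.

```python
import random
from enum import Enum


class Action(Enum):
    KEEP = "keep"              # assumed fourth action: packet passes unchanged
    LOSS = "loss"              # packet is dropped
    RETRANSMIT = "retransmit"  # packet appears twice
    REORDER = "reorder"        # packet is swapped ahead of its predecessor


# Hypothetical states and probabilities: the current state selects the
# distribution from which the next action is drawn.
ACTION_PROBS = {
    "stable": {Action.KEEP: 0.94, Action.LOSS: 0.02,
               Action.RETRANSMIT: 0.02, Action.REORDER: 0.02},
    "congested": {Action.KEEP: 0.70, Action.LOSS: 0.10,
                  Action.RETRANSMIT: 0.10, Action.REORDER: 0.10},
}


def perturb(packets, seed=None):
    """Apply simulated network-induced phenomena to a packet sequence.

    Returns the perturbed sequence and the list of actions taken.
    """
    rng = random.Random(seed)
    state = "stable"
    out, actions = [], []
    for pkt in packets:
        probs = ACTION_PROBS[state]
        action = rng.choices(list(probs), weights=list(probs.values()))[0]
        if action is Action.KEEP:
            out.append(pkt)
        elif action is Action.RETRANSMIT:
            out.extend([pkt, pkt])
        elif action is Action.REORDER:
            # Insert before the most recent packet, if there is one.
            out.insert(max(len(out) - 1, 0), pkt)
        # Action.LOSS: the packet is simply not emitted.

        # Assumed transition rule: any anomaly moves the gate to the
        # "congested" state, which makes further anomalies more likely.
        state = "stable" if action is Action.KEEP else "congested"
        actions.append(action)
    return out, actions


if __name__ == "__main__":
    perturbed, taken = perturb(list(range(20)), seed=7)
    print(perturbed)
    print([a.value for a in taken])
```

Under these assumptions, anomalies arrive in bursts rather than independently, which is one plausible reason to drive the action distribution from a state machine instead of a single fixed probability.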