Publication | Closed Access
MCRe: A Unified Framework for Handling Malicious Traffic With Noise Labels Based on Multidimensional Constraint Representation
Citations: 19 | References: 23 | Year: 2023
Because of limitations in existing annotation methods, label noise is prevalent in realistic malicious traffic datasets, and it significantly affects the training and evaluation of deep learning-based intrusion detection models. Recently, various methods have been proposed for datasets with noisy labels; they fall roughly into two categories: data cleaning and robust training. However, because the two categories take different processing approaches, each ignores the information in different components of the dataset, and performance drops sharply under high-noise conditions. To this end, this study proposes MCRe, a unified framework for handling malicious traffic with noisy labels based on multidimensional constraint representation, which unifies data cleaning and robust training as the approximation of an ideal representation function. Based on the properties of the ideal representation function, information integrity constraints, cluster separability constraints, and core proximity constraints are defined to drive MCRe toward the ideal representation during iteration. These constraints lead MCRe to learn distributed knowledge at the individual, intra-class, and global levels, which avoids the extraction of irrational domain knowledge and makes the representation network strongly robust to label noise. We validated MCRe on a dataset that includes 22 types of realistic malicious traffic. Experimental results show that MCRe outperforms state-of-the-art methods in both the data cleaning and robust training downstream tasks, achieving an 85% pure sample rate and 82% classification accuracy even when up to 90% of the labels are noisy. In addition, the generalizability of MCRe was verified on several public datasets. Finally, MCRe was also successfully extended to enhance other data cleaning and robust training approaches.
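To put the 90% noise setting in context, the following is a minimal sketch of how label noise could be injected into a 22-class dataset and how a pure sample rate could be measured. It is not the authors' protocol: the abstract does not specify the noise model or the metric definition, so the symmetric (uniform) flipping, the definition of pure sample rate as the fraction of retained samples whose given label matches the true label, and the names `inject_symmetric_noise` and `pure_sample_rate` are all assumptions made for illustration.

```python
import random


def inject_symmetric_noise(labels, num_classes, noise_rate, seed=0):
    """Return a copy of `labels` with a `noise_rate` fraction flipped.

    Assumption: symmetric noise, i.e. each corrupted label is replaced by a
    different class chosen uniformly at random.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(noise_rate * len(labels))
    for i in rng.sample(range(len(labels)), n_flip):
        # Exclude the true class so the label is guaranteed to change.
        other_classes = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(other_classes)
    return noisy


def pure_sample_rate(kept, clean, noisy):
    """Fraction of retained samples whose given label equals the true label.

    Assumption: this is an illustrative definition, not necessarily the
    metric used in the paper.
    """
    if not kept:
        return 0.0
    return sum(clean[i] == noisy[i] for i in kept) / len(kept)


# Synthetic, balanced 22-class label vector (100 samples per class).
clean = [i % 22 for i in range(2200)]
noisy = inject_symmetric_noise(clean, num_classes=22, noise_rate=0.9)

# Baseline with no cleaning at all: every sample is retained.
all_idx = list(range(len(clean)))
baseline = pure_sample_rate(all_idx, clean, noisy)
```

Under these assumptions, retaining every sample gives a pure sample rate equal to one minus the noise rate, which is 10% at 90% noise; a data cleaning method is useful to the extent that its retained subset is purer than that baseline.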