Features Dimensionality Reduction Approaches for Machine Learning Based Network Intrusion Detection

TLDR

Networked systems face escalating attacks, making intrusion detection, especially machine-learning-based IDS, critical for protecting individuals, enterprises, and governments. The study applies Auto-Encoder (AE) and PCA dimensionality reduction, introduces a combined performance metric, and develops a balancing approach to evaluate and improve IDS classifiers. Reduced-dimensional features from AE and PCA are fed into Random Forest, Bayesian Network, LDA, and QDA classifiers to construct the IDS. Using 10 features instead of 81, the IDS achieved 99.6% accuracy with superior detection rate, F-measure, and low false alarm rate in both binary and multi-class tests.

Abstract

The security of networked systems has become a critical universal issue that influences individuals, enterprises and governments. The rate of attacks against networked systems has increased dramatically, and the tactics used by attackers continue to evolve. Intrusion detection is one of the solutions against these attacks. A common and effective approach for designing Intrusion Detection Systems (IDS) is Machine Learning. The performance of an IDS improves significantly when the features are more discriminative and representative. This study uses two feature dimensionality reduction approaches: (i) Auto-Encoder (AE), an instance of deep learning, and (ii) Principal Component Analysis (PCA). The resulting low-dimensional features from both techniques are then used to build various classifiers, such as Random Forest (RF), Bayesian Network, Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA), for designing an IDS. The experimental findings with low-dimensional features in binary and multi-class classification show better performance in terms of Detection Rate (DR), F-Measure, False Alarm Rate (FAR), and Accuracy. This work reduces the CICIDS2017 dataset's feature dimensions from 81 to 10 while maintaining a high accuracy of 99.6% in both multi-class and binary classification. Furthermore, we propose a Multi-Class Combined performance metric, Combined_Mc, which incorporates FAR, DR, Accuracy, and class distribution parameters so that various multi-class and binary classification systems can be compared with respect to class distribution. In addition, we developed a uniform distribution based balancing approach to handle the imbalanced distribution of the minority class instances in the CICIDS2017 network intrusion dataset.
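
The abstract reports Detection Rate (DR), False Alarm Rate (FAR), F-Measure, and Accuracy without defining them. The following is a minimal sketch of how these four quantities are commonly computed from binary confusion-matrix counts, assuming the standard definitions (DR as recall on the attack class, FAR as the false positive rate). The paper's exact formulas and its proposed combined metric are not given in this abstract and are not reproduced here; the names `BinaryCounts` and `ids_metrics` and the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryCounts:
    """Confusion-matrix counts, treating 'attack' as the positive class."""
    tp: int  # attacks correctly flagged as attacks
    fp: int  # benign traffic wrongly flagged as attacks
    tn: int  # benign traffic correctly passed
    fn: int  # attacks that were missed


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def ids_metrics(c: BinaryCounts) -> dict:
    """Compute DR, FAR, F-Measure, and Accuracy (assumed standard definitions)."""
    dr = _ratio(c.tp, c.tp + c.fn)          # Detection Rate (recall)
    far = _ratio(c.fp, c.fp + c.tn)         # False Alarm Rate
    precision = _ratio(c.tp, c.tp + c.fp)
    f_measure = _ratio(2 * precision * dr, precision + dr)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    return {"DR": dr, "FAR": far, "F-Measure": f_measure, "Accuracy": accuracy}


if __name__ == "__main__":
    # Made-up counts for illustration only; these are not results from the paper.
    example = BinaryCounts(tp=90, fp=5, tn=895, fn=10)
    for name, value in ids_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

For a multi-class IDS, the same counts can be taken per class in a one-vs-rest fashion, which is where the class distribution mentioned in the abstract becomes relevant.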
