A deep learning approach for detecting malicious JavaScript code

TLDR

Malicious JavaScript embedded in webpages poses a growing security threat, yet its obfuscation and complexity make detection costly, and existing shallow machine‑learning methods struggle to meet the demands of the big‑data era. This study introduces a deep‑learning framework for detecting malicious JavaScript code, aiming to surpass the accuracy of existing shallow‑learning approaches. The framework combines a sparse random projection, stacked denoising auto‑encoders for high‑level feature extraction, and a logistic‑regression classifier to distinguish malicious from benign scripts. On a dataset of more than 27,000 labeled samples, the model achieved up to 95 % accuracy with a false‑positive rate below 4.2 %. © 2016 John Wiley & Sons, Ltd.

Abstract

Malicious JavaScript code in webpages is an emerging security issue because of its ubiquity and potentially severe impact. Because such code is obfuscated and complex, detecting it is costly. Over the last few years, several machine learning-based detection approaches have been proposed; most of them use shallow discriminative models with features constructed from hand-crafted rules. However, with the advent of the big data era, these existing methods can no longer satisfy practical needs. In this paper, we present a new deep learning framework for detecting malicious JavaScript code, which achieved the highest detection accuracy when compared with the control group. The architecture is composed of a sparse random projection, a deep learning model, and logistic regression. Stacked denoising auto-encoders were used to extract high-level features from JavaScript code, and logistic regression was used as the classifier to distinguish between malicious and benign JavaScript code. Experimental results indicated that, on a dataset of over 27 000 labeled samples, our architecture can achieve an accuracy of up to 95%, with a false positive rate of less than 4.2% in the best case. Copyright © 2016 John Wiley & Sons, Ltd.
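
The three-stage architecture named in the abstract (sparse random projection, stacked denoising auto-encoders, logistic regression) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the synthetic binary feature vectors, the layer sizes, the masking-noise corruption, the tied weights, the min-max scaling step, and the learning rates. The sketch also omits any supervised fine-tuning of the stacked layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparse_random_projection(n_features, n_components, density, rng):
    """Sparse random matrix with entries in {-s, 0, +s}, s = sqrt(1/(density*k))."""
    scale = np.sqrt(1.0 / (density * n_components))
    signs = rng.choice([-1.0, 1.0], size=(n_features, n_components))
    mask = rng.random((n_features, n_components)) < density
    return scale * signs * mask

def train_dae_layer(X, n_hidden, rng, corruption=0.3, lr=0.1, epochs=50):
    """Train one tied-weight denoising auto-encoder layer on inputs in [0, 1]."""
    n_visible = X.shape[1]
    W = rng.normal(0.0, 0.1, size=(n_visible, n_hidden))
    b_h, b_v = np.zeros(n_hidden), np.zeros(n_visible)
    for _ in range(epochs):
        # Masking noise: randomly zero a fraction of the inputs.
        X_noisy = X * (rng.random(X.shape) >= corruption)
        H = sigmoid(X_noisy @ W + b_h)        # encode the corrupted input
        X_rec = sigmoid(H @ W.T + b_v)        # reconstruct the clean input
        # Cross-entropy reconstruction loss gradients (sigmoid outputs).
        d_v = (X_rec - X) / len(X)
        d_h = (d_v @ W) * H * (1.0 - H)
        W -= lr * (X_noisy.T @ d_h + d_v.T @ H)
        b_h -= lr * d_h.sum(axis=0)
        b_v -= lr * d_v.sum(axis=0)
    return W, b_h

def train_logistic_regression(F, y, lr=0.5, epochs=300):
    """Plain batch gradient descent on the logistic loss."""
    w, b = np.zeros(F.shape[1]), 0.0
    for _ in range(epochs):
        grad = (sigmoid(F @ w + b) - y) / len(y)
        w -= lr * (F.T @ grad)
        b -= lr * grad.sum()
    return w, b

# Synthetic stand-in for script feature vectors (NOT the paper's dataset).
n_samples, n_features = 200, 1000
y = rng.integers(0, 2, size=n_samples).astype(float)
X_raw = (rng.random((n_samples, n_features)) < 0.05).astype(float)
X_raw[y == 1, :50] = rng.random((int(y.sum()), 50)) < 0.5  # make classes differ

# Stage 1: sparse random projection to a lower dimension.
R = sparse_random_projection(n_features, 64, density=0.1, rng=rng)
X_proj = X_raw @ R
# Assumed preprocessing: min-max scale each column to [0, 1] for the sigmoid DAE.
span = X_proj.max(axis=0) - X_proj.min(axis=0) + 1e-12
X_scaled = (X_proj - X_proj.min(axis=0)) / span

# Stage 2: two stacked denoising auto-encoder layers, trained greedily.
W1, b1 = train_dae_layer(X_scaled, 32, rng)
H1 = sigmoid(X_scaled @ W1 + b1)
W2, b2 = train_dae_layer(H1, 16, rng)
H2 = sigmoid(H1 @ W2 + b2)

# Stage 3: logistic regression on the learned features.
w, b = train_logistic_regression(H2, y)
pred = sigmoid(H2 @ w + b) >= 0.5

# The two metrics quoted in the abstract, here on the training data only.
benign = y == 0
accuracy = float(np.mean(pred == (y == 1)))
false_positive_rate = float(np.sum(pred & benign) / max(benign.sum(), 1))
print(f"accuracy={accuracy:.3f}  false_positive_rate={false_positive_rate:.3f}")
```

The numbers printed by this sketch come from random synthetic data and say nothing about the reported results; the point is only how the stages connect, and how accuracy and false positive rate (false positives divided by all benign samples) are computed.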
