Publication | Closed Access
Malicious PDF detection using metadata and structural features
Citations: 269
References: 12
Year: 2012
Unknown Venue
Abuse Detection, Engineering, Machine Learning, Evasion Technique, Information Security, Information Forensics, Text Mining, Information Retrieval, Data Science, Data Mining, Pattern Recognition, Adversarial Machine Learning, Malware Detection, Unseen Malware, Malicious PDF Detection, PDF Documents, Threat Detection, Knowledge Discovery, Computer Science, Anti-virus Technique, Malware Analysis
Owing to their versatile functionality and widespread adoption, PDF documents have become a popular avenue for user exploitation, ranging from large-scale phishing attacks to targeted attacks. In this paper, we present a framework for robust detection of malicious documents through machine learning. Our approach is based on features extracted from document metadata and structure. Using real-world datasets, we demonstrate the adequacy of these document properties for malware detection and the durability of these features across new malware variants. Our analysis shows that the Random Forests classification method, an ensemble classifier that randomly selects features for each individual classification tree, yields the best detection rates, even on previously unseen malware.
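
To make the idea of "metadata and structural features" concrete, the following is a minimal sketch of how such features might be counted from the raw bytes of a PDF. The feature names, the regular expressions, and the `extract_features` helper are illustrative assumptions and are not taken from the paper, whose actual feature set is not listed in the abstract.

```python
import re

# Hypothetical structural markers; the paper's real feature set is not
# given in the abstract, so these names and patterns are assumptions.
MARKERS = {
    "count_obj": re.compile(rb"\d+\s+\d+\s+obj\b"),       # indirect object headers
    "count_stream": re.compile(rb"(?<!end)stream\b"),     # stream openers, not "endstream"
    "count_javascript": re.compile(rb"/JavaScript\b"),    # JavaScript action type
    "count_js": re.compile(rb"/JS\b"),                    # embedded script entry
    "count_open_action": re.compile(rb"/OpenAction\b"),   # action run when the file opens
    "count_page": re.compile(rb"/Type\s*/Page\b"),        # page objects, not "/Pages"
}


def extract_features(data: bytes) -> dict:
    """Return a flat dictionary of simple structural and metadata counts."""
    features = {name: len(pattern.findall(data)) for name, pattern in MARKERS.items()}
    features["size_bytes"] = len(data)                 # overall file size
    features["has_title"] = int(b"/Title" in data)     # crude metadata presence flag
    return features


# A tiny hand-written PDF-like byte string, used only for illustration.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R /OpenAction 4 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R >>\nendobj\n"
    b"4 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
    b"%%EOF\n"
)

features = extract_features(sample)
```

A feature dictionary of this kind could then be turned into a numeric vector and passed to an ensemble classifier such as a Random Forest (for example, scikit-learn's `RandomForestClassifier`). That step is omitted here because the abstract does not describe the training setup.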