Fileprint analysis for Malware Detection

Abstract

Malcode can be easily hidden in document files and embedded in application executables. We demonstrate this opportunity for stealthy malcode insertion in several experiments using a standard COTS Anti-Virus (AV) scanner. In the case of zero-day malicious exploit code, signature-based AV scanners would fail to detect such malcode even if the scanner knew where to look. We propose the use of statistical binary content analysis of files in order to detect suspicious anomalous file segments that may suggest infection by malcode. Experiments are performed to determine whether n-gram analysis may provide useful evidence of an infected file that would subsequently be subjected to further scrutiny. Our goal is to develop an efficient means of detecting suspect infected files for application to online network communication or scanning a large store of collected information, such as a data warehouse of shared documents.

1. Introduction

Attackers have used a variety of ways of embedding malicious code in otherwise normal-appearing files to infect systems. Viruses that attach themselves to system files, or to normal-appearing media files, are nothing new. State-of-the-art COTS products scan files and apply signature analysis to detect such known malware. For various performance optimization reasons, however, COTS Anti-Virus (AV) scanners may not perform a deep scan of all files in order to detect known malcode that may have been embedded in an arbitrary file location. Other means of stealth to avoid detection are well known. Various self-encryption or code obfuscation techniques may be used to avoid detection simply by making the content of the malcode unavailable for inspection by an AV scanner. In the case of new "zero-day" malicious exploit code, signature-based AV scanners would fail to detect such malcode even if the scanner had access to the content and knew where to look.

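As a rough illustration of the kind of statistical binary content analysis described in the abstract, the following minimal sketch computes a 1-gram (byte value) frequency distribution for each fixed-size segment of a file and flags segments that deviate strongly from a reference distribution. The segment size, the Manhattan distance, the threshold, and all function names are assumptions chosen for illustration; they are not taken from the paper.

```python
from collections import Counter


def byte_distribution(data: bytes) -> list[float]:
    """Return the normalized 1-gram (byte value) frequencies of data."""
    if not data:
        return [0.0] * 256
    counts = Counter(data)
    total = len(data)
    return [counts.get(value, 0) / total for value in range(256)]


def manhattan(p: list[float], q: list[float]) -> float:
    """Sum of absolute differences between two distributions (range 0 to 2)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def suspicious_segments(data: bytes, reference: list[float],
                        segment_size: int = 1024,
                        threshold: float = 1.0) -> list[int]:
    """Return start offsets of segments whose byte distribution lies
    farther than threshold from the reference distribution.

    segment_size and threshold are illustrative assumptions.
    """
    flagged = []
    for offset in range(0, len(data), segment_size):
        segment = data[offset:offset + segment_size]
        if manhattan(byte_distribution(segment), reference) > threshold:
            flagged.append(offset)
    return flagged


if __name__ == "__main__":
    # Synthetic stand-ins: text-like "normal" content and a block of
    # uniformly distributed bytes playing the role of embedded content.
    normal = b"the quick brown fox " * 256
    reference = byte_distribution(normal)
    embedded = bytes(range(256)) * 4
    modified = normal[:2048] + embedded + normal[2048:]
    print(suspicious_segments(modified, reference))
```

In this synthetic example only the segment holding the uniformly distributed block is flagged. Real file formats mix text, compressed streams, and binary structures, so a practical reference model would presumably need to be trained per file type and would yield evidence for further scrutiny rather than a verdict.
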
In this paper we explore the use of statistical content analysis of files in order to detect anomalous file segments that may suggest infection by malcode. Our goal is to develop an efficient means of detecting suspect infected files for application to online network communication such as file sharing or media streaming, or scanning a large store of collected information, such as a data warehouse of acquired content.

The first contribution of this paper is the astonishing observation that anti-virus systems can be easily deceived even when given a signature for the hidden malcode. In our experiments, we simply inserted known malcode into normal PDF or DOC files. Although all of this malcode is caught by the anti-virus system when it appears as standalone files, quite a few poisoned PDF and DOC files carrying the malcode inside were not flagged by a popular COTS AV scanner. Furthermore, some of these files were successfully opened by Adobe or Word. Thus, the file formats and application logic provide a ready-made means of stealthily infecting a host with innocent-appearing infected files. This implies that sandboxing techniques used to determine whether files are infected or fail in their execution would not be effective detectors in all cases.

We also note that an existing known vulnerability of certain Windows executables [23] remains available for malware insertion while avoiding detection. We demonstrate a simple case of embedding malcode into the block padding portion of MS WINWORD.EXE, creating an infected application that operates correctly, just as the original executable does. This provides a stealthy means of

