OBA2: An Onion approach to Binary code Authorship Attribution

Abstract

A critical aspect of malware forensics is authorship analysis. The success of such analysis usually depends on the reverse engineer’s skills and on the volume and complexity of the code under analysis. To assist reverse engineers in such a tedious and error-prone task, it is desirable to develop reliable and automated tools for supporting the practice of malware authorship attribution. In a recent work, machine learning was used to rank and select syntax-based features such as n-grams and flow graphs. The experimental results showed that the top-ranked features were unique for each author, which was regarded as evidence that those features capture each author’s programming style. In this paper, however, we show that the uniqueness of features does not necessarily correspond to authorship. Specifically, our analysis demonstrates that many “unique” features selected using this method are clearly unrelated to the authors’ programming styles, for example, unique IDs or random but unique function names generated by the compiler; furthermore, the overall accuracy is generally unsatisfactory. Motivated by this discovery, we propose a layered Onion Approach for Binary Authorship Attribution called OBA2. The novelty of our approach lies in its three complementary layers: preprocessing, syntax-based attribution, and semantic-based attribution. Experiments show that our method produces results that are not only more accurate but also have a meaningful connection to the authors’ styles.
