Publication | Open Access

A Cross-Architecture Instruction Embedding Model for Natural Language Processing-Inspired Binary Code Analysis

Citations: 50

References: 54

Year: 2019

Abstract

For closed-source programs, such as most proprietary software and viruses, binary code analysis is indispensable for many tasks, such as code plagiarism detection and malware analysis. Today, source code is very often compiled for various architectures, making cross-architecture binary code analysis increasingly important. A binary, after being disassembled, is expressed in an assembly language. Thus, recent work has started to explore Natural Language Processing (NLP)-inspired binary code analysis. In NLP, words are usually represented as high-dimensional vectors (i.e., embeddings) to facilitate further processing, which is one of the most common and critical steps in many NLP tasks. We regard instructions as words in NLP-inspired binary code analysis and aim to represent instructions as embeddings as well.
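
As a rough illustration of what "instructions as words" can mean in practice, the sketch below turns each disassembled instruction into a single token and looks it up in an embedding table. It is not the paper's model: the normalization rule (replacing numeric constants with `<num>`), the helper names `normalize`, `build_vocab`, and `init_embeddings`, the sample instructions, and the randomly initialized vectors are all assumptions made for this example. A real model would learn the vectors from data.

```python
import random
import re


def normalize(instruction: str) -> str:
    """Collapse one assembly instruction into a single word-like token."""
    text = instruction.strip().lower()
    # Assumption of this sketch: numeric constants carry little meaning,
    # so hex and decimal literals are replaced by a placeholder.
    text = re.sub(r"\b(0x[0-9a-f]+|\d+)\b", "<num>", text)
    # Join mnemonic and operands so the whole instruction is one token.
    return re.sub(r"[\s,]+", "_", text)


def build_vocab(functions):
    """Assign an integer id to every distinct normalized instruction."""
    vocab = {}
    for instructions in functions:
        for ins in instructions:
            vocab.setdefault(normalize(ins), len(vocab))
    return vocab


def init_embeddings(vocab, dim=8, seed=0):
    """Create an untrained embedding table: one random vector per token."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]


# Hypothetical instruction sequences for two architectures.
x86 = ["mov eax, 0x10", "add eax, 4", "ret"]
arm = ["mov r0, #16", "add r0, r0, #4", "bx lr"]

vocab = build_vocab([x86, arm])
table = init_embeddings(vocab)

# Look up the vector for an instruction, exactly as a word would be looked up.
vector = table[vocab[normalize("ADD eax, 8")]]
```

Because constants are normalized away, `add eax, 4` and `ADD eax, 8` share one token and therefore one vector. In this untrained table, x86 and ARM instructions with similar meaning have unrelated vectors; giving them nearby vectors is the part a cross-architecture model would have to learn.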
