Publication | Closed Access
A semantics aware approach to automated reverse engineering unknown protocols
Citations: 136
References: 30
Year: 2012
Unknown Venue
Network Trace, Engineering, Information Security, Verification, Software Engineering, Information Forensics, Reverse Engineering, Semantic Web, Software Analysis, Formal Verification, Hardware Security, Data Science, Protocol Message, Systems Engineering, Formal Technique, Secure Protocol, Interaction Protocol, Lightweight Protocol, Formal Specification, Runtime Verification, Semantics Aware Approach, Computer Engineering, Computer Science, Network Forensics, Software Design, Data Security, Cryptography, Network Communication Protocol, Automated Reasoning, Program Analysis, Message Format Specifications, Formal Methods, Transport Layer
Extracting protocol message format specifications from network traces is crucial for parsing, vulnerability discovery, and system integration. This work introduces ProDecoder, a trace‑based inference system that leverages message semantics without requiring executable code. ProDecoder exploits the highly skewed frequency distribution of n‑grams, groups messages sharing semantics, and infers formats through keyword‑based clustering and cluster‑sequence alignment, and was implemented for SMB and SMTP. Experiments show ProDecoder achieves 100 % precision and recall on SMB and about 95 % on SMTP.
Extracting the protocol message format specifications of unknown applications from network traces is important for a variety of applications such as application protocol parsing, vulnerability discovery, and system integration. In this paper, we propose ProDecoder, a network trace based protocol message format inference system that exploits the semantics of protocol messages without requiring the executable code of application protocols. ProDecoder is based on the key insight that the n-grams of protocol traces exhibit a highly skewed frequency distribution that can be leveraged for accurate protocol message format inference. In ProDecoder, we discover the latent relationship among n-grams by first grouping protocol messages with the same semantics and then inferring message formats by keyword-based clustering and cluster sequence alignment. We implemented and evaluated ProDecoder to infer message format specifications of SMB (a binary protocol) and SMTP (a textual protocol). Our experimental results show that ProDecoder accurately parses and infers the SMB protocol with 100% precision and recall. For SMTP, ProDecoder achieves approximately 95% precision and recall.
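The n-gram frequency skew that the abstract describes can be illustrated with a minimal sketch. This is not ProDecoder's implementation: the function names (`ngrams`, `ngram_frequencies`, `top_share`), the choice of n = 4, and the toy SMTP-like trace are all assumptions made for illustration. The sketch only counts byte n-grams across messages and measures how much of the total mass the most frequent ones cover.

```python
from collections import Counter


def ngrams(message: bytes, n: int):
    """Yield every contiguous n-byte window of a message."""
    for i in range(len(message) - n + 1):
        yield message[i:i + n]


def ngram_frequencies(messages, n=3):
    """Count n-gram occurrences across all messages in a trace."""
    counts = Counter()
    for msg in messages:
        counts.update(ngrams(msg, n))
    return counts


def top_share(counts, k):
    """Fraction of all n-gram occurrences covered by the k most frequent n-grams."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(k)) / total


# Hypothetical toy trace, made up for illustration (not data from the paper).
trace = [
    b"HELO example.org\r\n",
    b"MAIL FROM:<a@example.org>\r\n",
    b"RCPT TO:<b@example.org>\r\n",
    b"MAIL FROM:<c@example.org>\r\n",
    b"RCPT TO:<d@example.org>\r\n",
]

counts = ngram_frequencies(trace, n=4)
print(counts.most_common(5))
print(f"share covered by top 5 n-grams: {top_share(counts, 5):.2f}")
```

On a real trace, protocol keywords and fixed header bytes would be expected to dominate the top of this ranking, which is the kind of signal a keyword-based clustering step could build on.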