Enabling Obfuscation Detection in Binary Software through eXplainable AI

Abstract

Binary obfuscation techniques are commonly employed to protect code against reverse engineering and piracy. Unfortunately, besides these legitimate uses, obfuscation is also exploited by virus writers to evade antivirus detection mechanisms based on signature scanning. Consequently, detecting obfuscated code in executables can be a valuable resource for preventing the execution of malicious programs. Detecting obfuscation is, however, fraught with difficulties, owing to the wide range of possible obfuscation transformations and the difficulty of distinguishing obfuscated code from non-obfuscated code. In this paper, we venture into the relatively unexplored field of obfuscation detection, gaining a deeper understanding of what happens, from a statistical perspective, to a binary program when obfuscation transformations are applied to it. We accomplish this goal by leveraging eXplainable Artificial Intelligence, which allows us to separate the features altered by obfuscation from the invariant ones; the latter can then be used for obfuscation-resilient malware detection. We carried out this study on diverse datasets, not only to detect obfuscation but also to classify the specific obfuscating transformations employed. The investigation encompasses binaries compiled for various architectures, and we propose an effective methodology both for detecting the presence of obfuscation and for isolating invariant patterns that can facilitate the creation of obfuscation-resistant signatures for antivirus detection.

Publication
IEEE Transactions on Emerging Topics in Computing