Coding Malware in Fancy Programming Languages for Fun and Profit
Theodoros Apostolopoulos, Vasilios Koutsokostas, Nikolaos Totosis, Constantinos Patsakis, Georgios Smaragdakis
TL;DR
This work examines how vulnerable static malware detection is to cross-language coding practices by empirically evaluating how programming languages and compilers influence detectability and reverse-engineering difficulty. Using public datasets (Malware Bazaar, VirusTotal) and a controlled experiment across 39 languages and 50 compilers with non-obfuscated payloads, it demonstrates dramatic variation in detection rates and reveals substantial increases in analysis complexity for runtime-heavy or highly abstracted languages such as Haskell. Key findings show that shellcode fragmentation and indirect control-flow patterns correlate with reduced detectability, and that cross-language compilation can broaden attacker capabilities across platforms. The study highlights the need for language- and compiler-aware defense tools and suggests that future work should extend the analysis to additional languages and tooling to make malware detection more robust and analysts more productive.
Abstract
The continuous increase in malware samples, both in sophistication and number, presents many challenges for organizations and analysts, who must cope with thousands of new heterogeneous samples daily. This requires robust methods to quickly determine whether a file is malicious. Due to its speed and efficiency, static analysis is the first line of defense. In this work, we illustrate how the practical state-of-the-art methods used by antivirus solutions may fail to detect evident malware traces. The reason is that they depend heavily on very strict signatures, so minor deviations prevent them from detecting shellcode that would otherwise be flagged immediately as malicious. Our findings thus illustrate that malware authors may drastically reduce detection rates by converting the code base to less commonly used programming languages. To this end, we study the features that such programming languages introduce in executables and the practical issues that practitioners face when trying to detect malicious activity.
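The brittleness of strict signatures described above can be illustrated with a toy sketch from the defender's side. Everything in it is an assumption made for illustration: the marker bytes are a harmless placeholder, and `strict_match` and `fragment_match` are hypothetical helpers, not the authors' method and not how any particular antivirus engine works. It only shows that an exact contiguous byte match stops firing once the same bytes are split by filler, while a looser in-order fragment check still does.

```python
# Toy illustration only: a harmless placeholder stands in for a byte signature.
SIGNATURE = b"EXAMPLE-PAYLOAD-MARKER"  # hypothetical marker, not real shellcode


def strict_match(data: bytes, signature: bytes = SIGNATURE) -> bool:
    """Exact, contiguous byte-signature match."""
    return signature in data


def fragment_match(data: bytes, signature: bytes = SIGNATURE,
                   fragment_len: int = 4) -> bool:
    """Looser check: each fragment_len-sized piece must appear, in order."""
    position = 0
    for start in range(0, len(signature), fragment_len):
        piece = signature[start:start + fragment_len]
        found = data.find(piece, position)
        if found == -1:
            return False
        position = found + len(piece)
    return True


# The same bytes, once stored contiguously and once split by filler bytes.
contiguous = b"\x00\x01" + SIGNATURE + b"\x02"
fragmented = b"\x00".join(
    SIGNATURE[i:i + 4] for i in range(0, len(SIGNATURE), 4)
)

print(strict_match(contiguous), strict_match(fragmented))
print(fragment_match(contiguous), fragment_match(fragmented))
```

The looser check trades precision for recall: short fragments are far more likely to occur by chance in benign files, so a practical detector would need additional context before raising an alert.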
