
Coding Malware in Fancy Programming Languages for Fun and Profit

Theodoros Apostolopoulos, Vasilios Koutsokostas, Nikolaos Totosis, Constantinos Patsakis, Georgios Smaragdakis

TL;DR

This work addresses the vulnerability of static malware detection to cross-language coding practices by empirically evaluating how programming languages and compilers influence detectability and reverse-engineering difficulty. Using public datasets (MalwareBazaar, VirusTotal) and a controlled experiment across 39 languages and 50 compilers with non-obfuscated payloads, it demonstrates dramatic variation in detection rates and reveals substantial increases in analysis complexity for runtime-heavy or highly abstracted languages such as Haskell. Key findings show that shellcode fragmentation and indirect control-flow patterns correlate with reduced detectability, and that cross-language compilation can broaden attacker capabilities across platforms. The study highlights the need for language- and compiler-aware defense tools and suggests that future work should extend the analysis to additional languages and tooling, both to make malware detection more robust and to improve analyst productivity.

Abstract

The continuous increase in malware samples, both in sophistication and number, presents many challenges for organizations and analysts, who must cope with thousands of new heterogeneous samples daily. This requires robust methods to quickly determine whether a file is malicious. Due to its speed and efficiency, static analysis is the first line of defense. In this work, we illustrate how the practical state-of-the-art methods used by antivirus solutions may fail to detect evident malware traces. The reason is that they rely heavily on very strict signatures, so minor deviations prevent them from detecting shellcode that would otherwise be flagged as malicious immediately. Consequently, our findings show that malware authors can drastically reduce detection rates by porting their code base to less commonly used programming languages. To this end, we study the features that such programming languages introduce into executables and the practical issues practitioners face when detecting malicious activity.
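To make the point about strict signatures concrete, the following is a minimal toy sketch, not the paper's method and not how any specific antivirus engine works (real engines also use wildcards, emulation, and heuristics). It assumes a detector that flags a file only when a byte signature appears as one contiguous run. The signature bytes, the fragment size, and the helper names (strict_match, fragment, build_image) are hypothetical placeholders chosen for illustration; no real payload or real signature is involved.

```python
from collections.abc import Iterable

# Hypothetical signature: an arbitrary placeholder byte pattern, NOT taken
# from any real antivirus product or any real payload.
SIGNATURE = bytes.fromhex("1020304050607080")


def strict_match(blob: bytes, signature: bytes = SIGNATURE) -> bool:
    """Toy detector: flag only if the signature occurs as one contiguous run."""
    return signature in blob


def fragment(data: bytes, size: int) -> list[bytes]:
    """Split a byte string into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def build_image(chunks: Iterable[bytes], filler: bytes = b"\x00" * 4) -> bytes:
    """Simulate an executable image where the chunks are stored apart,
    separated by unrelated bytes (standing in for other data or runtime code)."""
    return filler.join(chunks)


data = b"HEADER" + SIGNATURE + b"TRAILER"
contiguous_image = data
fragmented_image = build_image(fragment(data, 3))

print(strict_match(contiguous_image))  # True: the pattern is stored in one piece
print(strict_match(fragmented_image))  # False: no 8-byte window avoids the filler
# Joining the chunks again restores the original bytes exactly.
print(b"".join(fragment(data, 3)) == data)  # True
```

The same bytes are present in both images; only their layout differs. That is the kind of "minor deviation" that, under the assumption of contiguous matching, is enough to defeat a strict signature, and it is consistent with the paper's observation that shellcode fragmentation correlates with reduced detectability.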

Paper Structure

This paper contains 12 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Distribution of the top 5 programming languages of samples per year.
  • Figure 2: Deviations in the detection rate per programming language.
  • Figure 3: Distribution of top programming languages of APT samples per year from gonzalez2023technical.
  • Figure 4: Barplots to illustrate the variation of shellcode samples.