Enhancing Malware Fingerprinting through Analysis of Evasive Techniques

Alsharif Abuadbba, Sean Lamont, Ejaz Ahmed, Cody Christopher, Muhammad Ikram, Uday Tupakula, Daniel Coscia, Mohamed Ali Kaafar, Surya Nepal

TL;DR

This work investigates the limitations of traditional file-level malware fingerprinting through a large-scale empirical study of Windows PE samples from VirusTotal. It shows that cryptographic and fuzzy hashes miss most similar variants, while invariant components such as import libraries and executable sections reveal strong cross-sample similarities. The authors introduce resilient fingerprints and two clustering approaches, Top-Down and Bottom-Up, that connect surface-level variations to core invariant parts. These achieve up to a 3-fold improvement in identifying similar malware, raising identification rates from 20% with traditional methods to over 50%. The findings offer practical guidance for strengthening early triage workflows against evolving evasion tactics.
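To make the contrast between file-level hashes and invariant components concrete, here is a minimal sketch. It is not the authors' method: the sample data, the `import_fingerprint` helper, and the normalisation choices (lowercased library names, sorted library/function pairs) are all assumptions made for illustration.

```python
import hashlib


def file_hash(raw: bytes) -> str:
    """Cryptographic hash of the whole file: any byte change alters it."""
    return hashlib.sha256(raw).hexdigest()


def import_fingerprint(imports: dict[str, list[str]]) -> str:
    """Hash of a normalised import list (hypothetical helper).

    Library names are lowercased and the (library, function) pairs are
    sorted, so ordering and DLL-name casing do not affect the result.
    """
    pairs = sorted(
        f"{lib.lower()}!{fn}"
        for lib, fns in imports.items()
        for fn in fns
    )
    return hashlib.sha256("\n".join(pairs).encode("utf-8")).hexdigest()


# Two made-up variants: identical imports, but variant_b carries
# eight extra padding bytes and lists its imports in another order.
variant_a = {
    "raw": b"MZ" + b"\x90" * 64,
    "imports": {
        "KERNEL32.dll": ["CreateFileW", "WriteFile"],
        "WS2_32.dll": ["connect"],
    },
}
variant_b = {
    "raw": b"MZ" + b"\x90" * 64 + b"\x00" * 8,
    "imports": {
        "ws2_32.dll": ["connect"],
        "kernel32.dll": ["WriteFile", "CreateFileW"],
    },
}

same_file = file_hash(variant_a["raw"]) == file_hash(variant_b["raw"])
same_imports = (
    import_fingerprint(variant_a["imports"])
    == import_fingerprint(variant_b["imports"])
)
print(same_file, same_imports)  # prints: False True
```

The whole-file hash changes with a few padding bytes, while the import-based fingerprint stays stable. That is the kind of invariance the summary attributes to import libraries and executable sections.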

Abstract

As malware detection evolves, attackers adopt sophisticated evasion tactics. Traditional file-level fingerprinting, such as cryptographic and fuzzy hashes, is often overlooked as a target for evasion. Malware variants exploit minor binary modifications to bypass detection, as seen in Microsoft's discovery of GoldMax variations (2020-2021). However, no large-scale empirical studies have assessed the limitations of traditional fingerprinting methods on real-world malware samples or explored improvements. This paper fills this gap by addressing three key questions: (a) How prevalent are file variants in malware samples? Analyzing 4 million Windows Portable Executable (PE) files, 21 million sections, and 48 million resources, we find up to 80% deep structural similarities, including common APIs and executable sections. (b) What evasion techniques are used? We identify resilient fingerprints (clusters of malware variants with high similarity) validated via VirusTotal. Our analysis reveals non-functional mutations, such as altered section numbers, virtual sizes, and section names, as primary evasion tactics. We also classify two key section types: malicious sections (high entropy >5) and camouflage sections (entropy = 0). (c) How can fingerprinting be improved? We propose two novel approaches that enhance detection, improving identification rates from 20% (traditional methods) to over 50% using our refined fingerprinting techniques. Our findings highlight the limitations of existing methods and propose new strategies to strengthen malware fingerprinting against evolving threats.
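The two section types named in the abstract can be illustrated with a short entropy calculation. This is a sketch under stated assumptions: entropy is taken to be byte-level Shannon entropy in bits per byte (range 0 to 8), the thresholds `> 5` and `== 0` come from the abstract, and the function names and the `"other"` label are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over the observed byte values.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def classify_section(data: bytes) -> str:
    """Label a section using the thresholds quoted in the abstract.

    "malicious"  -> entropy > 5 (e.g. packed or encrypted content)
    "camouflage" -> entropy == 0 (e.g. a run of one repeated byte)
    "other"      -> anything in between (label assumed for this sketch)
    """
    h = shannon_entropy(data)
    if h == 0.0:
        return "camouflage"
    if h > 5.0:
        return "malicious"
    return "other"


# Made-up section contents for illustration.
uniform_bytes = bytes(range(256)) * 4   # every byte value equally likely
zero_padding = b"\x00" * 512            # a single repeated byte
two_symbols = b"AB" * 20                # two equally likely byte values

for name, blob in [("uniform", uniform_bytes),
                   ("padding", zero_padding),
                   ("two_symbols", two_symbols)]:
    print(name, round(shannon_entropy(blob), 2), classify_section(blob))
```

Note that high entropy alone does not prove malice, since compressed resources in benign files can also score high. The labels here only mirror the abstract's terminology.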

Paper Structure

This paper contains 24 sections, 6 figures, 9 tables, and 2 algorithms.

Figures (6)

  • Figure 1: VirusTotal report granularity per file.
  • Figure 2: VirusTotal Feeds APIs and interaction with third-party security vendors overview.
  • Figure 3: Import List -- each pair of boxes shows the imported library and a list of functions from that library.
  • Figure 4: File Prevalence.
  • Figure 5: Section Numbers, Virtual Size and Address variant.
  • ...and 1 more figure