PromptSAM+: Malware Detection based on Prompt Segment Anything Model

Xingyuan Wei, Yichen Liu, Ce Li, Ning Li, Degang Sun, Yan Wang

TL;DR

PromptSAM+ introduces a universal framework that enriches image-based malware detection with semantic information from the Segment Anything Model (SAM). By converting Android DEX and Windows PE binaries into images and inserting Learnable Prompted Embeddings into SAM's encoder, PromptSAM+ achieves high detection accuracy and strong malware-family classification across Android and Windows while mitigating model aging under concept drift. Extensive experiments on 155k samples show substantial improvements over state-of-the-art methods and demonstrate drift resistance, with ablation analyses confirming the pivotal role of SAM-derived semantics. The approach provides an insertable module that can enhance existing visual malware detectors and offers practical benefits for large-scale, cross-platform threat analysis.
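The binary-to-image step mentioned above can be sketched as follows. This is a minimal illustration of a common byte-to-grayscale mapping, not the authors' implementation: the function name `bytes_to_grayscale`, the fixed row width, and the zero padding of the last row are all assumptions made for the example, and the summary does not state the layout PromptSAM+ actually uses.

```python
import math
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Map each byte to one pixel intensity (0-255), laid out in rows of `width`.

    Hypothetical helper: the row width and the zero padding are assumptions,
    not parameters taken from the paper.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Always produce at least one row, even for empty input.
    height = max(1, math.ceil(len(data) / width))
    # Pad the tail with zero bytes so the last row is full.
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# The first 8 bytes of a DEX file (magic "dex\n035\0") as a 2x4 image.
image = bytes_to_grayscale(b"dex\n035\x00", width=4)
print(image)  # [[100, 101, 120, 10], [48, 51, 53, 0]]
```

In practice the resulting rows would be converted to an image tensor and resized to the input resolution the vision encoder expects; that step is omitted here because the summary does not specify it.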

Abstract

Machine learning and deep learning (ML/DL) have been extensively applied in malware detection, and some existing methods demonstrate robust performance. However, several issues persist in the field: (1) Existing work often overemphasizes accuracy at the expense of practicality, rarely treating false positive and false negative rates as important metrics. (2) As malware evolves, classifier performance declines significantly over time, greatly reducing the practicality of malware detectors. (3) Prior ML/DL-based efforts rely heavily on ample labeled data for model training and depend largely on feature engineering or domain knowledge to build feature databases, making them vulnerable when correct labels are scarce. With the development of computer vision, vision-based malware detection has also evolved rapidly. In this paper, we propose PromptSAM+, a general enhancement framework for visual malware classification built on a large visual segmentation model, the Prompt Segment Anything Model. Our experimental results indicate that PromptSAM+ is effective and efficient in malware detection and classification, achieving high accuracy with low false positive and false negative rates. The proposed method outperforms the most advanced image-based malware detection techniques on several datasets. PromptSAM+ can mitigate aging in existing image-based malware classifiers and, through active learning, reduces the considerable manual effort needed to label new malware samples. We conducted experiments on datasets for both Windows and Android platforms, achieving favorable outcomes. Additionally, our ablation experiments on several datasets demonstrate that our model identifies effective modules within the large visual network.

Paper Structure

This paper contains 23 sections, 12 equations, 10 figures, 12 tables, and 1 algorithm.

Figures (10)

  • Figure 1: Overview of the Segment Anything Model.
  • Figure 2: Overview of the PromptSAM+ system.
  • Figure 3: Overview of DEX-to-image conversion. Left: Android DEX file structure, which has three main parts: (1) Header, (2) IDs, (3) Data. Right: binary image representation of the DEX file.
  • Figure 4: Multiple DEX files are decomposed, combined, and merged into one complete DEX file.
  • Figure 5: Overview of the PromptSAM+ structure.
  • ...and 5 more figures