Enhancing Android Malware Detection: The Influence of ChatGPT on Decision-centric Task

Yao Li, Sen Fang, Tao Zhang, Haipeng Cai

TL;DR

The paper analyzes how ChatGPT, as a non-decisional model, impacts Android malware detection compared with traditional decision-centric methods (Drebin, XMAL, MaMaDroid). It shows that while existing detectors excel on known data, they suffer from dataset bias and poor interpretability, whereas ChatGPT provides rich analyses and explanations but cannot make final decisions. Through feature extraction from APKs, prompt-based interactions, and human developer surveys, the study demonstrates a shift toward explanation-centric detection and highlights the potential of hybrid models that combine actionable decisions with in-depth analysis. The work advocates for integrating interpretability into malware pipelines and outlines plans to build a dedicated Android malware detection LLM, aiming to improve trust, usability, and robustness against unknown threats in practical deployments.

Abstract

With the rise of large language models such as ChatGPT, non-decisional models have been applied to a wide range of tasks, and ChatGPT has also drawn attention to the traditional decision-centric task of Android malware detection. Although scholars have proposed effective detection methods, these methods suffer from low interpretability. Specifically, while they excel at classifying applications as benign or malicious and can detect malicious behavior, they often fail to explain the decisions they make. This shortcoming raises concerns about the reliability of existing detection schemes and calls into question whether they truly understand complex data. In this study, we investigate the influence of the non-decisional model ChatGPT on the traditional decision-centric task of Android malware detection. We select three state-of-the-art solutions, Drebin, XMAL, and MaMaDroid, conduct a series of experiments on publicly available datasets, and carry out a comprehensive comparison and analysis. Our findings indicate that these decision-driven solutions rely primarily on statistical patterns within the datasets to make decisions, rather than genuinely understanding the underlying data. In contrast, ChatGPT, as a non-decisional model, excels at producing comprehensive analysis reports, substantially enhancing interpretability. Furthermore, we survey experienced developers. The results highlight developers' preference for ChatGPT, as it offers in-depth insights and improves both their efficiency and their understanding of the challenges involved. Together, these studies and analyses give developers a novel perspective on Android malware detection: enhancing the reliability of detection results from a non-decisional perspective.
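To make the contrast between "decision-centric" and "non-decisional" concrete: Drebin, for example, represents each app as a binary vector over string features (permissions, API calls, components, and so on) and classifies it with a linear SVM. The minimal Python sketch below illustrates that style of detector under stated assumptions; the feature names, weights, and helper functions (`score_app`, `classify`, `top_contributions`) are hypothetical and are not taken from the paper or from Drebin's implementation.

```python
from typing import Dict, Iterable, List, Tuple

# HYPOTHETICAL feature weights, as a trained linear model might assign them.
# Positive weights push toward "malicious", negative toward "benign".
EXAMPLE_WEIGHTS: Dict[str, float] = {
    "permission::SEND_SMS": 1.2,
    "api_call::getDeviceId": 0.8,
    "permission::INTERNET": 0.1,
    "activity::MainActivity": -0.5,
}


def score_app(features: Iterable[str], weights: Dict[str, float], bias: float = 0.0) -> float:
    """Linear decision score: bias plus the weights of the features present.

    Features are treated as a set (binary presence); unknown features
    contribute nothing.
    """
    return bias + sum(weights.get(f, 0.0) for f in set(features))


def classify(features: Iterable[str], weights: Dict[str, float], bias: float = 0.0) -> str:
    """Return only a label: 'malicious' if the score is positive, else 'benign'."""
    return "malicious" if score_app(features, weights, bias) > 0 else "benign"


def top_contributions(
    features: Iterable[str], weights: Dict[str, float], k: int = 3
) -> List[Tuple[str, float]]:
    """Rank present features by absolute weight.

    This ranking is the extent of a linear model's built-in 'explanation':
    it says which features mattered, not why they indicate malice.
    """
    present = [(f, weights.get(f, 0.0)) for f in set(features)]
    return sorted(present, key=lambda fw: abs(fw[1]), reverse=True)[:k]


if __name__ == "__main__":
    app = ["permission::SEND_SMS", "permission::INTERNET", "activity::MainActivity"]
    print(classify(app, EXAMPLE_WEIGHTS))
    print(top_contributions(app, EXAMPLE_WEIGHTS, k=2))
```

With these hypothetical weights the example app scores 1.2 + 0.1 - 0.5 = 0.8 and is labeled malicious, and its two largest contributions are `SEND_SMS` and `MainActivity`. The model yields a label and a weight ranking, but nothing in it states what behavior those features enable; that narrative layer is what the abstract credits ChatGPT with providing.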

Paper Structure (28 sections, 4 equations, 14 figures, 5 tables)

Figures (14)

  • Figure 1: SHA256 for "feb1b1019a31d0677c1d25bb80549a171c5f7da381f38014aae0d63b56126722", an unidentified Android application reported on MalwareBazaar.
  • Figure 2: Overview of our study.
  • Figure 3: The initial guidance and outcome presentation from ChatGPT.
  • Figure 4: Input features, with each feature separated by a space.
  • Figure 5: Analysis of system calls used by the application.
  • ...and 9 more figures
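Figure 4 describes the model input as features extracted from an APK, each separated by a space, and the TL;DR describes prompt-based interaction with ChatGPT. The minimal Python sketch below shows one way such an input line could be assembled and wrapped in an analysis request. It is an illustration under assumptions: the `group::value` token format, the prompt wording, and the helpers `features_to_line` and `build_prompt` are hypothetical, and the paper's actual prompt is not reproduced here.

```python
from typing import Dict, List


def features_to_line(features: Dict[str, List[str]]) -> str:
    """Flatten feature groups into one line, each feature separated by a single space.

    The 'group::value' token format is a HYPOTHETICAL choice for this sketch.
    """
    tokens: List[str] = []
    for group in sorted(features):  # deterministic group order
        for value in features[group]:
            # Replace spaces inside a value so they cannot be mistaken for separators.
            tokens.append(f"{group}::{value}".replace(" ", "_"))
    return " ".join(tokens)


def build_prompt(features: Dict[str, List[str]]) -> str:
    """Wrap the feature line in an instruction that asks for an explanation, not just a label.

    The wording is HYPOTHETICAL and not the prompt used in the paper.
    """
    line = features_to_line(features)
    return (
        "The following features were extracted from an Android APK, "
        "separated by spaces:\n"
        f"{line}\n"
        "Analyze what behavior these features suggest and explain your reasoning."
    )


if __name__ == "__main__":
    example = {
        "permission": ["android.permission.SEND_SMS", "android.permission.INTERNET"],
        "api_call": ["TelephonyManager.getDeviceId"],
    }
    print(build_prompt(example))
```

Sorting the groups keeps the line identical across runs for the same input, which makes repeated queries easier to compare; the per-group value order is left as extracted.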