Enhancing Android Malware Detection: The Influence of ChatGPT on Decision-centric Task

Yao Li; Sen Fang; Tao Zhang; Haipeng Cai

TL;DR

The paper analyzes how ChatGPT, as a non-decisional model, impacts Android malware detection compared with traditional decision-centric methods (Drebin, XMAL, MaMaDroid). It shows that while existing detectors excel on known data, they suffer from dataset bias and poor interpretability, whereas ChatGPT provides rich analyses and explanations but cannot make final decisions. Through feature extraction from APKs, prompt-based interactions, and human developer surveys, the study demonstrates a shift toward explanation-centric detection and highlights the potential of hybrid models that combine actionable decisions with in-depth analysis. The work advocates for integrating interpretability into malware pipelines and outlines plans to build a dedicated Android malware detection LLM, aiming to improve trust, usability, and robustness against unknown threats in practical deployments.

Abstract

With the rise of large language models such as ChatGPT, non-decisional models have been applied to various tasks. ChatGPT has also drawn attention to the traditional decision-centric task of Android malware detection. Although researchers have proposed effective detection methods, these methods suffer from low interpretability. Specifically, while they excel at classifying applications as benign or malicious and can detect malicious behavior, they often fail to provide detailed explanations for the decisions they make. This shortcoming raises concerns about the reliability of existing detection schemes and calls into question their true ability to understand complex data. In this study, we investigate the influence of the non-decisional model, ChatGPT, on the traditional decision-centric task of Android malware detection. We choose three state-of-the-art solutions, Drebin, XMAL, and MaMaDroid, conduct a series of experiments on publicly available datasets, and carry out a comprehensive comparison and analysis. Our findings indicate that these decision-driven solutions primarily rely on statistical patterns within datasets to make decisions, rather than genuinely understanding the underlying data. In contrast, ChatGPT, as a non-decisional model, excels at providing comprehensive analysis reports, substantially enhancing interpretability. Furthermore, we conduct surveys among experienced developers. The results highlight developers' preference for ChatGPT, as it offers in-depth insights and improves both their efficiency and their understanding of the challenges involved. Together, these studies and analyses offer developers a novel perspective on Android malware detection: enhancing the reliability of detection results from a non-decisional perspective.
