LAMD: Context-driven Android Malware Detection and Classification with LLMs
Xingzhi Qian, Xinran Zheng, Yiling He, Shuo Yang, Lorenzo Cavallaro
TL;DR
LAMD is a context-driven framework for Android malware detection that addresses two obstacles to using LLMs on Android applications: context-length limits and structural complexity. It combines key context obtained through static analysis with backward slicing to produce compact, semantically rich representations, which are fed into a three-tier LLM reasoning pipeline guarded by a factual consistency verifier based on Data Relationship Coverage. On real-world datasets exhibiting distribution drift, LAMD outperforms conventional detectors in detection accuracy and provides interpretable explanations, albeit at higher computational cost. The work demonstrates the feasibility of scalable, explainable LLM-powered malware analysis that pairs zero-shot reasoning with context-aware verification, and it outlines future directions for hybridizing LLMs with traditional detectors to handle evolving Android threats.
Abstract
The rapid growth of mobile applications has escalated Android malware threats. Although numerous detection methods exist, they often struggle with evolving attacks, dataset biases, and limited explainability. Large Language Models (LLMs) offer a promising alternative with their zero-shot inference and reasoning capabilities. However, applying LLMs to Android malware detection presents two key challenges: (1) the extensive support code in Android applications, often spanning thousands of classes, exceeds LLMs' context limits and obscures malicious behavior within benign functionality; (2) the structural complexity and interdependencies of Android applications surpass LLMs' sequence-based reasoning, fragmenting code analysis and hindering malicious intent inference. To address these challenges, we propose LAMD, a practical context-driven framework to enable LLM-based Android malware detection. LAMD integrates key context extraction to isolate security-critical code regions and construct program structures, then applies tier-wise code reasoning to analyze application behavior progressively, from low-level instructions to high-level semantics, providing a final prediction and explanation. A factual consistency verification mechanism is included to mitigate LLM hallucinations from the first tier. Evaluation in real-world settings demonstrates LAMD's effectiveness over conventional detectors, establishing a feasible basis for LLM-driven malware analysis in dynamic threat landscapes.
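To make the idea of isolating security-critical code regions more concrete, the following is a minimal sketch of backward slicing over a toy dependence graph. It is not LAMD's implementation: the function `backward_slice`, the `TOY_DEPS` graph, and the five-statement program in the comments are hypothetical and chosen only for illustration. The sketch assumes each statement has an integer id and that its direct data dependencies are already known.

```python
from collections import deque


def backward_slice(deps, criterion):
    """Return the sorted ids of all statements the criterion depends on.

    deps maps a statement id to the ids it directly depends on.
    The criterion itself is included in the result.
    """
    seen = {criterion}
    queue = deque([criterion])
    while queue:
        stmt = queue.popleft()
        # Walk dependence edges backwards, from use to definition.
        for pred in deps.get(stmt, ()):
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    return sorted(seen)


# Hypothetical toy program (illustrative only, not taken from the paper):
#   1: number = getLine1Number()
#   2: label = "hello"
#   3: msg = "id=" + number
#   4: log(label)
#   5: sendTextMessage(dest, msg)
TOY_DEPS = {3: [1], 4: [2], 5: [3]}

# Slice on the sensitive call at statement 5.
slice_for_sms = backward_slice(TOY_DEPS, 5)
```

In this toy case, slicing backward from the SMS call keeps only the statements that feed it and drops the unrelated logging code, which is the kind of reduction that lets a large application fit within an LLM's context window.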
