Malware Detection at the Edge with Lightweight LLMs: A Performance Evaluation
Christian Rondanini, Barbara Carminati, Elena Ferrari, Antonio Gaudiano, Ashish Kundu
TL;DR
This work addresses malware detection on resource-constrained edge devices by evaluating lightweight LLMs (e.g., DistilBERT, TinyBERT, DistilGPT2, Llama 3.2 1B, TinyT5) within an edge-focused architecture. It presents a multi-dataset, multi-model study covering encoder-only, decoder-only, and encoder-decoder small language models (SLMs), and compares zero-shot, few-shot, and full fine-tuning approaches. Zero-shot performance is moderate ($0.72$ on average), while fine-tuned models reach high accuracy ($0.94$–$0.99$), with TinyBERT and TinyT5 offering favorable on-device trade-offs. Cross-dataset validation exposes the challenge of domain shift and shows that full fine-tuning is valuable for generalization. In on-device experiments, the Jetson Nano generally outperforms the Raspberry Pi 3 thanks to GPU acceleration. These results confirm that edge-based malware detection with lightweight LLMs is practical, and they motivate centralized feedback to further improve detection across diverse environments.
Abstract
The rapid evolution of malware attacks calls for innovative detection methods, especially in resource-constrained edge computing environments. Traditional detection techniques struggle to keep up with the sophistication and adaptability of modern malware, prompting a shift towards advanced approaches such as those based on Large Language Models (LLMs). However, deploying LLMs for malware detection directly on edge devices raises several challenges, including maintaining accuracy in constrained environments and respecting the energy and computational limits of edge devices. To tackle these challenges, this paper proposes an architecture that leverages the strengths of lightweight LLMs while addressing their limitations, such as reduced accuracy and insufficient computational power. To evaluate the effectiveness of the proposed approach for edge computing, we perform an extensive experimental evaluation of several state-of-the-art lightweight LLMs. We test them on several publicly available datasets designed for edge and IoT scenarios, and on edge nodes with varying computational power and characteristics.
