Accuracy and Efficiency Trade-Offs in LLM-Based Malware Detection and Explanation: A Comparative Study of Parameter Tuning vs. Full Fine-Tuning

Stephen C. Gravereaux; Sheikh Rabiul Islam

TL;DR

The paper investigates whether Low-Rank Adaptation (LoRA) can approximate full-parameter fine-tuning for LLM-generated malware explanations grounded in SHAP features from EMBER. A standardized evaluation framework using BLEU, ROUGE, and semantic similarity compares five LoRA configurations against a fully fine-tuned baseline on 1,050 EMBER-derived samples. Full fine-tuning generally achieves the highest explanation quality, but mid-range LoRA (~15.5% trainable parameters) delivers competitive results with substantial reductions in model size and training time, enabling deployment in resource-constrained settings. The findings indicate when to use LoRA rather than full fine-tuning and point to scaling experiments with larger LLMs and datasets to further optimize interpretability and efficiency in malware detection systems.

Abstract

This study examines whether Large Language Models (LLMs) fine-tuned with Low-Rank Adaptation (LoRA) can approximate the performance of fully fine-tuned models in generating human-interpretable decisions and explanations for malware classification. Achieving trustworthy malware detection, particularly when LLMs are involved, remains a significant challenge. We developed an evaluation framework using Bilingual Evaluation Understudy (BLEU), Recall-Oriented Understudy for Gisting Evaluation (ROUGE), and semantic similarity metrics to benchmark explanation quality across five LoRA configurations and a fully fine-tuned baseline. Results indicate that full fine-tuning achieves the highest overall scores, with BLEU and ROUGE improvements of up to 10% over LoRA variants. However, mid-range LoRA models deliver competitive performance, exceeding full fine-tuning on two metrics; the LoRA model with 15.5% trainable parameters reduces model size by approximately 81% and training time by over 80%. These findings demonstrate that LoRA offers a practical balance of interpretability and resource efficiency, enabling deployment in resource-constrained environments without sacrificing explanation quality. By providing feature-driven natural language explanations for malware classifications, this approach enhances transparency, analyst confidence, and operational scalability in malware detection systems.
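To make the "trainable parameters" figure more concrete, the following is a minimal sketch of how a LoRA rank translates into a trainable fraction for a single weight matrix. It assumes the standard LoRA formulation, in which a frozen weight of shape d_out x d_in is augmented with two low-rank factors of shapes d_out x r and r x d_in. The function name, the layer sizes, and the choice to measure the fraction against frozen plus adapter parameters are illustrative assumptions; the abstract does not state which layers were adapted or how the 15.5% figure was computed.

```python
def lora_trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Fraction of parameters that are trainable when one d_out x d_in
    weight matrix is frozen and a rank-`rank` LoRA adapter is attached.

    Assumption: fraction = adapter params / (frozen params + adapter params).
    """
    if min(d_out, d_in, rank) <= 0:
        raise ValueError("dimensions and rank must be positive")
    frozen = d_out * d_in            # original weight, kept frozen
    adapter = rank * (d_out + d_in)  # factors B (d_out x r) and A (r x d_in)
    return adapter / (frozen + adapter)


if __name__ == "__main__":
    # Hypothetical layer size, chosen only for illustration.
    for r in (4, 16, 64, 128):
        frac = lora_trainable_fraction(1024, 1024, r)
        print(f"rank={r:4d}  trainable fraction={frac:.4f}")
```

For a fixed layer shape the fraction grows monotonically with the rank, which is why a sweep over LoRA configurations spans a range of trainable-parameter percentages.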
