LLM-FS: Zero-Shot Feature Selection for Effective and Interpretable Malware Detection

Naveen Gill, Ajvad Haneef K, Madhu Kumar S D

TL;DR

The paper addresses feature selection for malware detection in high-dimensional tabular data by leveraging zero-shot guidance from large language models (LLMs). It introduces LLM-FS, a prompt-based framework that constructs per-feature descriptors from global and class-conditional statistics and queries an LLM with $P_j = \mathcal{C} \,\parallel \,\mathcal{D}(f_j)$ to obtain $s_j \in [0,1]$, enabling top-$k$ feature selection for classifiers such as $\text{RF}$, $\text{ET}$, $\text{MLP}$, and $\text{KNN}$. The approach is comprehensively evaluated on the EMBOD malware dataset against traditional FS baselines, showing competitive accuracy, precision, recall, F1, AUC, and MCC, while also delivering improved interpretability and stability and reducing dependence on labeled data. The results suggest that LLM-FS can effectively bridge statistical feature selection and semantic reasoning in security-critical, high-dimensional settings, with practical implications for scalable and transparent malware detection and a path toward hybrid, efficiency-boosting enhancements.
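The pipeline summarized above (descriptor, prompt $P_j = \mathcal{C} \,\parallel\, \mathcal{D}(f_j)$, score $s_j \in [0,1]$, top-$k$ selection) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the task-context wording, the feature names, the toy data, and every function name are hypothetical. In particular, `stub_scorer` is a deterministic stand-in for the LLM call so that the example runs offline.

```python
from statistics import mean, pstdev

# Hypothetical task context C; the paper's actual prompt wording is not shown here.
TASK_CONTEXT = (
    "Task: binary malware detection on tabular features. "
    "Rate how useful the feature below is, from 0 (useless) to 1 (essential)."
)


def describe_feature(name, values, labels):
    """Build D(f_j): global and class-conditional summary statistics as text."""
    benign = [v for v, y in zip(values, labels) if y == 0]
    malicious = [v for v, y in zip(values, labels) if y == 1]
    return (
        f"Feature: {name}\n"
        f"global mean={mean(values):.3f}, std={pstdev(values):.3f}\n"
        f"benign mean={mean(benign):.3f}, malicious mean={mean(malicious):.3f}"
    )


def build_prompt(descriptor):
    """P_j = C || D(f_j): concatenate the task context and the descriptor."""
    return TASK_CONTEXT + "\n\n" + descriptor


def select_top_k(columns, labels, score_fn, k):
    """Score every feature with score_fn(prompt) and keep the k highest."""
    scores = {}
    for name, values in columns.items():
        prompt = build_prompt(describe_feature(name, values, labels))
        # Clamp to [0, 1] in case the scorer returns an out-of-range value.
        scores[name] = min(1.0, max(0.0, float(score_fn(prompt))))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


def stub_scorer(prompt):
    """Stand-in for the LLM (assumption): scores by the class-mean gap in the prompt."""
    last = prompt.splitlines()[-1]
    benign_mean = float(last.split("benign mean=")[1].split(",")[0])
    malicious_mean = float(last.split("malicious mean=")[1])
    gap = abs(benign_mean - malicious_mean)
    return gap / (1.0 + gap)


# Toy data with hypothetical feature names; label 1 = malicious, 0 = benign.
columns = {
    "section_entropy": [1.0, 1.2, 7.5, 7.9],
    "file_size_kb": [10.0, 12.0, 11.0, 9.0],
    "import_count": [5.0, 6.0, 40.0, 42.0],
}
labels = [0, 0, 1, 1]

selected, scores = select_top_k(columns, labels, stub_scorer, k=2)
print(selected)
```

In a real setting, `score_fn` would wrap a call to whichever LLM is being evaluated and parse a numeric score from its reply. Passing the scorer in as a parameter keeps the selection logic independent of any particular model or API.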

Abstract

Feature selection (FS) remains essential for building accurate and interpretable detection models, particularly in high-dimensional malware datasets. Conventional FS methods such as Extra Trees, Variance Threshold, Tree-based models, Chi-Squared tests, ANOVA, Random Selection, and Sequential Attention rely primarily on statistical heuristics or model-driven importance scores, often overlooking the semantic context of features. Motivated by recent progress in LLM-driven FS, we investigate whether large language models (LLMs) can guide feature selection in a zero-shot setting, using only feature names and task descriptions, as a viable alternative to traditional approaches. We evaluate multiple LLMs (GPT-5.0, GPT-4.0, Gemini-2.5, etc.) on the EMBOD dataset (a fusion of the EMBER and BODMAS benchmark datasets), comparing them against established FS methods across several classifiers, including Random Forest, Extra Trees, MLP, and KNN. Performance is assessed using accuracy, precision, recall, F1, AUC, MCC, and runtime. Our results demonstrate that LLM-guided zero-shot feature selection achieves performance competitive with traditional FS methods while offering additional advantages in interpretability, stability, and reduced dependence on labeled data. These findings position zero-shot LLM-based FS as a promising alternative strategy for effective and interpretable malware detection, paving the way for knowledge-guided feature selection in security-critical applications.
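As a reminder of how the threshold-based metrics listed in the abstract relate to one another, the sketch below computes accuracy, precision, recall, F1, and MCC from confusion counts for a binary malware/benign task. The labels and predictions are made-up toy values, not results from the paper, and the function names are illustrative. AUC is left out because it requires continuous scores rather than hard predictions.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 = malicious."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, F1, and MCC from hard predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    # MCC uses all four confusion counts, so it stays informative on imbalanced data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Toy labels and predictions (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```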

Paper Structure

This paper contains 20 sections, 12 equations, 3 figures, 2 tables, 1 algorithm.

Figures (3)

  • Figure 1: Architecture of proposed LLM-FS framework
  • Figure 2: Distribution of samples in the EMBOD dataset
  • Figure 3: Heatmap comparing LLM-based and traditional FS methods across multiple classifiers: (a) Random Forest, (b) Extra Trees, (c) KNN, and (d) MLP. Each cell represents the accuracy achieved by a specific combination of feature selection method and classifier, with color intensity indicating performance level.