Towards Robust Real-Time Hardware-based Mobile Malware Detection using Multiple Instance Learning Formulation

Harshit Kumar; Sudarshan Sharma; Biswadeep Chakraborty; Saibal Mukhopadhyay

TL;DR

This paper tackles real-time malware detection on mobile devices by casting segmented hardware telemetry time-series as a Multiple Instance Learning (MIL) problem, which addresses benign segments that are mislabeled as malicious because they sit inside malware-labeled streams. It introduces the Malicious Discriminative Score ($\mathfrak{D}_{W_i}$), which selectively weights each segment's prediction according to inter-channel interactions captured through template distributions and KL divergence. The approach improves precision by 5% while maintaining recall, and it produces interpretable saliency maps that localize malicious segments. Because it preserves window lengths and explicitly models localized malware behavior, the method offers practical benefits for real-time hardware-based malware detectors (HMDs) deployed in mobile security systems.

Abstract

This study introduces RT-HMD, a Hardware-based Malware Detector (HMD) for mobile devices that refines malware representation in segmented time-series through a Multiple Instance Learning (MIL) approach. We address the mislabeling issue in real-time HMDs, where benign segments in a malware time-series incorrectly inherit the malware label, leading to increased false positives. Using the proposed Malicious Discriminative Score within the MIL framework, RT-HMD identifies localized malware behaviors and thereby improves predictive accuracy. Empirical analysis on a hardware telemetry dataset collected from a mobile platform, covering 723 benign and 1033 malware samples, shows a 5% precision boost while maintaining recall, outperforming baselines affected by mislabeled benign segments.
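The summary above says only that the score is built from template distributions and KL divergence and that it weights per-window predictions; the exact formula is not given here. The following is a minimal sketch of that general idea under stated assumptions: the symmetric-KL form, the `1 - exp(-d)` squashing, the score-weighted average, and all function names are hypothetical choices for illustration, not the authors' definition of $\mathfrak{D}_{W_i}$.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same bins."""
    p = np.asarray(p, dtype=float) + eps  # eps avoids log(0) and division by zero
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))


def discriminative_score(benign_template, malware_template):
    """Hypothetical score in [0, 1): near 0 when the templates coincide.

    Assumption: symmetric KL between the benign and malware template
    distributions, squashed with 1 - exp(-d). The paper may define it differently.
    """
    d = kl_divergence(malware_template, benign_template)
    d += kl_divergence(benign_template, malware_template)
    return float(1.0 - np.exp(-d))


def weighted_decision(window_probs, window_scores):
    """Score-weighted average of per-window malware probabilities (assumed form)."""
    probs = np.asarray(window_probs, dtype=float)
    scores = np.asarray(window_scores, dtype=float)
    if scores.sum() <= 0.0:
        # No window is discriminative: fall back to the unweighted mean.
        return float(probs.mean())
    return float(np.average(probs, weights=scores))
```

In this sketch, windows whose behavior looks the same under both templates receive a score near zero, so a confident but uninformative window prediction contributes little to the final decision.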

Paper Structure (14 sections, 6 equations, 5 figures, 2 tables, 1 algorithm)

Introduction
Background
HMD and Related Works
Multiple Instance Learning Formulation and Its Implementation Challenges
Dataset Information
Threat Model
Proposed Framework
Overview
Training: Learning Template Distributions
Inference Methodology
Evaluation Results
Analyzing Segment Behavior
Impact on Classification Performance
Conclusion and Future Work

Figures (5)

Figure 1: Proposed Methodology: The decision for i-th window $P_{M,W_i}$ is enhanced by the Malicious Discriminative Score $\mathfrak{D}_{W_i}$. This score adjustment corrects mis-classifications of benign segments within a malware-labeled time-series, thereby improving the precision of malware detection.
Figure 2: Overview of the Malicious Discriminative Score (MDS): (a) [Top] Malware time-series with annotated benign and malicious segments [Bottom] Segmented benign time-series; (b) hyperplane under strong supervision assumption (current) and hyperplane under MIL assumption (proposed); (c) MDS functionality explained.
Figure 3: Training and inference steps of the Interaction-based Statistical Classifier.
Figure 4: (a) Boxplot showing MDS of the template distributions for two modalities of information: interactions (conditional distributions) and channels of the time-series treated independently (marginal distributions), for different distribution-channels. Marginals do not capture the high-MDS behaviors that separate benign from malware; (b) Boxplot showing the frequency of occurrence of segments (total segments: 101863, segment length $\approx$ 1 s) stratified by MDS in the training dataset (y-axis is logarithmic). High-MDS interactions are less frequent; (c) Increase in AUC when using the MDS-weighted decision.
Figure 5: An MDS-based saliency heatmap highlighting localized malicious behavior within the hardware telemetry (multivariate time-series). Each channel of the time-series corresponds to the channels outlined in Table \ref{tab:dvfs_sysfs_channels}.
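Figure 2(b) contrasts the strong supervision assumption, where every window inherits its time-series label, with the MIL assumption. A minimal sketch of that contrast follows, using the standard MIL assumption that a bag is positive if at least one of its instances is positive. The function names and the example window labels are hypothetical and serve only to illustrate the labeling difference.

```python
def strong_supervision_labels(series_label, n_windows):
    """Every window inherits the label of the whole time-series."""
    return [series_label] * n_windows


def mil_bag_label(window_labels):
    """Standard MIL assumption: the bag is positive if any instance is positive."""
    return int(any(window_labels))


def count_mislabeled(true_window_labels, assigned_labels):
    """Number of windows whose assigned label differs from the true one."""
    return sum(t != a for t, a in zip(true_window_labels, assigned_labels))


# Hypothetical malware time-series: only the third window is actually malicious.
true_windows = [0, 0, 1, 0]
inherited = strong_supervision_labels(1, len(true_windows))
print(count_mislabeled(true_windows, inherited))  # benign windows forced to "malware"
print(mil_bag_label(true_windows))                # bag-level label under MIL
```

Under label inheritance, three of the four windows in this example carry a wrong training label, whereas the MIL view keeps the time-series label correct without asserting anything about individual windows.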
