RL and Fingerprinting to Select Moving Target Defense Mechanisms for Zero-day Attacks in IoT

Alberto Huertas Celdrán, Pedro Miguel Sánchez Sánchez, Jan von der Assen, Timo Schenk, Gérôme Bovet, Gregorio Martínez Pérez, Burkhard Stiller

TL;DR

The paper addresses how to protect resource-constrained single-board computers (SBCs) from heterogeneous zero-day attacks by selecting appropriate Moving Target Defense (MTD) techniques through online reinforcement learning (RL). It introduces a framework that combines a 46-feature behavioral fingerprint of the device (derived from perf events) with Deep Q-Learning; an unsupervised Autoencoder-based detector provides the immediate rewards that guide the selection among four MTD techniques. The approach is validated on a Raspberry Pi 3 in a real IoT crowdsensing setup, demonstrating learning convergence and effective mitigation for most attacks while keeping resource usage bounded (<1 MB storage, <55% CPU, <80% RAM); one passive rootkit attack remains challenging. This work shows the practical feasibility of online RL for MTD in resource-constrained IoT, paving the way for broader defenses against zero-day threats on SBCs.

Abstract

Cybercriminals are moving towards zero-day attacks affecting resource-constrained devices such as single-board computers (SBCs). Assuming that perfect security is unrealistic, Moving Target Defense (MTD) is a promising approach to mitigate attacks by dynamically altering target attack surfaces. Still, selecting suitable MTD techniques for zero-day attacks is an open challenge. Reinforcement Learning (RL) could be an effective approach to optimize the MTD selection through trial and error, but the literature falls short in i) evaluating the performance of RL and MTD solutions in real-world scenarios, ii) studying whether behavioral fingerprinting is suitable for representing SBCs' states, and iii) calculating the consumption of resources in SBCs. To address these limitations, this work proposes an online RL-based framework that learns the correct MTD mechanisms to mitigate heterogeneous zero-day attacks in SBCs. The framework uses behavioral fingerprinting to represent SBCs' states and RL to learn the MTD techniques that mitigate each malicious state. It has been deployed in a real IoT crowdsensing scenario with a Raspberry Pi acting as a spectrum sensor. In more detail, the Raspberry Pi was infected with different samples of command and control malware, rootkits, and ransomware, and the framework then selected among four existing MTD techniques. A set of experiments demonstrated the suitability of the framework to learn proper MTD techniques mitigating all attacks (except a harmless rootkit) while consuming <1 MB of storage and utilizing <55% CPU and <80% RAM.
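
The selection loop described above can be illustrated with a small, heavily simplified sketch. It is not the authors' implementation: a linear Q-function stands in for the deep Q-network, the fingerprints are synthetic rather than collected from perf events, the Autoencoder-based detector is replaced by a stub, and the ±1 rewards, one-step episodes, and all names (`LinearQAgent`, `simulated_fingerprint`, `detector_reports_normal`, `train`) are assumptions made for illustration only.

```python
import random

N_FEATURES = 46  # fingerprint size reported in the TL;DR
N_ACTIONS = 4    # four candidate MTD techniques


class LinearQAgent:
    """Epsilon-greedy agent with one linear Q-function per action.

    Assumption: a linear model replaces the deep Q-network to keep the sketch small.
    """

    def __init__(self, n_features, n_actions, lr=0.05, epsilon=1.0,
                 epsilon_decay=0.99, epsilon_min=0.05, seed=0):
        self.rng = random.Random(seed)
        self.weights = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr
        self.epsilon = epsilon
        self.epsilon_decay = epsilon_decay
        self.epsilon_min = epsilon_min

    def q_values(self, state):
        return [sum(w * x for w, x in zip(row, state)) for row in self.weights]

    def select_action(self, state):
        # Explore with probability epsilon, otherwise pick the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.weights))
        q = self.q_values(state)
        return q.index(max(q))

    def update(self, state, action, reward):
        # Simplification: each decision is a one-step episode,
        # so the learning target is just the immediate reward.
        error = reward - self.q_values(state)[action]
        row = self.weights[action]
        for i, x in enumerate(state):
            row[i] += self.lr * error * x
        self.epsilon = max(self.epsilon_min, self.epsilon * self.epsilon_decay)


def simulated_fingerprint(attack, rng):
    """Hypothetical fingerprint: attack k raises one block of ten features."""
    state = [0.1 + rng.uniform(-0.02, 0.02) for _ in range(N_FEATURES)]
    for i in range(attack * 10, attack * 10 + 10):
        state[i] = 1.0 + rng.uniform(-0.05, 0.05)
    return state


def detector_reports_normal(attack, action):
    """Stub for the anomaly check performed after the MTD technique runs.

    Assumption for illustration only: MTD k is the one that mitigates attack k.
    """
    return action == attack


def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    agent = LinearQAgent(N_FEATURES, N_ACTIONS, seed=seed)
    for _ in range(episodes):
        attack = rng.randrange(N_ACTIONS)           # unknown to the agent
        state = simulated_fingerprint(attack, rng)  # observed device state
        action = agent.select_action(state)         # chosen MTD technique
        # Assumed reward scheme: +1 if the device looks normal afterwards, else -1.
        reward = 1.0 if detector_reports_normal(attack, action) else -1.0
        agent.update(state, action, reward)
    return agent
```

Calling `train()` returns an agent whose exploration rate has decayed from 1.0 to the configured floor; setting `agent.epsilon = 0.0` afterwards makes `select_action` purely greedy, which is one way to inspect what the agent has learned for each simulated attack.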

Paper Structure

This paper contains 11 sections, 5 figures, 3 tables, and 1 algorithm.

Figures (5)

  • Figure 1: RL-based Framework Overview
  • Figure 2: Behavioral Families and Events Selected to Represent SBC States
  • Figure 3: Online Agent Learning Process Life Cycle
  • Figure 4: Learning Over Episodes and Epsilon Decay
  • Figure 5: RAM and CPU Used by the Framework for Ransomware_PoC and Ransomware Trap MTD