CyberForce: A Federated Reinforcement Learning Framework for Malware Mitigation
Chao Feng, Alberto Huertas Celdran, Pedro Miguel Sanchez Sanchez, Jan Kreischer, Jan von der Assen, Gerome Bovet, Gregorio Martinez Perez, Burkhard Stiller
TL;DR
CyberForce addresses the privacy and scalability limitations of centralized Reinforcement Learning (RL) for malware mitigation in IoT. It combines Federated Learning with Deep Q-Learning so that devices can privately and collaboratively learn which Moving Target Defense (MTD) strategies counter zero-day attacks. Device behavioral fingerprinting and unsupervised anomaly detection provide timely rewards for MTD decisions, and several aggregation algorithms (FedAvg, Krum, Trimmed Mean) improve robustness. In experiments approximating a real-world deployment, with ten Raspberry Pi 4 sensors running ElectroSense, CyberForce performs nearly on par with centralized baselines in IID (independent and identically distributed) settings while reducing learning time by about two-thirds. It also benefits from knowledge transfer across heterogeneous attack conditions. The work further analyzes robustness to poisoning by internal participants and offers guidance on choosing an aggregation algorithm under different privacy and threat conditions, with implications for scalable, privacy-preserving cyber defense in IoT ecosystems.
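The three aggregation rules named above differ in how they treat outlying client updates, which is what makes some of them more robust to poisoning. The following is a minimal NumPy sketch of the textbook versions of FedAvg, coordinate-wise Trimmed Mean, and Krum. It is not CyberForce's implementation: it assumes each client's model parameters have already been flattened into one vector, and the parameter names `trim_k` (values dropped at each end) and `f` (assumed number of malicious clients) are illustrative choices.

```python
import numpy as np


def fedavg(updates, weights=None):
    """Weighted average of client parameter vectors (FedAvg)."""
    stacked = np.stack(updates)
    if weights is None:
        weights = np.ones(len(updates))  # equal weight per client
    weights = np.asarray(weights, dtype=float)
    return (weights[:, None] * stacked).sum(axis=0) / weights.sum()


def trimmed_mean(updates, trim_k):
    """Coordinate-wise mean after dropping the trim_k largest and smallest values."""
    stacked = np.sort(np.stack(updates), axis=0)  # sort each coordinate independently
    n = stacked.shape[0]
    if 2 * trim_k >= n:
        raise ValueError("trim_k too large for the number of clients")
    return stacked[trim_k:n - trim_k].mean(axis=0)


def krum(updates, f):
    """Return the update with the smallest sum of squared distances
    to its n - f - 2 nearest neighbours (Krum)."""
    stacked = np.stack(updates)
    n = stacked.shape[0]
    m = n - f - 2
    if m < 1:
        raise ValueError("Krum requires n > f + 2")
    # Pairwise squared Euclidean distances, shape (n, n).
    dists = ((stacked[:, None, :] - stacked[None, :, :]) ** 2).sum(axis=-1)
    scores = [np.sort(np.delete(dists[i], i))[:m].sum() for i in range(n)]
    return stacked[int(np.argmin(scores))]
```

With nine honest clients near `[1, 1]` and one poisoned client at `[100, -100]`, `fedavg` is dragged to roughly `[10.9, -9.1]`, while `trimmed_mean(updates, trim_k=1)` and `krum(updates, f=1)` both stay close to the honest updates. Krum selects a single client's update rather than averaging, which trades some statistical efficiency for robustness.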
Abstract
Recent research has shown that integrating Reinforcement Learning (RL) with Moving Target Defense (MTD) can enhance cybersecurity in Internet-of-Things (IoT) devices. Nevertheless, the practicality of existing work is hindered by data privacy concerns arising from centralized data processing in RL and by the long time needed to learn the right MTD techniques against a rising number of heterogeneous zero-day attacks. Thus, this work presents CyberForce, a framework that combines Federated and Reinforcement Learning (FRL) to collaboratively and privately learn suitable MTD techniques for mitigating zero-day attacks. CyberForce integrates device fingerprinting and anomaly detection to reward or penalize the MTD mechanisms chosen by an FRL-based agent. The framework has been deployed and evaluated in a scenario consisting of ten physical devices of a real IoT platform affected by heterogeneous malware samples. A set of experiments demonstrates that CyberForce learns the MTD technique that mitigates each attack faster than existing RL-based centralized approaches. In addition, when different devices are exposed to different attacks, CyberForce benefits from knowledge transfer, leading to better performance and shorter learning time than recent works. Finally, the different aggregation algorithms used during the agent learning process give CyberForce notable robustness against malicious attacks.
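The reward loop described above can be sketched as follows: the agent picks an MTD technique, the anomaly detector then judges the device's new behavioral fingerprint, and the agent is rewarded if behavior looks normal and penalized otherwise. This is a minimal local sketch under several stated assumptions. CyberForce uses Deep Q-Learning, whereas tabular Q-learning is swapped in here for brevity. The ±1 rewards, the action names `mtd_a`/`mtd_b`/`mtd_c`, the use of an attack label as the state (instead of a fingerprint), and the toy `simulate` environment are all hypothetical. The federated step (aggregating local models across devices) is omitted.

```python
import random
from collections import defaultdict


def reward_from_detector(is_anomalous: bool) -> float:
    # Hypothetical reward scheme: +1 when the anomaly detector judges the
    # post-MTD fingerprint as normal, -1 when it still flags anomalous behavior.
    return -1.0 if is_anomalous else 1.0


class TabularMTDAgent:
    """Epsilon-greedy tabular Q-learning over MTD actions.

    A simplified stand-in for a Deep Q-Network: Q-values live in a dict
    keyed by (state, action) instead of being produced by a neural network.
    """

    def __init__(self, actions, alpha=0.5, gamma=0.0, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor (0 treats each decision as one-step)
        self.epsilon = epsilon  # exploration probability
        self.q = defaultdict(float)
        self.rng = random.Random(seed)

    def best_action(self, state):
        # Greedy choice; ties resolve to the first action in self.actions.
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def choose(self, state):
        # Explore with probability epsilon, otherwise exploit current Q-values.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best_action(state)

    def update(self, state, action, reward, next_state):
        # Standard Q-learning target: r + gamma * max_a' Q(s', a').
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def simulate(agent, effective_mtd, episodes=200):
    # Toy environment: the detector stays quiet only if the chosen MTD
    # is the one that actually mitigates the active attack.
    for _ in range(episodes):
        for attack, good_mtd in effective_mtd.items():
            action = agent.choose(attack)
            reward = reward_from_detector(is_anomalous=(action != good_mtd))
            agent.update(attack, action, reward, next_state="post_mtd")
    return agent
```

Because ineffective techniques only ever receive negative rewards, their Q-values stay at or below zero, so after a short exploration phase the greedy policy settles on the technique that silences the detector for each attack.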
