Federated Cyber Defense: Privacy-Preserving Ransomware Detection Across Distributed Systems
Daniel M. Jimenez-Gutierrez, Enrique Zuazua, Joaquin Del Rio, Oleksii Sliusarenko, Xabi Uribe-Etxebarria
TL;DR
This study addresses the challenge of detecting ransomware in distributed, privacy-constrained environments by applying Federated Learning on the Sherpa.ai platform. By comparing centralized, federated, and local training across RanSAP-derived datasets on four servers, the authors demonstrate that federated learning yields a 9% relative improvement over the best-performing local models and approaches centralized performance while preserving data privacy. The work highlights the practicality of FL for cybersecurity vendors operating at scale, where data cannot leave customer environments due to regulatory constraints. The findings support a privacy-preserving, scalable threat-detection paradigm that can be deployed across millions of endpoints without compromising sensitive telemetry.
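Because the headline result is a relative rather than an absolute gain, a small worked example may help. The sketch below only illustrates the arithmetic: the two scores are hypothetical values chosen to produce a 9% relative gain and are not taken from the paper, and the function name is an assumption made for this example.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline` as a fraction of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical scores, chosen only to illustrate the arithmetic (not from the paper).
local_score = 0.80
federated_score = 0.872

gain = relative_improvement(local_score, federated_score)
print(f"relative gain: {gain:.1%}")             # relative gain: 9.0%
print(f"absolute gain: {federated_score - local_score:.3f}")
```

With these illustrative numbers, the same result is an absolute gain of about 7.2 percentage points, which is why the qualifier "relative" matters when the figures are compared.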
Abstract
Detecting malware, especially ransomware, is essential to securing today's interconnected ecosystems, including cloud storage, enterprise file-sharing, and database services. Training high-performing artificial intelligence (AI) detectors requires diverse datasets, which are often distributed across multiple organizations, so conventional training would require pooling them in one place. However, such centralized learning is often impractical because of security concerns, privacy regulations, data ownership issues, and legal barriers to cross-organizational sharing. Compounding this challenge, ransomware evolves rapidly, demanding models that are both robust and adaptable. In this paper, we evaluate Federated Learning (FL) using the Sherpa.ai FL platform, which enables multiple organizations to collaboratively train a ransomware detection model while keeping raw data local and secure. This paradigm is particularly relevant for cybersecurity companies (both software and hardware vendors) that deploy ransomware detection or firewall systems across millions of endpoints. In such environments, data cannot be transferred off the customer's device due to strict security, privacy, or regulatory constraints. Although FL applies broadly to malware threats, we validate the approach using the Ransomware Storage Access Patterns (RanSAP) dataset. Our experiments demonstrate that FL improves ransomware detection accuracy by a relative 9% over server-local models and achieves performance comparable to centralized training. These results indicate that FL offers a scalable, high-performing, and privacy-preserving framework for proactive ransomware detection across organizational and regulatory boundaries.
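The collaborative training loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Sherpa.ai API or the authors' pipeline: it assumes federated averaging (FedAvg) as the aggregation rule, which the text does not specify, uses a plain logistic-regression detector, and replaces RanSAP with synthetic data on four simulated servers. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few epochs of SGD for logistic regression on one server's own data."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i, xi in enumerate(x):
                w[i] -= lr * (p - y) * xi
    return w


def fed_avg(client_weights, client_sizes):
    """Average client weight vectors, weighted by each client's sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def make_server_data(rng, n, shift):
    """Synthetic samples (two features plus a bias term); label 1 stands in for ransomware."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        centre = 1.0 if y else -1.0
        data.append(([rng.gauss(centre + shift, 1.0), rng.gauss(centre, 1.0), 1.0], y))
    return data


def accuracy(w, data):
    hits = sum(
        (sigmoid(sum(wi * xi for wi, xi in zip(w, x))) >= 0.5) == (y == 1)
        for x, y in data
    )
    return hits / len(data)


rng = random.Random(0)
# Four simulated servers, each with a slightly different data distribution.
servers = [make_server_data(rng, 200, shift) for shift in (-0.5, 0.0, 0.5, 1.0)]
test_set = [s for shift in (-0.5, 0.0, 0.5, 1.0) for s in make_server_data(rng, 100, shift)]

global_w = [0.0, 0.0, 0.0]
for _ in range(10):  # communication rounds
    # Each server trains locally; only weight vectors are sent for aggregation.
    local_ws = [local_update(global_w, d) for d in servers]
    global_w = fed_avg(local_ws, [len(d) for d in servers])

print(f"federated test accuracy: {accuracy(global_w, test_set):.3f}")
```

In this sketch the aggregation step sees only model weights and sample counts, never the per-server samples, which is the property that lets raw data stay local. A production deployment would typically add protections such as secure aggregation or differential privacy, which are outside the scope of this example.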
