Global Outlier Detection in a Federated Learning Setting with Isolation Forest
Daniele Malpetti, Laura Azzimonti
TL;DR
The paper tackles privacy-preserving detection of global outliers in cross-silo federated learning (FL) using a two-server architecture that operates on masked data. Clients jointly generate a masking transformation $M = Q S Q'$ and additive noise $R^i$, then share masked representations of their data so that a central detector can run Isolation Forest (IF) or Extended Isolation Forest (EIF) on $X_{masked}$ without seeing the plain data or learning which client owns which point. The results are comparable to those of centralized IF on plain data. Key contributions include a secure protocol for seed agreement based on the Paillier cryptosystem, a structured scheme for masking and transferring data, and an analysis of the privacy implications, including collusion scenarios and possible privacy enhancements. The approach is practical as a preprocessing step in FL pipelines and suggests that similar masking strategies could be applied to other anomaly detection tasks while preserving data confidentiality.
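The masking step can be illustrated with a minimal sketch. Several details are assumptions made only for illustration and are not stated in this summary: that $Q$ is a random orthogonal matrix, that $S$ is a diagonal matrix of positive scale factors, that both are derived from the seed shared among clients, that the mask acts on the right of the data matrix, and that $R^i$ is small Gaussian noise drawn locally by client $i$. The function names, seed, and parameter values are hypothetical.

```python
import numpy as np

def build_mask(seed, d):
    """Derive M = Q S Q' from a seed shared by all clients (assumed construction)."""
    rng = np.random.default_rng(seed)
    # Q: orthogonal factor from the QR decomposition of a Gaussian matrix (assumption)
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    # S: diagonal matrix of positive scale factors (assumed form and range)
    S = np.diag(rng.uniform(0.5, 2.0, size=d))
    return Q @ S @ Q.T

def mask_client_data(X, M, noise_scale, rng):
    """Return X M + R^i, where R^i is noise drawn locally by the client (assumption)."""
    R = noise_scale * rng.standard_normal(X.shape)
    return X @ M + R

shared_seed = 1234  # hypothetical value; the paper agrees on the seed via Paillier
d = 3

# Two toy clients, each holding 5 points with d features
clients = [np.random.default_rng(i).standard_normal((5, d)) for i in range(2)]

masked = []
for i, X in enumerate(clients):
    M = build_mask(shared_seed, d)            # every client rebuilds the same M
    local_rng = np.random.default_rng(100 + i)  # client-specific noise source
    masked.append(mask_client_data(X, M, 0.01, local_rng))

# What the detecting server would receive in this sketch
X_masked = np.vstack(masked)
```

Because every client rebuilds $M$ from the same seed, all points land in one common masked space, which is what allows a single detector to compare points coming from different clients.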
Abstract
We present a novel strategy for detecting global outliers in a federated learning setting, targeting in particular cross-silo scenarios. Our approach uses two servers and has clients transmit masked local data to one of them. Masking prevents the disclosure of sensitive information while still permitting the identification of outliers. To further safeguard privacy, a permutation mechanism ensures that the server does not know which client owns any given masked data point. The server performs outlier detection on the masked data, using either Isolation Forest or its extended version, and then communicates outlier information back to the clients, allowing them to identify and remove outliers from their local datasets before any subsequent federated model training begins. The approach yields results comparable to those of a centralized execution of Isolation Forest algorithms on plain data.
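The server-side part of this pipeline can be sketched as follows, using scikit-learn's `IsolationForest` as a stand-in detector. This is an illustration under stated assumptions, not the authors' protocol: the arrays stand in for already-masked points, the shuffle is performed in a single place rather than through the two-server design, and the feedback step (clients matching flagged rows against their own masked rows) is a hypothetical mechanism, since the abstract does not specify how outlier information is returned.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Stand-ins for the masked points of two clients (toy data, not real masking)
client_a = rng.standard_normal((100, 2))
client_b = np.vstack([rng.standard_normal((99, 2)), [[12.0, 12.0]]])  # one far point

# Pool and shuffle so that row order does not reveal ownership
pooled = np.vstack([client_a, client_b])
shuffled = pooled[rng.permutation(len(pooled))]

# Detection on the pooled points; fit_predict returns -1 for outliers, 1 for inliers
detector = IsolationForest(n_estimators=200, contamination=0.01, random_state=0)
labels = detector.fit_predict(shuffled)
flagged = shuffled[labels == -1]

def local_outlier_mask(own_masked, flagged_rows):
    """Hypothetical feedback: mark own rows that appear among the flagged rows."""
    return np.array(
        [any(np.array_equal(row, f) for f in flagged_rows) for row in own_masked]
    )

# Each client drops its own flagged rows before federated training
keep_b = ~local_outlier_mask(client_b, flagged)
client_b_clean = client_b[keep_b]
```

In this sketch the detector only ever sees the shuffled pool, and each client decides locally which of its rows to drop.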
