Federated Automated Feature Engineering
Tom Overman, Diego Klabjan
TL;DR
This work presents three AutoFE algorithms tailored for federated learning: Fed-IIFE for horizontal FL, Vertical-FLAFE for vertical FL, and Hybrid-FLAFE for hybrid FL. Fed-IIFE extends the IIFE approach by using federated interaction information and federated model evaluations, achieving performance close to centralized AutoFE even under non-IID data. Vertical-FLAFE and Hybrid-FLAFE use homomorphic encryption and differential privacy to form feature combinations across clients without exposing raw data, trading some speed for privacy while still delivering meaningful improvements over baselines. Collectively, the methods fill a gap in AutoFE for FL, enabling automated feature engineering with privacy guarantees across the main FL settings and illustrating the practical viability and privacy-utility trade-offs involved.
Abstract
Automated feature engineering (AutoFE) automatically creates new features from original features to improve predictive performance without needing significant human intervention or domain expertise. Many algorithms exist for AutoFE, but very few approaches exist for the federated learning (FL) setting, where data is gathered across many clients and is not shared between clients or with a central server. We introduce AutoFE algorithms for the horizontal, vertical, and hybrid FL settings, which differ in how the data is gathered across clients. To the best of our knowledge, we are the first to develop AutoFE algorithms for the horizontal and hybrid FL cases, and we show that the downstream test scores of our federated AutoFE algorithms are close to those obtained when the data is held centrally and AutoFE is performed centrally.
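To make the distinction between the three settings concrete, the following is a minimal sketch of how one toy table might be split across clients in each case. It assumes the standard FL definitions (horizontal: same features, different samples; vertical: same samples, different features; hybrid: a mix of both). The dataset, the client names, and the helpers `take` and `shapes` are hypothetical and are not taken from the paper.

```python
# Toy dataset: 4 samples (rows) x 4 features (columns). Values are arbitrary.
data = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]


def take(rows, cols):
    """Return the sub-table a client holds, given row and column indices."""
    return [[data[r][c] for c in cols] for r in rows]


# Horizontal FL: every client sees all features, but disjoint samples.
horizontal = {
    "client_a": take([0, 1], [0, 1, 2, 3]),
    "client_b": take([2, 3], [0, 1, 2, 3]),
}

# Vertical FL: every client sees all samples, but disjoint features.
vertical = {
    "client_a": take([0, 1, 2, 3], [0, 1]),
    "client_b": take([0, 1, 2, 3], [2, 3]),
}

# Hybrid FL (one possible layout): clients differ in both samples and features.
hybrid = {
    "client_a": take([0, 1], [0, 1]),
    "client_b": take([0, 1], [2, 3]),
    "client_c": take([2, 3], [0, 1, 2, 3]),
}


def shapes(partition):
    """Map each client name to the (n_samples, n_features) of its local block."""
    return {name: (len(block), len(block[0])) for name, block in partition.items()}


if __name__ == "__main__":
    for label, partition in [("horizontal", horizontal),
                             ("vertical", vertical),
                             ("hybrid", hybrid)]:
        print(label, shapes(partition))
```

In the horizontal case a client can compute any engineered feature locally, since it holds every column. In the vertical and hybrid cases a feature that combines columns held by different clients cannot be formed by any single client, which is why those settings need a privacy-preserving way to combine features across clients.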
