Federated Automated Feature Engineering

Tom Overman, Diego Klabjan

TL;DR

This work presents three AutoFE algorithms tailored for federated learning: Fed-IIFE for horizontal FL, Vertical-FLAFE for vertical FL, and Hybrid-FLAFE for hybrid FL. Fed-IIFE extends the IIFE approach by using federated interaction information and federated model evaluations, achieving performance close to centralized AutoFE even under non-IID data. Vertical-FLAFE and Hybrid-FLAFE leverage homomorphic encryption and differential privacy to safely form feature combinations across clients, trading off some speed for privacy while still delivering meaningful improvements over baselines. Collectively, the methods fill a critical gap in AutoFE for FL, enabling automated feature engineering with privacy guarantees across the main FL settings and illustrating the practical viability and privacy-utility trade-offs involved.
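
To make "federated interaction information" concrete, here is a minimal sketch for discrete features under horizontal partitioning. It is not the paper's protocol: the function names, the sign convention I(X1,X2;Y) - I(X1;Y) - I(X2;Y), and the idea of summing per-client count tables are assumptions made for illustration, and the sketch ignores any privacy mechanism. It relies only on the fact that joint count tables add up across disjoint sets of rows, so the estimate from summed client tables matches the centralized one.

```python
import numpy as np

def mutual_info(joint):
    """Plug-in mutual information (bits) from a 2-D table of joint counts."""
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    mask = p > 0  # skip empty cells to avoid log(0)
    return float((p[mask] * np.log2(p[mask] / (px @ py)[mask])).sum())

def interaction_info(counts):
    """I(X1,X2;Y) - I(X1;Y) - I(X2;Y) from a count table indexed [x1, x2, y]."""
    a, b, c = counts.shape
    pair_vs_y = mutual_info(counts.reshape(a * b, c))
    return (pair_vs_y
            - mutual_info(counts.sum(axis=1))   # table indexed [x1, y]
            - mutual_info(counts.sum(axis=0)))  # table indexed [x2, y]

def local_counts(x1, x2, y, shape):
    """Count table a client can compute on its own rows (hypothetical helper)."""
    counts = np.zeros(shape)
    np.add.at(counts, (x1, x2, y), 1)
    return counts

# Toy data: y is the XOR of two binary features, so neither feature is
# informative alone but the pair determines y.
rng = np.random.default_rng(0)
x1 = rng.integers(0, 2, size=1000)
x2 = rng.integers(0, 2, size=1000)
y = x1 ^ x2

# Three clients, each holding a disjoint slice of the rows.
client_tables = [
    local_counts(a, b, c, (2, 2, 2))
    for a, b, c in zip(np.array_split(x1, 3),
                       np.array_split(x2, 3),
                       np.array_split(y, 3))
]

federated = interaction_info(sum(client_tables))
centralized = interaction_info(local_counts(x1, x2, y, (2, 2, 2)))
print(federated, centralized)
```

On this XOR data the pair carries about one bit of information about y that neither feature carries alone, which is the kind of pair an interaction-based AutoFE search would want to combine.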

Abstract

Automated feature engineering (AutoFE) is used to automatically create new features from original features to improve predictive performance without needing significant human intervention and domain expertise. Many algorithms exist for AutoFE, but very few approaches exist for the federated learning (FL) setting, where data is gathered across many clients and is not shared between clients or with a central server. We introduce AutoFE algorithms for the horizontal, vertical, and hybrid FL settings, which differ in how the data is gathered across clients. To the best of our knowledge, we are the first to develop AutoFE algorithms for the horizontal and hybrid FL cases, and we show that the downstream test scores of our federated AutoFE algorithms are close to those obtained when the data is held centrally and AutoFE is performed centrally.
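
The three settings can be pictured as different ways of slicing one data matrix. The sketch below uses the standard definitions of horizontal and vertical FL; the toy matrix, the client counts, and in particular the block layout used for the hybrid case are assumptions for illustration (Figure 2 of the paper shows the partition that Hybrid-FLAFE actually assumes).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))  # toy data: 6 samples, 4 features

# Horizontal FL: each client holds all features for a disjoint subset of samples.
horizontal = np.array_split(X, 3, axis=0)

# Vertical FL: each client holds all samples for a disjoint subset of features.
vertical = np.array_split(X, 2, axis=1)

# Hybrid FL (assumed layout): the data is split along both axes,
# so each client holds one block of samples and features.
hybrid = [block
          for rows in np.array_split(X, 2, axis=0)
          for block in np.array_split(rows, 2, axis=1)]

print([p.shape for p in horizontal])
print([p.shape for p in vertical])
print([p.shape for p in hybrid])
```

In the horizontal case a new feature built from two columns can be computed locally by every client, whereas in the vertical and hybrid cases the two columns may live on different clients, which is where the abstract's distinction between settings matters for AutoFE.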

Paper Structure

This paper contains 24 sections, 8 figures, 10 tables, 6 algorithms.

Figures (8)

  • Figure 1: Workflow of Vertical-FLAFE with 3 clients, each having two starting features. Typically there are more clients, and the allowed function transformations are more complex.
  • Figure 2: Example of how data is assumed to be partitioned for Hybrid-FLAFE.
  • Figure 3: Fed-IIFE results. The outlier dataset OpenML586, which had an extremely high improvement over the baseline, is omitted. All four settings have different baselines because baselines and centralized scores are also computed with FL training.
  • Figure 4: Synthetic data verification of federated interaction information
  • Figure 5: Cosine distance simulation for high-dimensional random vectors
  • ...and 3 more figures