Task-Agnostic Federation over Decentralized Data: Research Landscape and Visions

Wentai Wu, Ligang He, Saiqin Long, Ahmed M. Abdelmoniem, Yingliang Wu, Rui Mao, Keqin Li

TL;DR

This paper defines Task-Agnostic Federation (TAF), a data-centric alternative to Federated Learning that enables autonomous parties to exchange generic knowledge over private datasets $D_k$ without a shared learning goal. It formalizes a hub-and-spoke or P2P system where each participant $u_k$ seeks to maximize its data-related gain $g_k(D_k, \mathcal{U}, \pi_k)$ via a private protocol $\pi_k$, independent of others' objectives. A three-way roadmap—collaborative data expansion, collaborative data refinement, and collective data harmonization—maps a wide range of techniques (sample extrapolation, data augmentation, dataset- and sample-level filtering, re-balancing, data correction, and feature engineering) to the Task-Agnostic Federation paradigm, with representative methods and workflows. The authors discuss open challenges such as decentralized knowledge seeking, lightweight knowledge exchange, confidential retrieval of exogenous knowledge, incomplete-data collaboration, potential autonomy advantages, and knowledge erasure, arguing that TAF can enable sustainable, cost-efficient data-centric collaboration beyond learning while accommodating diverse participant motivations.
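The per-participant objective in the TL;DR can be written compactly. This is only a reading of the summary's notation, not the paper's Definition 1. It assumes that $\mathcal{U} = \{u_1, \dots, u_K\}$ is the set of participants and that $\Pi_k$ is a hypothetical set of protocols admissible to $u_k$; neither is defined above. Under those assumptions, each participant independently solves

$$
\pi_k^{*} \in \arg\max_{\pi_k \in \Pi_k} \; g_k\left(D_k, \mathcal{U}, \pi_k\right), \qquad k = 1, \dots, K,
$$

where $g_k$ and $\pi_k$ are private to $u_k$. No shared objective couples the $K$ problems, which is what distinguishes this setting from a common federated training goal.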

Abstract

Increasing legislation and regulation of private and proprietary information have resulted in scattered data sources, also known as "data islands". Federated Learning-based paradigms can enable privacy-preserving collaboration over decentralized data. However, they are learning-centric, which gives them inherent deficiencies in fairness, cost, and reproducibility and greatly limits how participants can cooperate with each other. In light of this, we investigate the possibility of shifting from resource-intensive learning to task-agnostic collaboration, especially when the participants have no interest in a common goal. We term this new scenario Task-Agnostic Federation (TAF) and investigate several branches of research that serve as its technical building blocks. These techniques directly or indirectly embrace data-centric approaches that can operate independently of any learning task. In this article, we first describe the system architecture and problem setting for TAF. Then, we present a three-way roadmap and categorize recent studies in three directions: collaborative data expansion, collaborative data refinement, and collective data harmonization in the federation. Further, we highlight several challenges and open questions that deserve more attention from the community. With our investigation, we intend to offer new insights into how autonomous parties with varied motivations can cooperate over decentralized data beyond learning.
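The following toy Python sketch makes the hub-and-spoke architecture and the re-balancing technique more concrete. It rests on stated assumptions and is not the authors' method or that of any surveyed work. The hypothetical `Participant` class keeps its labels private and shares only a label histogram, which serves as one possible courier of generic knowledge. The hypothetical `hub_aggregate` function merges histograms without ever seeing samples or a learning task. Each participant then applies its own assumed protocol, `rebalance_deficits`, to estimate how many extra samples per class it would need to match the federation-wide class proportions.

```python
from collections import Counter
from fractions import Fraction
from math import ceil
from typing import Dict, Hashable, List


class Participant:
    """Hypothetical TAF participant; its labeled samples never leave this object."""

    def __init__(self, name: str, labels: List[Hashable]) -> None:
        self.name = name
        self._labels = list(labels)  # private local data D_k (labels only, for brevity)

    def share_statistics(self) -> Counter:
        # Only a label histogram is shared: generic knowledge, not tied to any task.
        return Counter(self._labels)

    def rebalance_deficits(self, global_counts: Counter) -> Dict[Hashable, int]:
        # Assumed private protocol: find the smallest dataset that keeps all local
        # samples and follows the federation-wide class proportions (rounded up per
        # class), then report how many extra samples of each class that requires.
        local = Counter(self._labels)
        total = sum(global_counts.values())
        proportions = {c: Fraction(n, total) for c, n in global_counts.items() if n > 0}
        scale = max((local[c] / p for c, p in proportions.items()), default=Fraction(0))
        return {c: ceil(scale * p) - local[c] for c, p in proportions.items()}


def hub_aggregate(stats: List[Counter]) -> Counter:
    # Hypothetical hub: sums the shared histograms; it sees no samples and no task.
    merged: Counter = Counter()
    for s in stats:
        merged.update(s)
    return merged


alice = Participant("alice", ["cat"] * 8 + ["dog"] * 2)
bob = Participant("bob", ["cat"] * 2 + ["dog"] * 8)
global_counts = hub_aggregate([alice.share_statistics(), bob.share_statistics()])
print(alice.rebalance_deficits(global_counts))  # {'cat': 0, 'dog': 6}
print(bob.rebalance_deficits(global_counts))    # {'cat': 6, 'dog': 0}
```

The hub never receives a learning objective. Each participant could substitute a different private protocol, such as down-sampling instead of acquiring new samples, without coordinating with the others. Exact `Fraction` arithmetic is used so that the rounding-up step is not distorted by floating-point error.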

Paper Structure

This paper contains 27 sections, 17 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: The conceptual overview of Task-Agnostic Federation (TAF), which is a data-centric scenario that liberates the participants from any common goal and employs flexible protocols for generic knowledge exchange.
  • Figure 2: Data is the raw material and the most flexible form of knowledge. Information presented in statistics, patterns and models is derived from data and carries a specific subset of knowledge.
  • Figure 3: (Left) Taxonomy of research related to task-agnostic federation, and (right) the couriers of knowledge employed in the studies.
  • Figure 4: A schematic showing the general framework of federated sample extrapolation. The goal is to obtain the missing part of knowledge in the local domain.
  • Figure 5: (Left) Results of our experiments with 100 clients under four different data conditions involving IID, non-IID, and low-quality on-device data. (Right) The FedProf framework for dataset-level filtering. Experiment details, along with further observations, are described in wu2023fedprof.
  • ...and 1 more figure

Theorems & Definitions (3)

  • Remark
  • Definition 1: Task-Agnostic Federation (TAF)
  • Remark