Task-Agnostic Federation over Decentralized Data: Research Landscape and Visions
Wentai Wu, Ligang He, Saiqin Long, Ahmed M. Abdelmoniem, Yingliang Wu, Rui Mao, Keqin Li
TL;DR
This paper defines Task-Agnostic Federation (TAF), a data-centric alternative to Federated Learning that enables autonomous parties to exchange generic knowledge over private datasets $D_k$ without a shared learning goal. It formalizes a hub-and-spoke or P2P system where each participant $u_k$ seeks to maximize its data-related gain $g_k(D_k, \mathcal{U}, \pi_k)$ via a private protocol $\pi_k$, independent of others' objectives. A three-way roadmap—collaborative data expansion, collaborative data refinement, and collective data harmonization—maps a wide range of techniques (sample extrapolation, data augmentation, dataset- and sample-level filtering, re-balancing, data correction, and feature engineering) to the Task-Agnostic Federation paradigm, with representative methods and workflows. The authors discuss open challenges such as decentralized knowledge seeking, lightweight knowledge exchange, confidential retrieval of exogenous knowledge, incomplete-data collaboration, potential autonomy advantages, and knowledge erasure, arguing that TAF can enable sustainable, cost-efficient data-centric collaboration beyond learning while accommodating diverse participant motivations.
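Read as an optimization problem, the per-participant objective described above can be sketched as follows. This restates the summary using only its own symbols, with three labeled assumptions: $\mathcal{U}$ is taken to be the set of participants, $\Pi_k$ is a hypothetical set of protocols available to $u_k$, and $K$ is a hypothetical participant count.

$$
\pi_k^{*} \in \arg\max_{\pi_k \in \Pi_k} \; g_k(D_k, \mathcal{U}, \pi_k), \qquad k = 1, \dots, K
$$

Under this reading, each $u_k$ solves its own problem: no gain function $g_j$ with $j \neq k$ appears in the objective of $u_k$, which is one way to express the absence of a shared learning goal.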
Abstract
Increasing legislation and regulation on private and proprietary information result in scattered data sources, also known as "data islands". Although Federated Learning-based paradigms can enable privacy-preserving collaboration over decentralized data, they have inherent deficiencies in fairness, cost, and reproducibility because they are learning-centric, which greatly limits how participants can cooperate with each other. In light of this, we investigate the possibility of shifting from resource-intensive learning to task-agnostic collaboration, especially when the participants have no interest in a common goal. We term this new scenario Task-Agnostic Federation (TAF) and investigate several branches of research that serve as its technical building blocks. These techniques directly or indirectly embrace data-centric approaches that can operate independently of any learning task. In this article, we first describe the system architecture and problem setting for TAF. Then, we present a three-way roadmap and categorize recent studies in three directions: collaborative data expansion, collaborative data refinement, and collective data harmonization in the federation. Further, we highlight several challenges and open questions that deserve more attention from the community. With our investigation, we intend to offer new insights into how autonomous parties with varied motivations can cooperate over decentralized data beyond learning.
