Exploring the Possibility of TypiClust for Low-Budget Federated Active Learning
Yuta Ono, Hiroshi Nakamura, Hideki Takase
TL;DR
The paper investigates TypiClust as a low-budget Active Learning strategy within Federated Active Learning, addressing the challenge of obtaining labels under data-privacy constraints. TypiClust combines self-supervised representation learning, typicality-based sampling, and clustering to select informative samples, and is evaluated on CINIC-10 and ISIC2019 under severely limited annotation budgets. Results show that TypiClust generally outperforms baselines in low-budget FAL and is robust to data heterogeneity and to shifts in the typicality distribution, though a cold-start effect makes random sampling competitive at the very smallest budgets. Pre-trained feature encoders can substitute for self-supervised features when unlabeled data are scarce. The work highlights TypiClust's practical potential for real-world low-budget FAL and suggests future directions in aligning cross-client embeddings and leveraging public pre-trained models to broaden participation under limited data.
Abstract
Federated Active Learning (FAL) seeks to reduce the burden of annotation under the realistic constraints of federated learning by leveraging Active Learning (AL). Because FAL settings make ground-truth labels more expensive to obtain, FAL strategies are needed that work well in low-budget regimes, where the amount of annotation is very limited. In this work, we investigate the effectiveness of TypiClust, a successful low-budget AL strategy, in low-budget FAL settings. Our empirical results show that TypiClust performs well even in low-budget FAL settings, where other methods perform relatively poorly, although these settings pose additional challenges compared to AL, such as data heterogeneity. In addition, we show that FAL settings cause distribution shifts in terms of typicality, but that TypiClust is largely robust to these shifts. We also analyze the sensitivity of TypiClust to feature extraction methods; this analysis suggests a way to perform FAL even when data are limited.
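To make the selection step concrete, the following is a minimal single-client sketch of a TypiClust-style query, not the implementation evaluated in this work. It assumes embeddings have already been computed by some feature extractor, and it takes typicality to be the inverse of the mean Euclidean distance to the k nearest neighbours, as in the original TypiClust formulation. The function names (`typicality`, `kmeans`, `typiclust_select`), the default `k=5`, and the deterministic farthest-first k-means initialization are illustrative assumptions; federated aspects such as multiple clients and model aggregation are not modeled.

```python
import math
from collections import Counter


def typicality(points, k):
    """Inverse of the mean Euclidean distance to the k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        if not nearest:  # a single point has no neighbours
            scores.append(0.0)
            continue
        scores.append(1.0 / (sum(nearest) / len(nearest) + 1e-12))
    return scores


def kmeans(points, n_clusters, n_iters=20):
    """Lloyd's k-means with farthest-first initialization (an assumption
    made here for determinism); returns one cluster id per point."""
    n_clusters = min(n_clusters, len(points))
    centers = [points[0]]
    while len(centers) < n_clusters:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    assign = [0] * len(points)
    for _ in range(n_iters):
        assign = [
            min(range(n_clusters), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        for c in range(n_clusters):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster became empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def typiclust_select(points, labeled_idx, budget, k=5):
    """Pick up to `budget` unlabeled indices: one most-typical point per
    round, taken from the cluster that currently has the fewest labels."""
    labeled = set(labeled_idx)
    assign = kmeans(points, len(labeled) + budget)
    typ = typicality(points, k)
    picks = []
    for _ in range(budget):
        pool = {}  # cluster id -> indices of its unlabeled members
        for i, c in enumerate(assign):
            if i not in labeled:
                pool.setdefault(c, []).append(i)
        if not pool:
            break  # nothing left to label
        n_labeled = Counter(assign[i] for i in labeled)
        # Fewest labels first; ties go to the cluster with more unlabeled points.
        target = min(pool, key=lambda c: (n_labeled[c], -len(pool[c])))
        best = max(pool[target], key=lambda i: typ[i])
        picks.append(best)
        labeled.add(best)
    return picks


if __name__ == "__main__":
    # Three small, well-separated 2-D blobs standing in for embeddings.
    offsets = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
    blobs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    points = [(cx + dx, cy + dy) for cx, cy in blobs for dx, dy in offsets]
    print(typiclust_select(points, labeled_idx=[], budget=3, k=3))
```

In this toy run each blob contributes one query, and the point chosen in each blob is its central, densest one. This is the behaviour that distinguishes typicality-based selection from uncertainty-based selection, which tends to favour points near decision boundaries.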
