Distributed Zero-Shot Learning for Visual Recognition

Zhi Chen, Yadan Luo, Zi Huang, Jingjing Li, Sen Wang, Xin Yu

TL;DR

This work introduces DistZSL, a distributed zero-shot learning framework that learns from decentralized data whose clients hold partial class-conditional distributions (p.c.c.d.), without sharing raw data. It couples a cross-device attribute regularizer, built on Graphical Lasso semantic similarities and a KL-divergence alignment, with a global attribute-to-visual consensus that enforces a bilateral semantic-visual reconstruction to stabilize cross-device mappings. By replacing per-client classifiers with shared semantic anchors and jointly optimizing the semantic and visual pathways, DistZSL mitigates the local optima and class biases caused by data heterogeneity. Theoretical analysis provides alignment and reconstruction guarantees, and experiments on five ZSL datasets show consistent improvements over federated baselines across the i.i.d., non-i.i.d., and p.c.c.d. settings, with comprehensive ablations validating each component's contribution.
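To make the KL alignment concrete, here is a minimal NumPy sketch of one plausible form of the cross-device attribute regularizer. It assumes that each sample's predicted attribute vector is scored against the shared class attribute vectors, that both these scores and a pre-computed class-similarity matrix are turned into distributions with a temperature-scaled softmax, and that the loss is the KL divergence from the similarity row of the sample's class to the predicted distribution. The function names, the temperature `tau`, and the use of cosine similarity as a stand-in for the Graphical Lasso estimate are all assumptions for illustration, not the paper's implementation.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cosine_similarity_matrix(class_attrs: np.ndarray) -> np.ndarray:
    """Class-to-class cosine similarity of attribute vectors.

    Hypothetical stand-in for the Graphical Lasso semantic similarities.
    """
    normed = class_attrs / np.linalg.norm(class_attrs, axis=1, keepdims=True)
    return normed @ normed.T


def kl_attribute_regularizer(pred_attrs, labels, class_attrs, class_sim,
                             tau=0.1, eps=1e-12):
    """Mean KL(target || predicted) over a batch (hypothetical form of l_kl).

    pred_attrs:  (B, D) predicted attribute vectors
    labels:      (B,)   integer class indices
    class_attrs: (C, D) attribute vectors of all classes, shared by every client
    class_sim:   (C, C) pre-computed class-to-class similarity
    """
    # Predicted distribution: similarity of each prediction to every class.
    pred_dist = softmax(pred_attrs @ class_attrs.T / tau, axis=1)  # (B, C)
    # Target distribution: the pre-computed similarity row of the true class.
    target_dist = softmax(class_sim[labels] / tau, axis=1)        # (B, C)
    kl = np.sum(target_dist * (np.log(target_dist + eps)
                               - np.log(pred_dist + eps)), axis=1)
    return float(kl.mean())
```

In this sketch the target rows depend only on the shared class attribute vectors, so every client is pulled toward the same similarity structure regardless of which classes it holds locally.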

Abstract

In this paper, we propose a Distributed Zero-Shot Learning (DistZSL) framework that fully exploits decentralized data to learn an effective model for unseen classes. To address data heterogeneity across distributed nodes, we introduce two key components: a cross-node attribute regularizer and a global attribute-to-visual consensus. The cross-node attribute regularizer enforces the distances between attribute features to be similar across nodes. This keeps the overall attribute feature space stable during learning and thus facilitates the establishment of visual-to-attribute (V2A) relationships. The global attribute-to-visual consensus then mitigates the biased V2A mappings learned on individual nodes. Specifically, we enforce the bilateral mapping between the attribute and visual feature distributions to be consistent across nodes. The resulting consistent V2A mapping can substantially improve zero-shot learning across nodes. Extensive experiments demonstrate that DistZSL outperforms state-of-the-art methods in learning from distributed data.
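The bilateral mapping can be illustrated with a hypothetical linear version: a visual-to-attribute map `W_va` and an attribute-to-visual map `W_av`, with one reconstruction term in each direction. This NumPy sketch assumes linear maps and squared-error reconstruction; the paper's actual networks, loss weighting, and cross-node aggregation of these maps are not modeled.

```python
import numpy as np


def bilateral_reconstruction_loss(visual, attrs, W_va, W_av):
    """Hypothetical bilateral consistency loss with linear maps.

    visual: (B, Dv) visual features
    attrs:  (B, Da) ground-truth class attributes of each sample
    W_va:   (Dv, Da) visual-to-attribute map
    W_av:   (Da, Dv) attribute-to-visual map
    Returns (visual-side loss, attribute-side loss).
    """
    # Visual -> attribute -> visual round trip.
    v_rec = visual @ W_va @ W_av
    # Attribute -> visual -> attribute round trip.
    a_rec = attrs @ W_av @ W_va
    loss_v = float(np.mean(np.sum((v_rec - visual) ** 2, axis=1)))
    loss_a = float(np.mean(np.sum((a_rec - attrs) ** 2, axis=1)))
    return loss_v, loss_a
```

When `W_av` is the pseudo-inverse of a full-column-rank `W_va`, the attribute-side term vanishes, whereas the visual-side term generally does not, because `W_va` compresses the visual features into fewer dimensions.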

Paper Structure

This paper contains 45 sections, 10 theorems, 61 equations, 11 figures, 5 tables, 1 algorithm.

Key Result

Lemma 1

If, for client $k$, $\mathbb{E}_{({\bm{x}},y)}[\ell_{\mathrm{kl}}^{(k)}({\bm{x}},y)]\le \varepsilon_k$ for some $\varepsilon_k > 0$, where $\varepsilon_k$ denotes the expected cross-node alignment error of client $k$, i.e., $\varepsilon_k = \mathbb{E}_{({\bm{x}},y)}[\ell^{(k)}_{\mathrm{kl}}({\bm{x}},y)]$, then for almost all $({\bm{x}},y)$ … Consequently, for any two clients $j,k$, …

Figures (11)

  • Figure 1: An illustration of Distributed Zero-Shot Learning (DistZSL), which aims to infuse ZSL capability into distributed learning frameworks.
  • Figure 2: Attribute-based learning lets local models converge toward the global minimum shared across devices. In contrast, attribute-free learning simply averages the classifier weights of individual clients, which leads to local optima.
  • Figure 3: Data distributions of i.i.d., non-i.i.d. and p.c.c.d. settings. The darker color represents more training samples.
  • Figure 4: An overview of the proposed DistZSL, a decentralized framework for learning zero-shot models from multiple data sources without exchanging local training data. On a local device, given an image sample ${\bm{x}}_i$ from a class that is absent from all other devices, the image encoder $f(\cdot)$ produces the visual features ${\bm{v}}_i$, which are then fed into the attribute regressor $g(\cdot)$ to predict the attributes $\widehat{a}_i$. Local training performs attribute-based learning through (1) a visual-semantic alignment, using a semantic cross-entropy loss $\ell_{sce}$ to facilitate attribute prediction and an attribute decorrelation loss $\ell_{ad}$ to suppress inter-class attribute co-occurrence; (2) a cross-device attribute regularizer $\ell_{kl}$ to stabilize attribute learning and prevent local models from becoming biased toward locally available classes; and (3) a bilateral visual-semantic connection $\ell_{bc}$ to improve cross-device information consistency across the two modalities.
  • Figure 5: Averaged similarities between the predicted attributes and the ground-truth attributes on CUB test samples. (a)-(d) show the similarities after the first, fifth, tenth, and twentieth communication rounds; (e) shows the pre-computed similarity matrix described in the attribute-similarity section.
  • ...and 6 more figures
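Figures 2 and 4 contrast attribute-based learning with averaging per-client classifier weights. The sketch below illustrates the attribute-based side, assuming dot-product compatibility between predicted attributes and fixed class attribute vectors that act as the shared semantic anchors. `semantic_cross_entropy` is a hypothetical stand-in for $\ell_{sce}$ and `zero_shot_predict` for inference over unseen classes; both names and the scoring rule are assumptions.

```python
import numpy as np


def log_softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise, numerically stable log-softmax."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def semantic_cross_entropy(pred_attrs, labels, class_attrs):
    """Cross-entropy over compatibility scores with fixed attribute anchors.

    pred_attrs:  (B, D) predicted attributes
    labels:      (B,)   indices into class_attrs
    class_attrs: (C, D) class attribute vectors (not learned per client)
    """
    logits = pred_attrs @ class_attrs.T  # (B, C) compatibility scores
    logp = log_softmax(logits)
    return float(-logp[np.arange(len(labels)), labels].mean())


def zero_shot_predict(pred_attrs, unseen_attrs):
    """Assign each sample to the unseen class with the highest score."""
    return np.argmax(pred_attrs @ unseen_attrs.T, axis=1)
```

Because the anchors are fixed attribute vectors rather than learned per-client weights, this sketch has no class-specific classifier for the server to average; the only learned component is whatever model produces `pred_attrs`.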

Theorems & Definitions (15)

  • Lemma 1: Client-level alignment
  • Theorem 2: Server-level guarantee under FedAvg
  • Lemma 3: Information preservation via approximate left-inverse
  • Lemma 4: Attribute error bound from reconstruction
  • Theorem 5: Margin preservation for attribute-based classification
  • Lemma 6: Client-level alignment
  • proof
  • Theorem 7: Server-level guarantee under FedAvg
  • proof
  • Lemma 8: Information preservation via approximate left-inverse
  • ...and 5 more
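Theorems 2 and 7 are stated for FedAvg aggregation. For reference, the standard FedAvg server step averages client parameters weighted by local sample counts; the sketch below assumes parameters are stored as dicts of NumPy arrays with matching keys and shapes. Whether DistZSL weights clients by sample count is an assumption here.

```python
import numpy as np


def fedavg(client_params, client_sizes):
    """FedAvg server step: sample-count-weighted average of client parameters.

    client_params: list of dicts mapping parameter name -> np.ndarray
    client_sizes:  list of local training-set sizes, one per client
    """
    if len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = float(sum(client_sizes))
    keys = client_params[0].keys()
    # Each client contributes in proportion to its share of the training data.
    return {
        k: sum((n / total) * p[k] for p, n in zip(client_params, client_sizes))
        for k in keys
    }
```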