Compositional Distributed Learning for Multi-View Perception: A Maximal Coding Rate Reduction Perspective

Zhuojun Tian, Mehdi Bennis

TL;DR

The paper tackles distributed multi-view perception where each agent observes partial data and centralized data fusion is impractical. It proposes a compositional framework based on the Maximal Coding Rate Reduction ($MCR^2$) to learn discriminative, diverse subspaces locally and then fuse them into a global representation via periodic SVD-based basis fusion and a projection loss that enforces subspace alignment. The authors provide theoretical guarantees: the projection-induced change in the $MCR^2$ objective is bounded by the projection residual energy, and the fused subspace converges to the true global discriminative subspace under mild assumptions, with an explicit rate depending on local estimation errors. Empirically, the approach yields competitive accuracy on CIFAR-10 and ModelNet-10 while preserving cross-view diversity and intra-class structure, outperforming baselines that produce correlated or collapsed representations.
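To make the objective concrete, the following is a minimal sketch of the coding rate reduction in its commonly used form, $\Delta R = R(Z) - \sum_k \frac{m_k}{m} R(Z_k)$ with $R(Z) = \frac{1}{2}\log\det\left(I + \frac{d}{m\epsilon^2} Z Z^T\right)$. It assumes the paper uses this standard formulation. The function names, the distortion value `eps`, and the toy data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """Coding rate of a d x m feature matrix Z (columns are samples)."""
    d, m = Z.shape
    gram = np.eye(d) + (d / (m * eps**2)) * (Z @ Z.T)
    # slogdet is numerically safer than log(det(...)).
    _, logdet = np.linalg.slogdet(gram)
    return 0.5 * logdet

def rate_reduction(Z, labels, eps=0.5):
    """Whole-set coding rate minus the size-weighted per-class coding rates."""
    m = Z.shape[1]
    per_class = 0.0
    for k in np.unique(labels):
        Zk = Z[:, labels == k]
        per_class += (Zk.shape[1] / m) * coding_rate(Zk, eps)
    return coding_rate(Z, eps) - per_class

# Toy comparison (hypothetical data): two classes on orthogonal axes
# versus two classes collapsed onto the same axis.
labels = np.array([0] * 10 + [1] * 10)
e1, e2 = np.eye(4)[:, [0]], np.eye(4)[:, [1]]
Z_orth = np.hstack([np.tile(e1, 10), np.tile(e2, 10)])
Z_coll = np.hstack([np.tile(e1, 10), np.tile(e1, 10)])
print(rate_reduction(Z_orth, labels), rate_reduction(Z_coll, labels))
```

In this toy case the orthogonal arrangement gives a strictly larger rate reduction than the collapsed one, which is the behavior the objective rewards: classes occupy distinct subspaces while each class stays compact.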

Abstract

In this letter, we formulate a compositional distributed learning framework for multi-view perception that combines the maximal coding rate reduction principle with subspace basis fusion. In the proposed algorithm, each agent periodically performs a singular value decomposition on its learned subspaces and exchanges truncated basis matrices, from which the fused subspaces are obtained. By introducing a projection matrix and minimizing the distance between the outputs and their projections, the learned representations are driven towards the fused subspaces. We prove that the projection-induced change in the coding rate admits a trace bound and that the basis fusion is consistent. Numerical simulations show that the proposed algorithm achieves high classification accuracy while maintaining the diversity of the representations, whereas the baselines exhibit correlated subspaces and coupled representations.
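The fusion step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: the abstract does not specify the fusion rule, so the sketch fuses by taking an SVD of the horizontally stacked local bases, and it uses a mean squared residual as the projection loss. The names `local_basis`, `fuse_bases`, `projection_loss`, the rank `r`, and the synthetic data are all hypothetical.

```python
import numpy as np

def local_basis(Z, r):
    """Truncated basis: top-r left singular vectors of a d x m feature matrix."""
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :r]

def fuse_bases(bases, r):
    """Assumed fusion rule: SVD of the stacked local bases, truncated to rank r."""
    U, _, _ = np.linalg.svd(np.hstack(bases), full_matrices=False)
    return U[:, :r]

def projection_loss(Z, U_fuse):
    """Mean squared distance between features and their projection onto span(U_fuse)."""
    P = U_fuse @ U_fuse.T          # orthogonal projector, valid for orthonormal columns
    residual = Z - P @ Z
    return np.sum(residual**2) / Z.shape[1]

# Three agents observe noisy features near a shared 2-D subspace of R^8.
rng = np.random.default_rng(0)
B, _ = np.linalg.qr(rng.standard_normal((8, 2)))
views = [B @ rng.standard_normal((2, 50)) + 0.01 * rng.standard_normal((8, 50))
         for _ in range(3)]

U_fuse = fuse_bases([local_basis(Z, 2) for Z in views], r=2)
print([projection_loss(Z, U_fuse) for Z in views])
```

Only the `d x r` basis matrices are exchanged in this sketch, never the raw features, which matches the setting in which centralized data fusion is impractical.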

Paper Structure

This paper contains 13 sections, 4 theorems, 23 equations, 3 figures, 1 table, 1 algorithm.

Key Result

Lemma 1

$\bm{P}_{k}\in\mathbb{R}^{d\times d}$ is the orthogonal projection operator onto the subspace spanned by the basis $\bm{U}_{fuse,k}$; it satisfies $\bm{P}_{k}^2=\bm{P}_{k}$ and $\bm{P}_{k}^T=\bm{P}_{k}$.
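The two properties in Lemma 1 can be checked numerically. The sketch below assumes the projector is built as $\bm{P}_{k}=\bm{Q}\bm{Q}^T$, where $\bm{Q}$ is an orthonormal basis for the column space of $\bm{U}_{fuse,k}$; that construction is a standard one and is an assumption here, since the summary does not state how $\bm{P}_{k}$ is formed. The helper name `orthogonal_projector` and the random basis are hypothetical.

```python
import numpy as np

def orthogonal_projector(U):
    """Orthogonal projector onto the column space of U (full column rank assumed)."""
    Q, _ = np.linalg.qr(U)   # orthonormalize in case U is not already orthonormal
    return Q @ Q.T

rng = np.random.default_rng(0)
U_fuse_k = rng.standard_normal((6, 2))   # stand-in basis for one class subspace
P_k = orthogonal_projector(U_fuse_k)

print(np.allclose(P_k @ P_k, P_k))            # idempotent: P^2 = P
print(np.allclose(P_k.T, P_k))                # symmetric: P^T = P
print(np.allclose(P_k @ U_fuse_k, U_fuse_k))  # vectors in the subspace are unchanged
```

Idempotence and symmetry together distinguish an orthogonal projector from an oblique one, which is idempotent but not symmetric.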

Figures (3)

  • Figure 1: Illustration of the proposed multi-view perception framework.
  • Figure 2: Cosine similarity of the learned representations for CIFAR-10.
  • Figure 3: Cosine similarity of the learned representations for ModelNet-10.

Theorems & Definitions (7)

  • Lemma 1
  • Theorem 1: Linear trace bound on coding-rate change
  • Theorem 2: Consistency of SVD Fusion
  • Lemma 2
  • proof
  • proof
  • proof