DDCL: Deep Dual Competitive Learning: A Differentiable End-to-End Framework for Unsupervised Prototype-Based Representation Learning

Giansalvo Cirrincione

Abstract

A persistent structural weakness in deep clustering is the disconnect between feature learning and cluster assignment. Most architectures invoke an external clustering step, typically k-means, to produce pseudo-labels that guide training, preventing the backbone from directly optimising for cluster quality. This paper introduces Deep Dual Competitive Learning (DDCL), the first fully differentiable end-to-end framework for unsupervised prototype-based representation learning. The core contribution is architectural: the external k-means is replaced by an internal Dual Competitive Layer (DCL) that generates prototypes as native differentiable outputs of the network. This single inversion makes the complete pipeline, from backbone feature extraction through prototype generation to soft cluster assignment, trainable by backpropagation through a single unified loss, with no Lloyd iterations, no pseudo-label discretisation, and no external clustering step. To ground the framework theoretically, the paper derives an exact algebraic decomposition of the soft quantisation loss into a simplex-constrained reconstruction error and a non-negative weighted prototype variance term. This identity reveals a self-regulating mechanism built into the loss geometry: the gradient of the variance term acts as an implicit separation force that resists prototype collapse without any auxiliary objective, and leads to a global Lyapunov stability theorem for the reduced frozen-encoder system. Six blocks of controlled experiments validate each structural prediction. The decomposition identity holds with zero violations across more than one hundred thousand training epochs; the negative feedback cycle is confirmed with a Pearson correlation of -0.98; and with a jointly trained backbone, DDCL outperforms its non-differentiable ablation by 65% in clustering accuracy and end-to-end DeepCluster by 122%.
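
The pipeline the abstract describes (backbone features, an internal differentiable prototype layer replacing k-means, soft simplex assignments, one unified loss) can be sketched in a few lines. The sketch below is a hedged illustration under assumed names and dimensions (`Backbone` layer sizes, `DCL`, temperature `T`, `ddcl_loss` are expository choices), not the authors' implementation.

```python
# Minimal sketch (PyTorch) of the end-to-end idea: a backbone producing features z,
# an internal prototype layer (no external k-means), soft assignments q on the simplex,
# and a single quantisation loss trained by backpropagation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DCL(nn.Module):
    """Dual-competitive-layer sketch: prototypes are learnable parameters,
    assignments are differentiable softmax responsibilities (assumed form)."""
    def __init__(self, d, k, T=1.0):
        super().__init__()
        self.P = nn.Parameter(torch.randn(d, k))  # d x k prototype matrix
        self.T = T                                # softmax temperature

    def forward(self, z):                         # z: (batch, d)
        dist2 = torch.cdist(z, self.P.t()) ** 2   # squared distances to prototypes
        q = F.softmax(-dist2 / self.T, dim=1)     # soft assignments on the simplex
        return q, dist2

def ddcl_loss(q, dist2):
    # Soft quantisation loss: responsibility-weighted squared distances.
    return (q * dist2).sum(dim=1).mean()

# One joint update: gradients flow through assignments, prototypes and backbone.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 16))
dcl = DCL(d=16, k=10)
opt = torch.optim.Adam(list(backbone.parameters()) + list(dcl.parameters()), lr=1e-3)

x = torch.randn(128, 1, 28, 28)                   # dummy batch
q, dist2 = dcl(backbone(x))
loss = ddcl_loss(q, dist2)
opt.zero_grad(); loss.backward(); opt.step()      # single unified loss, no Lloyd steps
```

Because the assignments come from a softmax rather than a hard argmin, the loss is differentiable with respect to the prototypes, the assignments, and the backbone parameters, which is exactly the property the single-loss end-to-end training relies on.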

Paper Structure

This paper contains 70 sections, 17 theorems, 44 equations, 7 figures, 10 tables, and 2 algorithms.

Key Result

Theorem 1

For any $z_n \in \mathbb{R}^d$, $P \in \mathbb{R}^{d \times k}$ with columns $p_1, \dots, p_k$, and $q_n \in \Delta^{k-1}$:

$$\sum_{j=1}^{k} q_{nj}\,\|z_n - p_j\|^2 \;=\; \|z_n - P q_n\|^2 \;+\; \sum_{j=1}^{k} q_{nj}\,\|p_j - P q_n\|^2,$$

where the second term is the weighted variance of the prototypes under $q_n$. Since this term is non-negative, $\|z_n - P q_n\|^2 \le \sum_{j} q_{nj}\,\|z_n - p_j\|^2$. Equality holds iff $q_n$ is a vertex of $\Delta^{k-1}$ or all active prototypes coincide. $\blacktriangleleft$
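The identity can be checked numerically. The snippet below is a small sanity check under the decomposition form stated above (soft quantisation loss equals the reconstruction error to the barycentre $P q_n$ plus the $q_n$-weighted prototype variance); the data, dimensions, and seed are illustrative only.

```python
# Numerical check of the assumed decomposition: lhs = recon + var, var >= 0.
import numpy as np

rng = np.random.default_rng(0)
d, k = 16, 10
z = rng.normal(size=d)                      # feature vector z_n
P = rng.normal(size=(d, k))                 # prototype matrix, columns p_j
q = rng.dirichlet(np.ones(k))               # point on the simplex Delta^{k-1}

lhs = sum(q[j] * np.sum((z - P[:, j]) ** 2) for j in range(k))      # soft quantisation loss
recon = np.sum((z - P @ q) ** 2)                                    # simplex-constrained reconstruction error
var = sum(q[j] * np.sum((P[:, j] - P @ q) ** 2) for j in range(k))  # weighted prototype variance >= 0

assert np.isclose(lhs, recon + var)         # exact identity
assert recon <= lhs + 1e-12                 # corollary-style upper bound

# Vertex case: a one-hot q makes the variance term vanish, so equality holds.
q_vertex = np.eye(k)[3]
lhs_v = np.sum((z - P[:, 3]) ** 2)
assert np.isclose(lhs_v, np.sum((z - P @ q_vertex) ** 2))
```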

Figures (7)

  • Figure 1: Batch DDCL pipeline with CNN backbone (see Section \ref{sec:ddcl}). Blue: forward pass; red: backpropagation.
  • Figure 2: Block 1 --- Prototype collapse on Moons ($n=300$, $k=2$). Left: prototype separation $\mathcal{S}(P)$ at convergence vs. temperature $T$: $\mathcal{L}_q$ (solid) remains stable; $L_{\mathrm{OLS}}$ (dashed) diverges at high $T$. Right: collapse rate (% of runs): zero for $\mathcal{L}_q$, rising for $L_{\mathrm{OLS}}$.
  • Figure 3: Block 2 --- DDCL dynamics on MNIST Digits (run 1). Top: loss decomposition, feedback scatter ($r=-0.98$), ACC curves, phase portrait $(\mathcal{S}(t),\mathcal{K}(t))$. Bottom: 10 learned prototypes with Hungarian-matched digit labels.
  • Figure 4: Block 3 --- ACC (left) and NMI (right) vs. $d$ on log axis ($n=100$, $k=2$, 5 runs). Dotted line: $d=n$.
  • Figure 5: Block 4 --- DDCL dynamics on CIFAR-10 (ResNet-18 frozen, run 1). Top: loss decomposition and feedback scatter $(\mathcal{S},\mathcal{K})$. Bottom: accuracy curves and phase portrait.
  • ...and 2 more figures

Theorems & Definitions (21)

  • Remark 1: Role of the quadratic term $(\lambda/2)\|P\|_F^2$
  • Remark 2: Beyond OLS: noise assumptions and regression variants
  • Theorem 1: Loss decomposition
  • Corollary 1: Upper bound
  • Proposition 1: Entropic regularization
  • Corollary 2: Monotonicity of $V$
  • Proposition 2: Gradients w.r.t. $q_n$
  • Proposition 3: Gradients w.r.t. $P$
  • Proposition 4: Gradients w.r.t. $z_n$
  • Corollary 3: Stop-gradient as design choice
  • ...and 11 more