A Metric Topology of Deep Learning for Data Classification

Jwo-Yuh Wu; Liang-Chi Huang; Wen-Hsuan Li; Chun-Hung Liu

TL;DR

This work addresses the theoretical foundations of deep learning for data classification by introducing a metric topology over the network-parameter space. It defines a probabilistic distance $d_{\mu}$ that quantifies how often two networks disagree on classifications, partitions networks into equivalence classes $[w]$ of equal performance, and shows that $d$ induces a true metric on the quotient space $\mathcal{W}/\cong$. Under mild data-distribution assumptions, almost all networks avoid non-unique labels, the quotient space is compact (upon $\epsilon$-pruning), and the proposed metric topology aligns with the standard quotient topology, enabling rigorous metric-space analyses of DL. The framework opens avenues for topology-informed DL design, including contraction-based methods and connectivity studies, with potential extensions to regression and bandit settings.
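The idea of a disagreement-based distance can be illustrated with a small Monte Carlo sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy two-output affine classifier `predict`, the standard Gaussian input distribution standing in for $\mu$, the tie-breaking rule, and the reading of $d_{\mu}(w,w')$ as the probability that two networks assign different labels to a random input.

```python
import random


def predict(w, x):
    """Label (0 or 1) from a hypothetical two-output affine classifier on R^2.

    w = (a1, a2, b1, c1, c2, b2) gives the scores
    s1 = a1*x1 + a2*x2 + b1 and s2 = c1*x1 + c2*x2 + b2.
    """
    a1, a2, b1, c1, c2, b2 = w
    s1 = a1 * x[0] + a2 * x[1] + b1
    s2 = c1 * x[0] + c2 * x[1] + b2
    # Simplification: ties go to label 0 instead of a random guess.
    return 0 if s1 >= s2 else 1


def disagreement(w, v, n_samples=20000, seed=0):
    """Monte Carlo estimate of P[predict(w, x) != predict(v, x)].

    Inputs x are drawn from a standard Gaussian on R^2 (an assumed
    stand-in for the data distribution mu).
    """
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(n_samples):
        x = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        if predict(w, x) != predict(v, x):
            mismatches += 1
    return mismatches / n_samples


# Illustrative parameter vectors (hypothetical values).
w = (1.0, -0.5, 0.2, -0.3, 0.8, -0.1)
w_scaled = tuple(2.0 * p for p in w)                # far from w in Euclidean norm
w_swapped = (w[3], w[4], w[5], w[0], w[1], w[2])    # the two scores exchanged

d_same = disagreement(w, w_scaled)
d_swap = disagreement(w, w_swapped)
print(d_same, d_swap)
```

In this sketch `w_scaled` is far from `w` in the Euclidean sense, yet positive scaling leaves the larger score unchanged, so the two vectors never disagree and would fall into the same equivalence class. Exchanging the two scores flips essentially every label, so `w_swapped` sits at nearly the maximum distance from `w`.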

Abstract

Empirically, Deep Learning (DL) has demonstrated unprecedented success in practical applications. However, DL remains by and large a mysterious "black box", spurring recent theoretical research to build its mathematical foundations. In this paper, we investigate DL for data classification through the prism of metric topology. Because the conventional Euclidean metric over the network parameter space typically fails to discriminate DL networks according to their classification outcomes, we propose, from a probabilistic point of view, a meaningful distance measure under which DL networks yielding similar classification performances are close. The proposed distance measure defines an equivalence relation among network parameter vectors such that networks performing equally well belong to the same equivalence class. Interestingly, the proposed distance measure provably serves as a metric on the quotient set modulo this equivalence relation. Then, under quite mild conditions, it is shown that, apart from a vanishingly small subset of networks likely to predict non-unique labels, the proposed metric space is compact and coincides with the well-known quotient topological space. Our study contributes to a fundamental understanding of DL and opens up new ways of studying DL using fruitful metric space theory.

Paper Structure (19 sections, 5 theorems, 48 equations, 8 figures, 2 tables)

Introduction
Background and paper contributions
Related works
Notation list
Network model
Proposed distance measure
Formulation
Properties of $d_{\mu}(\cdot,\cdot)$
Metric topology
Metric topology on quotient set of $\mathcal{W}$
Example
Theoretical results
Projection map
Properties of metric topology $(\mathcal{W} \slash \cong,d)$
Summary and discussions
...and 4 more sections

Key Result

Theorem 4.1

The distance measure $d(\cdot,\cdot)$ in Eq. (4.3) is a metric on the quotient set $\mathcal{W} \slash \cong$.
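A sketch of why a statement of this kind can hold, under assumptions that are not confirmed by the text above: suppose $d_{\mu}$ is the probability that two networks label a random input differently, suppose $w \cong w'$ means $d_{\mu}(w,w')=0$, and suppose $d([w],[w'])$ is defined as $d_{\mu}(w,w')$. The symbol $\hat{y}_w(x)$ for the predicted label is introduced here only for illustration. The argument is then the standard passage from a pseudometric to a metric on the quotient:

```latex
% Assumed form (illustration only): d_\mu(w,w') = \Pr_{x \sim \mu}[\hat{y}_w(x) \neq \hat{y}_{w'}(x)].
\begin{align*}
% Triangle inequality for d_\mu via the union bound:
\{\hat{y}_w \neq \hat{y}_{w''}\}
  &\subseteq \{\hat{y}_w \neq \hat{y}_{w'}\} \cup \{\hat{y}_{w'} \neq \hat{y}_{w''}\}
  \;\Longrightarrow\;
  d_\mu(w,w'') \le d_\mu(w,w') + d_\mu(w',w''). \\
% Well-definedness on classes: take u \cong w and u' \cong w'.
d_\mu(w,w')
  &\le d_\mu(w,u) + d_\mu(u,u') + d_\mu(u',w') = d_\mu(u,u'), \\
d_\mu(u,u')
  &\le d_\mu(u,w) + d_\mu(w,w') + d_\mu(w',u') = d_\mu(w,w'), \\
% Hence d([w],[w']) := d_\mu(w,w') does not depend on the representatives, and
d([w],[w']) = 0
  &\iff d_\mu(w,w') = 0 \iff w \cong w' \iff [w] = [w'].
\end{align*}
```

Symmetry and nonnegativity carry over directly from $d_{\mu}$, so under these assumptions $d$ satisfies all the metric axioms on $\mathcal{W} \slash \cong$. The paper's own proof may proceed differently.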

Figures (8)

Figure 1: Illustration of a DL network with one hidden layer, characterized by six real parameters $w=(w^{(1)},w^{(2)},w^{(3)},w^{(4)},b^{(1)},b^{(2)})$, to conduct binary classification for input data from $\mathbb{R}^2$. The input-output relation of the network is represented by a function $f_w:\mathbb{R}^2\to\mathbb{R}^2$, with $f_w^j$ the $j$th component. The network labels a test data point $x=(x_1,x_2)\in\mathbb{R}^2$ as $i$ if $f_w^i(x)>f_w^j(x)$, and makes a random guess if $f_w^1(x)=f_w^2(x)$.
Figure 2: An illustration of classification outcomes of three networks (parameter vectors listed in Table \ref{tab:toy}) for randomly sampled testing points in $\mathbb{R}^2$, where the ground truth is shown in (a). Sub-figures (b), (c), and (d) are the labeled results corresponding to parameter vectors $\widetilde{w}_1$, $\widetilde{w}_2$, and $\widetilde{w}_3$, respectively.
Figure 3: An illustration of the input-output relation of a DL network, with $L$ hidden layers, for data classification.
Figure 4: Schematic depiction of a DL network $f_{w'}$ achieving a generalization error less than $\epsilon$: it lies in the $\epsilon$-neighborhood centered at the ground truth classifier $f_w$.
Figure 5: An illustration of the proposed metric space $(\mathcal{W} \slash \cong,d)$. (a) With the Euclidean metric, DL networks yielding identical classification performance are spread over the space $\mathcal{W}$. (b) They are identified as the same equivalence class in the quotient set $\mathcal{W} \slash \cong$, and the performance gap between distinct classes is assessed by the proposed metric $d$.
...and 3 more figures

Theorems & Definitions (12)

proof
proof
Theorem 4.1
proof
Theorem 5.1
proof
Theorem 5.2
proof
Theorem 5.3
proof
...and 2 more
