Topological derivative approach for deep neural network architecture adaptation

C G Krishnanunni, Tan Bui-Thanh, Clint Dawson

TL;DR

This work addresses when, where, and how to grow the depth of a neural network in a principled way. It defines a network topological derivative of a shape functional and connects it to the Hamiltonian from optimal control theory, which yields a closed-form criterion for layer insertion. The approach solves a greedy eigenvalue problem to identify the most sensitive location along the depth and to initialize the new layer, and the same insertion strategy admits an optimal transport (OT) interpretation. On fully connected networks, CNNs, and vision transformers, the method can outperform an ad-hoc baseline network and other architecture adaptation strategies. It also offers a pathway for transfer learning and parameter-efficient fine-tuning in data-limited settings, and for automated network growing across diverse architectures and tasks.

Abstract

This work presents a novel algorithm for progressively adapting neural network architecture along the depth. In particular, we attempt to address the following questions in a mathematically principled way: i) Where to add new capacity (a layer) during the training process? ii) How to initialize the new capacity? At the heart of our approach are two key ingredients: i) the introduction of a ``shape functional'' to be minimized, which depends on the neural network topology, and ii) the introduction of a topological derivative of the shape functional with respect to the neural network topology. Using an optimal control viewpoint, we show that the network topological derivative exists under certain conditions, and we derive its closed-form expression. In particular, we explore, for the first time, the connection between the topological derivative from a topology optimization framework and the Hamiltonian from optimal control theory. Further, we show that the optimality condition for the shape functional leads to an eigenvalue problem for deep neural architecture adaptation. Our approach thus determines the most sensitive location along the depth where a new layer needs to be inserted during the training phase, together with the associated parametric initialization for the newly added layer. We also demonstrate that our layer insertion strategy can be derived from an optimal transport viewpoint as a solution to maximizing a topological derivative in $p$-Wasserstein space, where $p \geq 1$. Numerical investigations with fully connected networks, convolutional neural networks, and vision transformers on various regression and classification problems demonstrate that our proposed approach can outperform an ad-hoc baseline network and other architecture adaptation strategies. Further, we also demonstrate other applications of the topological derivative in fields such as transfer learning.
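
The selection rule described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-location symmetric matrices passed to `select_insertion` are hypothetical placeholders for whatever matrix the closed-form topological derivative produces, the residual update `x + W @ tanh(x)` is an assumed layer form chosen so that zero parameters give an identity layer, and the names `select_insertion`, `insert_layer`, and `residual_forward` are invented for this example.

```python
import numpy as np


def select_insertion(sensitivity_matrices, eps=1e-2):
    """Greedy choice of insertion location and initialization.

    sensitivity_matrices: list of symmetric (n, n) arrays, one per candidate
    location. They are placeholders (assumption); the paper derives the
    actual matrices from its topological derivative.
    Returns (location, largest eigenvalue, eps * leading eigenvector).
    """
    best_loc, best_val, best_vec = None, -np.inf, None
    for loc, Q in enumerate(sensitivity_matrices):
        Q = 0.5 * (Q + Q.T)             # enforce symmetry before eigh
        vals, vecs = np.linalg.eigh(Q)  # eigenvalues in ascending order
        if vals[-1] > best_val:
            best_loc, best_val, best_vec = loc, vals[-1], vecs[:, -1]
    return best_loc, best_val, eps * best_vec


def residual_forward(x, weights):
    """Assumed residual network: x <- x + W @ tanh(x) for each layer."""
    for W in weights:
        x = x + W @ np.tanh(x)
    return x


def insert_layer(weights, location, new_weight):
    """Return a new weight list with new_weight inserted at index `location`."""
    return weights[:location] + [new_weight] + weights[location:]


# Usage with random stand-in data (all values are illustrative).
rng = np.random.default_rng(0)
d = 3
weights = [0.1 * rng.normal(size=(d, d)) for _ in range(3)]

mats = []
for _ in range(len(weights) + 1):       # one candidate per insertion slot
    A = rng.normal(size=(d * d, d * d))
    mats.append(A + A.T)

location, eigval, phi = select_insertion(mats, eps=1e-2)
grown = insert_layer(weights, location, phi.reshape(d, d))

x = rng.normal(size=d)
gap = np.linalg.norm(residual_forward(x, grown) - residual_forward(x, weights))
print(location, eigval, gap)
```

With `eps = 0` the inserted layer has zero weights, so the grown network reproduces the original output exactly. This mirrors the requirement that the perturbed network coincide with the original one when the perturbation size vanishes.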

Paper Structure

This paper contains 42 sections, 7 theorems, 87 equations, 14 figures, 10 tables, 4 algorithms.

Key Result

Proposition 2.3

Consider \ref{admissible_pert}. The following two steps produce an admissible perturbation:

Figures (14)

  • Figure 1: Schematic view of the topological derivative approach: A new layer with parameters $\varepsilon \boldsymbol{\phi}$ is inserted between the $1^{st}$ and $2^{nd}$ layers. When $\varepsilon=0$, the network $\Omega_\varepsilon$ behaves exactly the same way as $\Omega_0$ under the standard training process (residual connections are not shown in the figure).
  • Figure 1: Validating Theorem 2.7. Left subfigure: comparison of the theoretically computed topological derivative (equation \ref{topo_de}) with the numerically computed derivative for the layer $l$ with the largest eigenvalue and initialization $\Phi_l$ given by \ref{quant}; Middle subfigure: effect of the initialization $\boldsymbol{\phi}$ on the numerically computed derivative $d{\mathcal{J}}(\Omega_0;\ (l,\ \boldsymbol{\phi},\ \sigma))$ in \ref{topo_der} for $l=1$ and at the end of the $1^{st}$ iteration; Right subfigure: learned function using the proposed approach at different iterations of the algorithm.
  • Figure 1: Optimal transport interpretation of our proposed approach: We wish to optimally transport (in some sense) the parameters from network $\Omega_0$ (left figure) to the new network $\Omega_\varepsilon$ (right figure).
  • Figure 1: 2D heat conductivity inversion problem (left to right): the domain and the boundaries; a $16\times 16$ finite element mesh and $10$ observation locations.
  • Figure 2: Left subfigure: typical training loss curves for different approaches. Right subfigure: summary of results.
  • ...and 9 more figures

Theorems & Definitions (16)

  • Definition 2.1: Network perturbation
  • Definition 2.2: Admissible perturbation
  • Proposition 2.3: Construction of an admissible perturbation
  • Remark 2.4
  • Definition 2.5: Network topological derivative
  • Remark 2.6
  • Theorem 2.7: Existence of network topological derivative
  • Corollary 2.8
  • Theorem 2.9
  • Remark 2.10
  • ...and 6 more