Deep Declarative Networks: A New Hope

Stephen Gould, Richard Hartley, Dylan Campbell

TL;DR

Deep Declarative Networks (DDNs) replace explicit forward mappings with optimization-defined behaviors, enabling end-to-end learning by backpropagating through implicitly defined node outputs via the implicit function theorem. The framework subsumes conventional neural networks, supports both declarative and imperative nodes, and leverages classical constrained/unconstrained optimization as modular layers. The authors derive gradient expressions for unconstrained, equality-constrained, and inequality-constrained nodes, discuss feasibility and non-smooth cases, and demonstrate practical implementations in PyTorch with robust pooling and $L_p$ projection examples. Experiments on image and point-cloud classification show robustness and calibration benefits from declarative components, suggesting broader applicability to model-based reasoning and constrained representations. Overall, DDNs offer a principled route to incorporate physical models, geometric constraints, and non-differentiable steps into deep learning while preserving differentiable end-to-end learning.

Abstract

We explore a new class of end-to-end learnable models wherein data processing nodes (or network layers) are defined in terms of desired behavior rather than an explicit forward function. Specifically, the forward function is implicitly defined as the solution to a mathematical optimization problem. Consistent with nomenclature in the programming languages community, we name these models deep declarative networks. Importantly, we show that the class of deep declarative networks subsumes current deep learning models. Moreover, invoking the implicit function theorem, we show how gradients can be back-propagated through many declaratively defined data processing nodes thereby enabling end-to-end learning. We show how these declarative processing nodes can be implemented in the popular PyTorch deep learning software library allowing declarative and imperative nodes to co-exist within the same network. We also provide numerous insights and illustrative examples of declarative nodes and demonstrate their application for image and point cloud classification tasks.
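
As a rough illustration of a forward function defined by an optimization problem, the sketch below recovers average pooling as the minimizer of a sum of squared differences and compares it with the explicit formula. The function names, step size, and iteration count are illustrative assumptions and are not taken from the authors' PyTorch implementation.

```python
def imperative_mean(x):
    """Imperative node: the output is given by an explicit forward function."""
    return sum(x) / len(x)


def declarative_mean(x, steps=500, lr=0.1):
    """Declarative node: y = argmin_u sum_i (u - x_i)^2, solved by gradient descent."""
    u = 0.0
    for _ in range(steps):
        grad = sum(2.0 * (u - xi) for xi in x)  # derivative of the objective in u
        u -= lr * grad / len(x)                 # step scaled by the number of inputs
    return u


x = [1.0, 2.0, 4.0, 9.0]
y_imperative = imperative_mean(x)
y_declarative = declarative_mean(x)
print(y_imperative, y_declarative)
```

Both nodes compute the same output here; the declarative form only states what the output should minimize, so replacing the squared penalty with a different one changes the node's behavior without changing this structure.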

Paper Structure

This paper contains 24 sections, 6 theorems, 69 equations, 4 figures, 4 tables.

Key Result

Proposition 4.4

Consider a function $f: \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$. Let $y(x) \in \arg\min_{u \in \mathbb{R}^m} f(x, u)$. Assume $y(x)$ exists and that $f$ is second-order differentiable in the neighborhood of the point $(x, y(x))$. Set $H = \text{D}_{YY}^2 f(x, y(x)) \in \mathbb{R}^{m \times m}$ and $B = \text{D}_{XY}^2 f(x, y(x)) \in \mathbb{R}^{m \times n}$. Then for $H$ non-singular the derivative of $y$ with respe
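
For an unconstrained minimizer, differentiating the stationarity condition $\text{D}_{Y} f(x, y(x)) = 0$ with respect to $x$ and applying the implicit function theorem gives the standard result $\text{D}y(x) = -H^{-1}B$. The sketch below checks this numerically on a scalar-output robust pooling node with a pseudo-Huber penalty. The penalty scale `ALPHA`, the bisection solver, and all function names are assumptions chosen for illustration; this is not the authors' implementation.

```python
import math

ALPHA = 1.0  # pseudo-Huber scale (illustrative choice)


def dphi(z, alpha=ALPHA):
    """First derivative of phi(z) = alpha^2 * (sqrt(1 + (z/alpha)^2) - 1)."""
    return z / math.sqrt(1.0 + (z / alpha) ** 2)


def d2phi(z, alpha=ALPHA):
    """Second derivative of the pseudo-Huber penalty (always positive)."""
    return (1.0 + (z / alpha) ** 2) ** -1.5


def solve_node(x, iters=200):
    """Forward pass: y(x) = argmin_u sum_i phi(u - x_i).

    The stationarity condition sum_i phi'(u - x_i) = 0 is increasing in u,
    so bisection on [min(x), max(x)] finds the minimizer.
    """
    lo, hi = min(x), max(x)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(dphi(mid - xi) for xi in x) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def implicit_gradient(x, y):
    """Backward pass: Dy(x) = -H^{-1} B with scalar H (m = 1) and 1 x n matrix B."""
    H = sum(d2phi(y - xi) for xi in x)   # D_YY^2 f
    B = [-d2phi(y - xi) for xi in x]     # D_XY^2 f
    return [-b / H for b in B]


def finite_difference_gradient(x, eps=1e-6):
    """Central differences through the solver, used only as a reference."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += eps
        xm[i] -= eps
        grad.append((solve_node(xp) - solve_node(xm)) / (2 * eps))
    return grad


x = [0.1, -0.3, 0.4, 5.0]  # the last entry acts as an outlier
y = solve_node(x)
analytic = implicit_gradient(x, y)
numeric = finite_difference_gradient(x)
print(y, analytic, numeric)
```

The analytic gradient needs only second derivatives at the solution, so the backward pass does not depend on how the forward problem was solved. In this example the outlier receives a much smaller gradient weight than the inliers, which is the behavior expected of a robust penalty.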

Figures (4)

  • Figure 1: Parametrized data processing nodes in an end-to-end learnable model with global objective function $J$. During the forward evaluation pass of an imperative node (top) the input $x$ is transformed into output $y$ based on some explicit parametrized function $\tilde{f}(\cdot; \theta)$. During the forward evaluation pass of a declarative node (bottom) the output $y$ is computed as the minimizer of some parametrized objective function $f(x, \cdot; \theta)$. During the backward parameter update pass for either node type, the gradient of the global objective function with respect to the output $\text{D}J(y)$ is propagated backwards via the chain rule to produce gradients with respect to the input $\text{D}J(x)$ and parameters $\text{D}J(\theta)$.
  • Figure 2: Bi-level optimization problem showing back-propagation of gradients through a deep declarative node. The quantity $(*)$ is $\text{D}_{Y}J(x, y) \text{D}y(x)$ which when added to $\text{D}_{X} J(x, y)$ gives $\text{D}J(x, y(x))$. The bypass connections (topmost and bottommost paths) do not exist when the upper-level objective $J$ only depends on $x$ through $y$. Moreover, if $f$ appears in $J$ as the only term involving $y$ then $\text{D}_{Y} J(x, y)$ is zero and the backward edge $(*)$ is not required. That is, $\text{D}J(x) = \text{D}_{X} J(x, y)$.
  • Figure 3: Illustration of different scenarios for the solution to inequality constrained deep declarative nodes. In the first scenario ($y_1$) the solution is a local minimum strictly satisfying the constraints. In the second scenario ($y_2$) the solution is on the boundary of the constraint set with the negative gradient of the objective pointing outside of the set. In the third scenario ($y_3$) the solution is on the boundary of the constraint set and is also a local minimum.
  • Figure 4: Geometry of the gradient for an equality constrained optimization problem. The unconstrained gradient $\text{D}y_\text{unc}(x)$ is corrected to ensure that the solution remains on the constraint surface after gradient descent with an infinitesimal step size.
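
The geometric point about equality-constrained gradients can be sketched on a simple case: Euclidean projection onto the unit sphere, $y(x) = x / \|x\|_2$, whose Jacobian has the closed form $(I - y y^\top) / \|x\|_2$. The code below is a minimal check under that assumption; it uses this specific closed form rather than the paper's general equality-constrained expression, and the function names are hypothetical. It confirms that $y^\top \text{D}y(x) = 0$, so a first-order change in $x$ moves $y$ along the constraint surface.

```python
import math


def project_to_sphere(x):
    """Forward pass: argmin_u ||u - x||^2 subject to ||u||_2 = 1 (x nonzero)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


def projection_jacobian(x):
    """Closed-form Dy(x) = (I - y y^T) / ||x||_2 for y = x / ||x||_2."""
    n = len(x)
    norm = math.sqrt(sum(v * v for v in x))
    y = [v / norm for v in x]
    return [
        [((1.0 if i == j else 0.0) - y[i] * y[j]) / norm for j in range(n)]
        for i in range(n)
    ]


def finite_difference_jacobian(x, eps=1e-6):
    """Central-difference Jacobian of the projection, used only as a reference."""
    n = len(x)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp = list(x)
        xm = list(x)
        xp[j] += eps
        xm[j] -= eps
        yp, ym = project_to_sphere(xp), project_to_sphere(xm)
        for i in range(n):
            jac[i][j] = (yp[i] - ym[i]) / (2 * eps)
    return jac


x = [3.0, 4.0, 12.0]
y = project_to_sphere(x)
J = projection_jacobian(x)
J_fd = finite_difference_jacobian(x)

# y^T Dy(x): zero when every column of the Jacobian is tangent to the sphere at y
tangency = [sum(y[i] * J[i][j] for i in range(3)) for j in range(3)]
print(tangency)
```

Without the $-y y^\top / \|x\|_2$ term the Jacobian would be a scaled identity, which has a component normal to the sphere; the extra term removes exactly that component.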

Theorems & Definitions (15)

  • Definition 4.1
  • Definition 4.2
  • Definition 4.3
  • Proposition 4.4: Unconstrained
  • proof
  • Proposition 4.5: Equality Constrained
  • proof
  • Proposition 4.6: Inequality Constrained
  • proof
  • Corollary 4.7
  • ...and 5 more