Deep Learning with Parametric Lenses

Geoffrey S. H. Cruttwell, Bruno Gavranovic, Neil Ghani, Paul Wilson, Fabio Zanasi

TL;DR

This work introduces parametric lenses as a categorical foundation for gradient-based learning, describing optimisers, loss maps, and architectures within one framework. It combines three constructions: Para(C) (parametric maps), Lens(C) (bidirectional data flow), and CRDCs (categories with reverse differentiation). Neural network layers, learning rates, and optimisers are all modelled as composable lenses, and the construction applies both to real-valued domains and to discrete ones such as Boolean circuits. The approach gives a uniform description of supervised and unsupervised learning (e.g., Wasserstein GANs) and of learning inputs rather than parameters (deep dreaming). A Python library implements the framework and computes gradients through lens composition. The authors aim to support modular design, reasoning, and extension of learning systems; future work targets richer architectures, higher-order differentiation, and non-gradient settings.

Abstract

We propose a categorical semantics for machine learning algorithms in terms of lenses, parametric maps, and reverse derivative categories. This foundation provides a powerful explanatory and unifying framework: it encompasses a variety of gradient descent algorithms such as ADAM, AdaGrad, and Nesterov momentum, as well as a variety of loss functions such as MSE and Softmax cross-entropy, and different architectures, shedding new light on their similarities and differences. Furthermore, our approach to learning has examples generalising beyond the familiar continuous domains (modelled in categories of smooth maps) and can be realised in the discrete setting of Boolean and polynomial circuits. We demonstrate the practical significance of our framework with an implementation in Python.
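
To make the idea of "lenses" in the abstract concrete, here is a minimal sketch of a lens as a pair of forward and backward maps, with sequential composition. It is not the authors' library: the class name `Lens`, the `>>` operator, and the example maps `square` and `scale3` are hypothetical names chosen for illustration, and the backward maps are hand-written reverse derivatives of simple real-valued functions.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Lens:
    """Hypothetical illustration of a lens: a forward map and a backward map."""
    fwd: Callable[[Any], Any]        # forward pass: A -> B
    bwd: Callable[[Any, Any], Any]   # backward pass: (A, change in B) -> change in A

    def __rshift__(self, other: "Lens") -> "Lens":
        """Sequential composition: run self first, then other."""
        def fwd(a):
            return other.fwd(self.fwd(a))

        def bwd(a, dc):
            b = self.fwd(a)                      # recompute the intermediate value
            return self.bwd(a, other.bwd(b, dc)) # pull the change back through both

        return Lens(fwd, bwd)


# Lenses built from differentiable maps: the backward map is the reverse derivative.
square = Lens(lambda x: x * x, lambda x, dy: 2.0 * x * dy)
scale3 = Lens(lambda x: 3.0 * x, lambda x, dy: 3.0 * dy)

composite = square >> scale3  # x |-> 3 * x^2

value = composite.fwd(2.0)          # forward pass at x = 2
gradient = composite.bwd(2.0, 1.0)  # backward pass with unit output change
```

Composing the lenses applies the chain rule automatically: the backward pass of the composite feeds the output change through `scale3` and then through `square`.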

Paper Structure

This paper contains 26 sections, 8 theorems, 42 equations, 7 figures.

Key Result

Theorem 1

Lenses with backward passes additive in the second component form a functor
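
As a reading aid, the usual lens definitions behind this statement can be written out as follows. This is a sketch of the standard convention, not a quotation from the paper; the symbols $f^*$, $A'$, $B'$ and the exact form of the additivity condition are notation assumed here.

```latex
% A lens (f, f^*) : (A, A') \to (B, B') has a forward and a backward map:
f : A \to B, \qquad f^* : A \times B' \to A'

% Composition with a lens (g, g^*) : (B, B') \to (C, C'):
(f ; g)(a) = g(f(a)), \qquad
(f ; g)^*(a, c') = f^*\bigl(a,\; g^*(f(a), c')\bigr)

% Backward pass additive in the second component:
f^*(a,\, b'_1 + b'_2) = f^*(a, b'_1) + f^*(a, b'_2), \qquad f^*(a, 0) = 0
```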

Figures (7)

  • Figure 1: An informal illustration of gradient-based learning. This neural network is trained to distinguish different kinds of animals in the input image. Given an input $X$, the network predicts an output $Y$, which is compared by a 'loss map' with what would be the correct answer ('label'). The loss map returns a real value expressing the error of the prediction; this information, together with the learning rate (a weight controlling how much the model should be changed in response to error) is used by an optimiser, which computes by gradient-descent the update of the parameters of the network, with the aim of improving its accuracy. The neural network, the loss map, the optimiser and the learning rate are all components of a supervised learning system, and can vary independently of one another.
  • Figure 2: The parametric lens that captures the learning process informally sketched in Figure 1. Note that each component is itself a lens, and their composition yields the interactions described in Figure 1. Defining this picture formally is the subject of the sections on components as lenses and on learning with lenses.
  • Figure 3: Gradient Ascent
  • Figure 4: Model reparameterised by basic gradient descent (left) and a generic stateful optimiser (right).
  • Figure 5: Gradient Descent
  • ...and 2 more figures
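
The learning loop described in the Figure 1 caption (model, loss map, learning rate, optimiser) can be sketched in plain Python as below. This is an illustrative assumption, not the paper's implementation: it uses a one-weight linear model, a squared-error loss, and basic gradient descent, and every function name (`model`, `model_rev`, `loss_rev`, `gd_update`, `train_step`) is hypothetical.

```python
def model(p, x):
    """Parametric forward map: a one-weight linear model (assumed for illustration)."""
    return p * x


def model_rev(p, x, dy):
    """Reverse derivative of the model: changes in (p, x) for an output change dy."""
    return (x * dy, p * dy)


def loss(y_pred, y_true):
    """Squared-error loss map (assumed for illustration)."""
    return 0.5 * (y_pred - y_true) ** 2


def loss_rev(y_pred, y_true, dl):
    """Reverse derivative of the loss with respect to the prediction."""
    return (y_pred - y_true) * dl


def gd_update(p, dp):
    """Basic gradient descent: move the parameter against the scaled gradient."""
    return p - dp


def train_step(p, x, y_true, lr):
    """One pass: forward through model and loss, backward through loss and model."""
    y_pred = model(p, x)
    dl = lr                           # the learning rate scales the change fed back
    dy = loss_rev(y_pred, y_true, dl)
    dp, _dx = model_rev(p, x, dy)     # _dx is the change requested of the input
    return gd_update(p, dp)


# Fit p so that model(p, 1.0) approximates 2.0.
p = 0.0
for _ in range(50):
    p = train_step(p, x=1.0, y_true=2.0, lr=0.5)
```

Each component can be swapped independently, for example a different loss with its own reverse derivative, or a different update rule in place of `gd_update`. The unused `_dx` is the quantity one would follow instead of `dp` when learning inputs rather than parameters.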

Theorems & Definitions (70)

  • Definition 2.1: Parametric category
  • Example 2.2
  • Definition 2.3
  • Remark 2.4
  • Definition 2.5
  • Definition 2.6
  • Example 2.7
  • Example 2.8
  • Remark 2.9
  • Theorem 1
  • ...and 60 more