
Towards a Categorical Foundation of Deep Learning: A Survey

Francesco Riccardo Crescenzi

TL;DR

This thesis is a survey that covers some recent work attempting to study machine learning categorically, mainly focusing on the application of category theory to deep learning and the use of string diagrams to provide detailed representations of neural network architectures.

Abstract

The unprecedented pace of machine learning research has led to incredible advances, but also poses hard challenges. At present, the field lacks strong theoretical underpinnings, and many important achievements stem from ad hoc design choices which are hard to justify in principle and whose effectiveness often goes unexplained. Research debt is increasing, and many papers are found not to be reproducible. This thesis is a survey that covers some recent work attempting to study machine learning categorically. Category theory is a branch of abstract mathematics that has found successful applications in many fields, both inside and outside mathematics. Acting as a lingua franca of mathematics and science, category theory might be able to give a unifying structure to the field of machine learning, which could help address some of the aforementioned problems. In this work, we mainly focus on the application of category theory to deep learning. Namely, we discuss the use of categorical optics to model gradient-based learning, the use of categorical algebras and integral transforms to link classical computer science to neural networks, the use of functors to link different layers of abstraction and preserve structure, and, finally, the use of string diagrams to provide detailed representations of neural network architectures.
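To make the claim about optics and gradient-based learning a little more concrete, here is a minimal Python sketch of the simplest case: a lens as a pair (forward map, backward map) over real numbers, where the backward map is assumed to be a reverse derivative. The `Lens` class and its `then` method are hypothetical names chosen for illustration, not notation from the thesis. Composing two such lenses reproduces the chain rule, which is the sense in which lens composition can model backpropagation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Lens:
    """A lens (f, f*): forward map A -> B, backward map A x B' -> A'."""
    fwd: Callable[[float], float]
    bwd: Callable[[float, float], float]

    def then(self, other: "Lens") -> "Lens":
        """Sequential composition: run self first, then other."""
        def fwd(a: float) -> float:
            return other.fwd(self.fwd(a))

        def bwd(a: float, dc: float) -> float:
            # Recompute the intermediate value, pull the incoming change
            # back through `other` first, then through `self` (chain rule).
            b = self.fwd(a)
            return self.bwd(a, other.bwd(b, dc))

        return Lens(fwd, bwd)


# Hypothetical example lenses: squaring and scaling by 3.
square = Lens(lambda x: x * x, lambda x, dy: 2 * x * dy)
triple = Lens(lambda y: 3 * y, lambda y, dz: 3 * dz)

h = square.then(triple)  # h(x) = 3x^2
print(h.fwd(2.0))        # 12.0
print(h.bwd(2.0, 1.0))   # 12.0, i.e. dh/dx = 6x at x = 2
```

The backward pass above recomputes the intermediate value `b`; a practical implementation would cache it, which is roughly the role the residual plays in a general optic.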

Paper Structure

This paper contains 52 sections, 9 theorems, 37 equations, 32 figures.

Key Result

Proposition 8

Let $(\mathcal{C}, \bullet)$ be an $\mathcal{M}$-actegory. Then, there exists an identity-on-objects pseudofunctor $\gamma: \mathcal{C} \to \mathbf{Para}_\bullet(\mathcal{C})$ that maps $f \mapsto (I,f)$. If $\mathcal{M}$ is strict, this is a $2$-functor.
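A minimal Python sketch of Proposition 8, assuming the simplest setting: $\mathcal{C}$ is the category of sets and functions acting on itself by cartesian product, with $I$ the one-point set (modelled here as the empty tuple `UNIT`). The names `Para`, `then`, and `gamma` are hypothetical, parameter sets are tracked only by a descriptive label, and parameters of a composite are paired as `(p, q)`; the thesis may order them differently.

```python
from dataclasses import dataclass
from typing import Any, Callable

UNIT = ()  # the single element of the one-point parameter set I


@dataclass(frozen=True)
class Para:
    """A parametric morphism A -> B: a label for the parameter set P
    and a map f(p, a) taking a parameter p and an input a."""
    param: str
    f: Callable[[Any, Any], Any]

    def then(self, other: "Para") -> "Para":
        # Composite A -> C: the parameters of both morphisms are paired.
        def h(pq, a):
            p, q = pq
            return other.f(q, self.f(p, a))
        return Para(f"{self.param} x {other.param}", h)


def gamma(f: Callable[[Any], Any]) -> Para:
    """Embed a plain map f as the parametric morphism (I, f)."""
    return Para("I", lambda _unit, a: f(a))


# Hypothetical affine layer with parameters (w, b), followed by a ReLU.
affine = Para("R x R", lambda wb, x: wb[0] * x + wb[1])
layer = affine.then(gamma(lambda y: max(y, 0.0)))
print(layer.f(((2.0, -1.0), UNIT), 3.0))  # 5.0

# gamma respects composition only up to the iso I x I ~ I.
double, inc = (lambda x: 2 * x), (lambda x: x + 1)
composite = gamma(double).then(gamma(inc))
direct = gamma(lambda x: inc(double(x)))
print(composite.param, composite.f((UNIT, UNIT), 5))  # I x I 11
print(direct.param, direct.f(UNIT, 5))                # I 11
```

Composing two embedded maps yields the parameter set `I x I`, whereas embedding the composite yields `I`. The two are isomorphic but not equal, which is roughly why $\gamma$ is in general only a pseudofunctor; when $\mathcal{M}$ is strict, $I \otimes I = I$ on the nose and the discrepancy disappears.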

Figures (32)

  • Figure 1: String diagrams representing (a) a parametric morphism, (b) a reparametrization of a parametric morphism, (c) a composition of parametric morphisms. (Images taken from gavranovic2024fundamental.)
  • Figure 2: String diagrams representing (a) a lens $(f, f^*)$, (b) the composition of two lenses $(f, f^*)$ and $(g, g^*)$. (Images taken from cruttwell2022categorical.)
  • Figure 3: String diagram representing the inner workings of a weighted optic. (Image taken from gavranovic2024fundamental.)
  • Figure 4: String diagrams representing the inner workings of a parametric lens. (Images taken from cruttwell2022categorical.)
  • Figure 5: String diagrams representing (a) the composition of a model lens and a loss function lens; (b) the composition of a model lens, a loss function lens, and a learning rate lens; (d) a supervised learning lens. (Images taken from cruttwell2022categorical.)
  • ...and 27 more figures
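The learning setup pictured in Figures 4 and 5 can be approximated by the following Python sketch. It assumes a one-weight linear model, a squared-error loss, and, as a simplification, a learning-rate lens whose backward map just feeds the constant `-lr` back in as the change in loss. All names (`ParaLens`, `linear`, `mse_loss_bwd`, `train_step`) are hypothetical, and the additive update `w + dw` is only one possible choice of optimizer.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ParaLens:
    """A parametric lens: fwd(p, a) -> b and bwd(p, a, db) -> (dp, da)."""
    fwd: Callable[[float, float], float]
    bwd: Callable[[float, float, float], Tuple[float, float]]


# Hypothetical model lens: a one-weight linear model y = w * x.
linear = ParaLens(
    fwd=lambda w, x: w * x,
    bwd=lambda w, x, dy: (x * dy, w * dy),  # (change in w, change in x)
)


def mse_loss_bwd(y: float, target: float, dl: float) -> float:
    """Backward map of the squared-error loss lens y -> (y - target)^2."""
    return 2.0 * (y - target) * dl


def train_step(w: float, x: float, target: float, lr: float) -> float:
    y = linear.fwd(w, x)               # forward pass through the model
    # Assumed learning-rate lens: feed -lr back as the change in loss,
    # so the pulled-back dw is already a scaled descent step.
    dy = mse_loss_bwd(y, target, -lr)
    dw, _dx = linear.bwd(w, x, dy)     # backward pass through the model
    return w + dw                      # additive parameter update


w = 0.0
for _ in range(50):
    w = train_step(w, x=2.0, target=6.0, lr=0.05)
print(round(w, 4))  # approaches 3.0
```

Because the learning-rate lens injects `-lr` as the loss change, the `dw` returned by the backward pass equals `-lr * dL/dw`, so the update needs no separate scaling.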

Theorems & Definitions (83)

  • Remark 1
  • Definition 2: Actegory
  • Remark 3
  • Definition 4: Monoidal actegory
  • Definition 5: Actegorical strong functor
  • Definition 6: $\mathbf{Para}_{\bullet}(\mathcal{C})$
  • Remark 7
  • Proposition 8
  • Definition 9: Lenses
  • Definition 10: Optics
  • ...and 73 more