A Pattern Language for Machine Learning Tasks

Benjamin Rodatz; Ian Fan; Tuomas Laakkonen; Neil John Ortega; Thomas Hoffmann; Vincent Wang-Mascianica

TL;DR

The paper introduces a diagrammatic, task-based language for ML in which objectives are encoded as equational constraints among learners. It formalises atomic and compound tasks, defines objective functions via divergences and a differentiable combination, and shows how standard ML paradigms instantiate patterns that can be reasoned about compositionally. It then introduces a novel manipulation task that edits a target attribute while preserving other properties, and proves connections to Bayesian inversion and CycleGAN through refinements, showing how such tasks can yield architecture-agnostic, training-stable models without adversarial training per se. Empirically, it validates manipulation on Spriteworld, MNIST, and CelebA, demonstrating end-to-end, pattern-driven design with interpretable latent-space effects.
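
As a rough illustration of how an equational constraint among learners might become an objective, the sketch below measures how far the two sides of a constraint are from agreeing and then combines several such measurements. It rests on assumptions that are not taken from the paper: squared error stands in for the divergence, a non-negative weighted sum stands in for the differentiable combination, and the names `task_loss`, `combine`, `enc`, `dec` and `lossy_dec` are hypothetical.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Learner = Callable[[Vector], Vector]


def squared_error(u: Vector, v: Vector) -> float:
    """Stand-in divergence (an assumption): zero exactly when u equals v."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def task_loss(lhs: Learner, rhs: Learner, data: Sequence[Vector]) -> float:
    """Mean divergence between the two sides of the constraint lhs = rhs."""
    return sum(squared_error(lhs(x), rhs(x)) for x in data) / len(data)


def combine(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Non-negative weighted sum of task losses (assumed form of the combination)."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    return sum(w * loss for w, loss in zip(weights, losses))


# Hand-written toy "learners" standing in for trained models.
def enc(x: Vector) -> Vector:
    return [0.5 * xi for xi in x]


def dec(z: Vector) -> Vector:
    return [2.0 * zi for zi in z]


def lossy_dec(z: Vector) -> Vector:
    return list(z)  # forgets to undo the scaling, so the constraint fails


def identity(x: Vector) -> Vector:
    return list(x)


data = [[1.0, 2.0], [3.0, 4.0]]

# Autoencoder-style task: dec(enc(x)) = x.
good = task_loss(lambda x: dec(enc(x)), identity, data)
bad = task_loss(lambda x: lossy_dec(enc(x)), identity, data)
total = combine([good, bad], [1.0, 0.5])
```

Here `good` is zero because the constraint holds exactly on the data, while `bad` is positive. In a real setting the learners would be parametrised models and the combined loss would be minimised by gradient descent.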

Abstract

We formalise the essential data of objective functions as equality constraints on composites of learners. We call these constraints "tasks", and we investigate the idealised view that such tasks determine model behaviours. We develop a flowchart-like graphical mathematics for tasks that allows us to: (1) offer a unified perspective of approaches in machine learning across domains; (2) design and optimise desired behaviours model-agnostically; and (3) import insights from theoretical computer science into practical machine learning. As a proof-of-concept of the potential practical impact of our theoretical framework, we exhibit and implement a novel "manipulator" task that minimally edits input data to have a desired attribute. Our model-agnostic approach achieves this end-to-end, and without the need for custom architectures, adversarial training, random sampling, or interventions on the data, hence enabling capable, small-scale, and training-stable models.
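
One way to read "minimally edits input data to have a desired attribute" is as a set of lens-style equations between an attribute reader and an editor. The sketch below assumes that reading; the paper's exact constraints are not reproduced in this summary. The names `get`, `put`, `Sprite` and `satisfies_manipulation_equations` are illustrative, and the hand-written functions on symbolic records stand in for trained learners acting on images.

```python
from typing import NamedTuple


class Sprite(NamedTuple):
    """Toy symbolic stand-in for an image with explicit attributes."""
    shape: str
    colour: str
    x: float
    y: float


def get(s: Sprite) -> str:
    """Attribute reader (for images this role might be played by a classifier)."""
    return s.colour


def put(s: Sprite, colour: str) -> Sprite:
    """Editor: set the target attribute and leave everything else untouched."""
    return s._replace(colour=colour)


def satisfies_manipulation_equations(s: Sprite, a: str, b: str) -> bool:
    """Check three assumed equations on one example."""
    put_get = get(put(s, a)) == a             # the edit has the requested attribute
    get_put = put(s, get(s)) == s             # writing back the current value changes nothing
    put_put = put(put(s, a), b) == put(s, b)  # a later edit overwrites an earlier one
    return put_get and get_put and put_put


sprite = Sprite("circle", "red", 0.25, 0.75)
edited = put(sprite, "blue")
ok = satisfies_manipulation_equations(sprite, "blue", "green")
```

With learned models the equalities would hold only approximately, so each would be scored by a divergence and trained as a task rather than checked exactly.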

Paper Structure (30 sections, 6 theorems, 15 equations, 8 figures, 1 table)

Introduction
Contributions
Tasks and patterns
Tasks
Patterns are "nice" tasks
Analysing complex tasks
Proof of concept: Tasks from specifications - the stack
The manipulation task
Theoretical analysis
Proof 1 - Bayesian inversion
Proof 2 - CycleGAN
Experimental validation of manipulator
Experimental Results I: Simple attributes of synthetic and real-world data
Experimental Results II: Derived attributes of synthetic data
Experimental results III: Interpretability applications on real-world data
...and 15 more sections

Key Result

Lemma 2.9

For all well-typed $f$, $g$, and for any positive linear combination $\alpha: \mathbb{R}^{\geq 0} \times \mathbb{R}^{\geq 0} \rightarrow \mathbb{R}^{\geq 0}$:

Figures (8)

Figure 1: In this example, we train a stack (alongside an autoencoder) to store the latent vectors of Spriteworld shapes. With an image latent size 16 and stack vector size 64, it is able to retain information to faithfully restore up to 4 shapes.
Figure 2: An input Spriteworld image alongside a spectrum of outputs exhibiting the ability of the put to manipulate a single attribute of the input while preserving its other properties. Additionally, the model is able to generalise by interpolating to attribute values unseen during training, in this case producing orange and cyan shapes, whereas during training it only sees red, green or blue shapes (further details in the Spriteworld appendix).
Figure 3: Outputs of a put trained against an MNIST classifier. The put preserves several graphological aspects, such as stroke weight, slant, and angularity. This represents qualitative evidence to support our prediction that put as a class-conditioned generative model behaves as a style-preserving edit.
Figure 4: To illustrate the concepts of derived attributes and unequal entropy, consider an attribute on the Spriteworld data called blue-circleness, which broadly measures how similar a shape is to a blue circle. We define blue-circleness (bc) as a function of the explicit attributes shape and colour; we assign a continuous colour score $cs \in [0, 1]$ based on the hue, where red $= 0$ and blue $= 1$. In this example, the class of shapes with bc-value $0$ has higher entropy than the class with bc-value $0.4$, because more shapes have bc-value $0$. So manipulating a shape from bc-value $0$ to $0.4$ must lose information.
Figure 5: Complement manipulators (the complement manipulator pattern) can manipulate derived attributes such as blue-circleness while preserving attributes such as position and size, by using the complement as a scratchpad to record a correspondence between data points (further details in the Spriteworld appendix).
...and 3 more figures

Theorems & Definitions (40)

Example 1.1
Example 1.2: Residuation as an architectural choice
Example 1.3: Perceptual losses as multi-objective learning
Example 1.4: VAE
Definition 2.1: Tasks
Definition 2.2: Objective function
Example 2.6: CycleGAN
Definition 2.8: Refinement and equivalence of tasks
Lemma 2.9
proof
...and 30 more
