Learning Differentiable Surrogate Losses for Structured Prediction

Junjie Yang, Matthieu Labeau, Florence d'Alché-Buc

TL;DR

This work introduces a novel framework in which a structured loss function, parameterized by neural networks, is learned directly from output training data through Contrastive Learning, prior to addressing the supervised surrogate regression problem.

Abstract

Structured prediction involves learning to predict complex structures rather than simple scalar values. The main challenge arises from the non-Euclidean nature of the output space, which generally requires relaxing the problem formulation. Surrogate methods build on kernel-induced losses or, more generally, on loss functions admitting an Implicit Loss Embedding, and convert the original problem into a regression task followed by a decoding step. However, designing effective losses for objects with complex structures presents significant challenges and often requires domain-specific expertise. In this work, we introduce a novel framework in which a structured loss function, parameterized by neural networks, is learned directly from output training data through Contrastive Learning, prior to addressing the supervised surrogate regression problem. As a result, the differentiable loss not only enables the learning of neural networks due to the finite dimension of the surrogate space but also allows for the prediction of new structures of the output data via a decoding strategy based on gradient descent. Numerical experiments on supervised graph prediction problems show that our approach achieves similar or even better performance than methods based on a pre-defined kernel.
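To make the gradient-based decoding step concrete, the following is a minimal sketch and not the paper's implementation. It assumes a toy linear embedding in place of the learned neural network, a squared-distance loss in the surrogate space, and a box [0, 1]^d as a hypothetical relaxed output space. All function names (`embed`, `surrogate_loss`, `loss_gradient`, `project_to_box`, `decode`) and all numeric values are illustrative assumptions.

```python
def embed(W, y):
    """Toy linear embedding psi(y) = W y (assumed stand-in for a learned network)."""
    return [sum(w * v for w, v in zip(row, y)) for row in W]


def surrogate_loss(W, y, z):
    """Squared distance between the embedding of y and a predicted embedding z."""
    return sum((p - t) ** 2 for p, t in zip(embed(W, y), z))


def loss_gradient(W, y, z):
    """Gradient of the loss with respect to y: 2 * W^T (W y - z)."""
    residual = [p - t for p, t in zip(embed(W, y), z)]
    return [
        2.0 * sum(W[i][j] * residual[i] for i in range(len(W)))
        for j in range(len(y))
    ]


def project_to_box(y):
    """Clip each coordinate to [0, 1] (hypothetical relaxed output space)."""
    return [min(1.0, max(0.0, v)) for v in y]


def decode(W, z, y_init, step=0.1, n_steps=200):
    """Projected gradient descent on the differentiable loss, starting from y_init."""
    y = project_to_box(y_init)
    for _ in range(n_steps):
        grad = loss_gradient(W, y, z)
        y = project_to_box([v - step * g for v, g in zip(y, grad)])
    return y


# Illustrative values only: a 2 x 3 embedding matrix and a binary target.
W = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.5]]
y_true = [1.0, 0.0, 1.0]
z = embed(W, y_true)          # plays the role of the regressor's prediction
y_init = [0.5, 0.5, 0.5]

y_hat = decode(W, z, y_init)
loss_before = surrogate_loss(W, y_init, z)
loss_after = surrogate_loss(W, y_hat, z)
```

In this toy setting the decoded point stays inside the box and its loss falls well below that of the starting point. Because the toy embedding is not injective, `y_hat` need not coincide with `y_true`; a final rounding or projection back to discrete structures would be a separate step that this sketch does not cover.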

Paper Structure

This paper contains 25 sections, 1 theorem, 20 equations, 3 figures, 4 tables, and 1 algorithm.

Key Result

Proposition 3.1

Suppose that $(\hat{\mathrm{F}}, \hat{\mathrm{E}})$ are the projections of $(\tilde{\mathrm{F}}, \tilde{\mathrm{E}}) \in \mathbb{R}^{m_{\mathrm{max}} \times (T+1)} \times \mathbb{R}^{m_{\mathrm{max}} \times m_{\mathrm{max}} \times S}$ on $\overline{\mathcal{G}}$; then, $(\hat{\mathrm{F}}, \hat{\mat

Figures (3)

  • Figure 1: Illustration of the ELE framework.
  • Figure 2: (Left) The GED with edge features of DIDOR under different decoding strategies with various sizes of candidate set. (Right) The number of predictions whose GED with the ground truth is zero, obtained by DIDOR under different decoding strategies with various sizes of candidate set.
  • Figure 3: Update of the predicted molecules along with the projected gradient descent.

Theorems & Definitions (3)

  • Remark 2.1
  • Proposition 3.1: Projected Gradient Descent on Relaxed Graph Space
  • proof