Flexible Multi-task Networks by Learning Parameter Allocation

Krzysztof Maziarz, Efi Kokiopoulou, Andrea Gesmundo, Luciano Sbaiz, Gabor Bartok, Jesse Berent

TL;DR

The paper introduces Gumbel-Matrix, a differentiable framework for flexible, task-aware parameter sharing in multi-task networks. Each layer consists of several components, and a learnable binary variable for every (task, component) pair decides whether that task uses that component. These allocations are optimized jointly with the network weights via the Gumbel-Softmax trick, so the sharing pattern adapts to task relatedness and yields sparse, interpretable task embeddings. Empirical results on MNIST, Omniglot, and synthetic benchmarks show improved accuracy over static sharing baselines and reveal meaningful task clusters. On Omniglot, the method reduces the error rate by 17% relative to the previous state of the art. The approach offers a scalable alternative to both hand-crafted sharing patterns and neural architecture search for multi-task learning, and could be extended to exploit richer task metadata.
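The Gumbel-Softmax step can be illustrated for a single allocation entry. The sketch below assumes each (task, component) entry is a two-class variable ("allocated" vs "not allocated") with two logits and a temperature; the function names and this exact parameterization are illustrative assumptions, not the authors' code. It returns both the relaxed probability and the rounded 0/1 value used in a straight-through forward pass.

```python
import math
import random


def sample_gumbel(rng: random.Random) -> float:
    """Draw one sample from the standard Gumbel(0, 1) distribution."""
    u = rng.random()
    # Clamp away from 0 and 1 so both logarithms stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def relaxed_binary_allocation(logit_on: float, logit_off: float,
                              temperature: float, rng: random.Random):
    """Two-class Gumbel-Softmax for one hypothetical (task, component) entry.

    Returns (soft, hard):
      soft: relaxed probability that the component is allocated to the task.
      hard: soft rounded to 0/1, the value a straight-through estimator
            would use in the forward pass.
    """
    # Perturb each logit with independent Gumbel noise, then scale by 1/temperature.
    a = (logit_on + sample_gumbel(rng)) / temperature
    b = (logit_off + sample_gumbel(rng)) / temperature
    # Numerically stable 2-way softmax; keep only the "allocated" probability.
    m = max(a, b)
    e_on, e_off = math.exp(a - m), math.exp(b - m)
    soft = e_on / (e_on + e_off)
    hard = 1 if soft > 0.5 else 0
    return soft, hard


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(relaxed_binary_allocation(1.0, 0.0, 0.5, rng))
```

Because `hard` is 1 exactly when `logit_on` plus its noise beats `logit_off` plus its noise, the probability of drawing a 1 is `sigmoid(logit_on - logit_off)` at any temperature; lowering the temperature only pushes `soft` closer to 0 or 1. In an autodiff framework, the straight-through trick would use `hard + soft - stop_gradient(soft)` (for example with `jax.lax.stop_gradient`), so the forward pass sees a binary value while gradients flow through `soft`.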

Abstract

This paper proposes a novel learning method for multi-task applications. Multi-task neural networks can learn to transfer knowledge across different tasks by using parameter sharing. However, sharing parameters between unrelated tasks can hurt performance. To address this issue, we propose a framework to learn fine-grained patterns of parameter sharing. Assuming that the network is composed of several components across layers, our framework uses learned binary variables to allocate components to tasks in order to encourage more parameter sharing between related tasks, and discourage parameter sharing otherwise. The binary allocation variables are learned jointly with the model parameters by standard back-propagation thanks to the Gumbel-Softmax reparametrization method. When applied to the Omniglot benchmark, the proposed method achieves a 17% relative reduction of the error rate compared to the previous state of the art.
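To make "allocating components to tasks" concrete, here is a minimal sketch of one modular layer's forward pass for a single task, given a binary allocation matrix whose rows are tasks and whose columns are components. It assumes every component maps a vector to a vector of the same size and that the outputs of the active components are summed; the summation rule, the function name `modular_layer`, and the toy components are illustrative assumptions rather than the paper's exact definition.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Component = Callable[[Vector], Vector]


def modular_layer(x: Vector, components: Sequence[Component],
                  allocation: Sequence[Sequence[int]], task: int) -> Vector:
    """Forward pass of one (hypothetical) modular layer for a single task.

    allocation[task][c] == 1 means component c is used by `task`.
    Outputs of the active components are summed (an assumed aggregation rule).
    """
    row = allocation[task]
    if len(row) != len(components):
        raise ValueError("allocation row needs one entry per component")
    out = [0.0] * len(x)
    for use, component in zip(row, components):
        if use:
            y = component(x)
            out = [o + v for o, v in zip(out, y)]
    return out


if __name__ == "__main__":
    # Toy components: a "doubling" layer, a "negating" layer, and an identity.
    comps = [lambda v: [2.0 * a for a in v],
             lambda v: [-a for a in v],
             lambda v: list(v)]
    # Component 0 is shared by both tasks; 1 belongs to task 0, 2 to task 1.
    alloc = [[1, 1, 0],
             [1, 0, 1]]
    print(modular_layer([1.0, 2.0], comps, alloc, task=0))  # -> [1.0, 2.0]
    print(modular_layer([1.0, 2.0], comps, alloc, task=1))  # -> [3.0, 6.0]
```

The shared column plays the role of the purple components in Figure 2: its weights receive gradients from both tasks, while task-specific columns are trained by one task only. A column of zeros corresponds to an unused component.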

Paper Structure

This paper contains 29 sections, 2 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Comparison of 'shared bottom' and 'no sharing' patterns for unrelated tasks (left) and almost equal tasks (right). The plots show loss over time (averaged over tasks and smoothed over a window of $100$ steps). We ran each experiment $30$ times, and the shaded area shows the $90\%$ confidence interval.
  • Figure 2: An example network with two tasks. Some components are used by both tasks (purple), some by only one of the tasks (red or blue, respectively), and one identity component is completely unused (white). Below each layer we show the corresponding allocation matrix.
  • Figure 3: The Omniglot multi-task network.
  • Figure 4: Components inside a modular layer in the Omniglot multi-task network. We denote GroupNorm by GN, and the layer stride as $s$. Note that for this specific architecture we have $s \in \{1, 2\}$.
  • Figure 5: Binary allocation vectors for the $40$ tasks. Rows correspond to tasks: first $20$ rows form the CIFAR cluster, next $10$ the MNIST cluster, and the last $10$ the Fashion-MNIST cluster. Columns correspond to the $48$ components of the model: $16$ components in each of the $3$ modular layers. A yellow pixel denotes a $1$ (the component is allocated to a given task), while a purple pixel denotes a $0$.
  • ...and 3 more figures
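Figure 5 shows task clusters visually through the rows of the binary allocation matrix. One simple way to quantify that structure, offered here as an illustrative assumption rather than the analysis used in the paper, is the Jaccard similarity between two tasks' allocation vectors: the number of components both tasks use divided by the number used by either. The toy vectors below are hypothetical and are not the paper's learned allocations.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple


def jaccard(u: Sequence[int], v: Sequence[int]) -> float:
    """Jaccard similarity of two binary allocation vectors."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    # Two tasks that use no components at all are treated as identical.
    return both / either if either else 1.0


def pairwise_similarity(
        vectors: Sequence[Sequence[int]]) -> Dict[Tuple[int, int], float]:
    """Jaccard similarity for every unordered pair of tasks (rows)."""
    return {(i, j): jaccard(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)}


if __name__ == "__main__":
    # Hypothetical 4 tasks x 6 components: tasks 0-1 and tasks 2-3 form clusters.
    toy = [[1, 1, 1, 0, 0, 0],
           [1, 1, 0, 1, 0, 0],
           [0, 0, 0, 1, 1, 1],
           [0, 0, 1, 0, 1, 1]]
    for pair, sim in sorted(pairwise_similarity(toy).items()):
        print(pair, round(sim, 2))
```

For these toy rows, pairs inside a cluster score 0.5 while cross-cluster pairs score at most 0.2, which is the kind of block structure Figure 5 displays for the CIFAR, MNIST, and Fashion-MNIST task groups.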