Flexible Multi-task Networks by Learning Parameter Allocation
Krzysztof Maziarz, Efi Kokiopoulou, Andrea Gesmundo, Luciano Sbaiz, Gabor Bartok, Jesse Berent
TL;DR
The paper introduces Gumbel-Matrix, a differentiable framework for flexible, task-aware parameter sharing in multi-task networks. Each component in each layer is associated with a learnable binary allocation variable per task, and these variables are optimized with the Gumbel-Softmax trick so that the sharing pattern adapts to task relatedness and yields sparse, interpretable task embeddings. Empirical results on MNIST, Omniglot, and synthetic benchmarks show improved accuracy over static sharing baselines and reveal meaningful task clusters; on Omniglot, the reported gain is a 17% relative reduction in error rate. The approach offers a scalable alternative to hand-crafted sharing patterns and neural architecture search for multi-task learning, with potential for richer metadata-driven extensions.
Abstract
This paper proposes a novel learning method for multi-task applications. Multi-task neural networks can learn to transfer knowledge across different tasks by using parameter sharing. However, sharing parameters between unrelated tasks can hurt performance. To address this issue, we propose a framework to learn fine-grained patterns of parameter sharing. Assuming that the network is composed of several components across layers, our framework uses learned binary variables to allocate components to tasks in order to encourage more parameter sharing between related tasks, and discourage parameter sharing otherwise. The binary allocation variables are learned jointly with the model parameters by standard back-propagation thanks to the Gumbel-Softmax reparametrization method. When applied to the Omniglot benchmark, the proposed method achieves a 17% relative reduction of the error rate compared to the state of the art.
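The allocation idea described in the abstract can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the function names (`relaxed_binary_sample`, `sample_allocation`, `route`), the logit parameterization, the temperature value, and the choice to combine component outputs by a masked sum are all assumptions made for illustration. The sketch uses the two-class form of the Gumbel-Softmax relaxation, where the difference of two Gumbel noise terms reduces to a single logistic noise term. In an actual model the same computation would be written in an autodiff framework so that gradients reach the allocation logits.

```python
import math
import random


def relaxed_binary_sample(logit, temperature, rng):
    """Draw a relaxed binary sample in [0, 1] (two-class Gumbel-Softmax).

    `logit` is log(p / (1 - p)) for the allocation probability p.
    """
    # The difference of two Gumbel(0, 1) variables is Logistic(0, 1) noise.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    logistic_noise = math.log(u) - math.log(1.0 - u)
    return 1.0 / (1.0 + math.exp(-(logit + logistic_noise) / temperature))


def sample_allocation(logits, temperature, rng, hard=False):
    """Sample a task-by-component allocation matrix for one layer.

    logits[t][c] is the (hypothetical) learnable logit for allocating
    component c to task t. With hard=True the relaxed values are
    thresholded at 0.5 to give a binary matrix.
    """
    soft = [[relaxed_binary_sample(l, temperature, rng) for l in row]
            for row in logits]
    if hard:
        return [[1.0 if m > 0.5 else 0.0 for m in row] for row in soft]
    return soft


def route(component_outputs, task_mask):
    """Combine component output vectors for one task by a masked sum.

    The masked sum is an assumption of this sketch; other aggregation
    rules (for example averaging over allocated components) are possible.
    """
    dim = len(component_outputs[0])
    return [sum(m * out[i] for m, out in zip(task_mask, component_outputs))
            for i in range(dim)]


rng = random.Random(0)
# Two tasks, three components: both tasks lean towards sharing component 1.
logits = [[4.0, 4.0, -4.0],
          [-4.0, 4.0, 4.0]]
mask = sample_allocation(logits, temperature=0.5, rng=rng)
outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy per-component outputs
mixed_task0 = route(outputs, mask[0])
```

Lower temperatures push the relaxed samples closer to 0 or 1, at the cost of noisier gradients. The `hard=True` path only shows how a binary allocation could be read off once training is done.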
