Modeling Output-Level Task Relatedness in Multi-Task Learning with Feedback Mechanism
Xiangming Xi, Feng Gao, Jun Xu, Fangtai Guo, Tianlei Jin
TL;DR
Traditional multi-task learning (MTL) models task relatedness only at the feature or parameter level. This paper adds output-level relatedness through a feedback mechanism that uses one task's outputs as a posteriori information for the others. The result is a dynamic, iterative MTL process, augmented with amplifier blocks and a Gumbel-based gate that decides where feedback is infused, together with a convergence loss that stabilizes predictions across iterations. Experiments on spoken language understanding (SLU) benchmarks show improved performance across tasks, and ablations indicate that both the convergence loss and the gating mechanism contribute. The approach offers a practical way to exploit cross-task output correlations and may benefit MTL systems in domains whose outputs are tightly coupled.
Abstract
Multi-task learning (MTL) is a paradigm that simultaneously learns multiple tasks by sharing information at different levels, enhancing the performance of each individual task. While previous research has primarily focused on feature-level or parameter-level task relatedness, and proposed various model architectures and learning algorithms to improve learning performance, we aim to explore output-level task relatedness. This approach introduces a posteriori information into the model, considering that different tasks may produce correlated outputs with mutual influences. We achieve this by incorporating a feedback mechanism into MTL models, where the output of one task serves as a hidden feature for another task, thereby transforming a static MTL model into a dynamic one. To ensure the training process converges, we introduce a convergence loss that measures the trend of a task's outputs during each iteration. Additionally, we propose a Gumbel gating mechanism to determine the optimal projection of feedback signals. We validate the effectiveness of our method and evaluate its performance through experiments conducted on several baseline models in spoken language understanding.
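
The two training-side ingredients named in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: the gate is taken to be a standard Gumbel-softmax sample over a small set of candidate infusion points, and the convergence loss is taken to be the mean squared change in a task's outputs between consecutive feedback iterations. The function names `gumbel_softmax_gate` and `convergence_penalty`, and all parameters, are hypothetical and not taken from the paper.

```python
import math
import random


def gumbel_softmax_gate(logits, tau=1.0, rng=random):
    """Sample soft selection weights over candidate infusion points.

    Standard Gumbel-softmax trick (assumed here, not confirmed by the paper):
    add Gumbel(0, 1) noise to each logit, divide by the temperature tau,
    then apply a softmax. Lower tau gives weights closer to one-hot.
    """
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        noisy.append((logit + gumbel_noise) / tau)
    # Numerically stable softmax.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def convergence_penalty(outputs_per_iteration):
    """Mean squared change between consecutive iterations' outputs.

    Hypothetical stand-in for the paper's convergence loss: it is zero when
    the predictions stop changing and grows when they keep drifting.
    """
    if len(outputs_per_iteration) < 2:
        return 0.0
    total = 0.0
    pairs = zip(outputs_per_iteration, outputs_per_iteration[1:])
    for prev, curr in pairs:
        total += sum((c - p) ** 2 for p, c in zip(prev, curr)) / len(curr)
    return total / (len(outputs_per_iteration) - 1)


# Usage: three candidate infusion points, three feedback iterations.
weights = gumbel_softmax_gate([0.2, 1.5, -0.3], tau=0.5, rng=random.Random(0))
penalty = convergence_penalty([[0.1, 0.9], [0.3, 0.7], [0.3, 0.7]])
```

In a trainable model the gate logits would be learned parameters and the penalty would be added to the task losses with some weight; both choices are left out here because the abstract does not specify them.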
