Adversarial Multi-task Learning for Text Classification

Pengfei Liu, Xipeng Qiu, Xuanjing Huang

TL;DR

This work tackles learning task-invariant shared representations for multi-task text classification by introducing an adversarial shared-private framework with orthogonality constraints. The model uses a shared LSTM plus per-task private encoders, augmented by a task discriminator and gradient reversal to purify the shared space, and an orthogonality loss to reduce redundancy between the shared and private features. Evaluated on 16 diverse text classification datasets, the adversarial shared-private model (ASP-MTL) outperforms the baselines and allows the shared representation to be transferred to new tasks. The shared features function as off-the-shelf knowledge, offering practical benefits for rapid adaptation to unseen tasks while preserving task-specific performance through private components.
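
The orthogonality constraint mentioned in the summary can be illustrated with a small sketch. It assumes the penalty is the squared Frobenius norm of the product between the transposed shared-feature matrix and the private-feature matrix for one task; that form, the function name `orthogonality_penalty`, and the list-of-rows layout are assumptions made here for illustration, not details stated in the text above.

```python
# Sketch only: the function name, the data layout, and the squared-Frobenius
# form of the penalty are assumptions for illustration.

def orthogonality_penalty(shared, private):
    """Return the squared Frobenius norm of shared^T @ private.

    shared:  list of n rows, each a list of d_s shared features
    private: list of n rows, each a list of d_p private features

    The result is 0.0 when every shared feature column is orthogonal to
    every private feature column across the n examples.
    """
    if len(shared) != len(private):
        raise ValueError("shared and private need the same number of rows")
    d_s = len(shared[0])
    d_p = len(private[0])
    total = 0.0
    for i in range(d_s):
        for j in range(d_p):
            # Entry (i, j) of shared^T @ private: overlap between shared
            # feature i and private feature j, summed over the examples.
            entry = sum(s[i] * p[j] for s, p in zip(shared, private))
            total += entry * entry
    return total


# Orthogonal features incur no penalty; identical features are penalized.
no_overlap = orthogonality_penalty([[1.0], [0.0]], [[0.0], [1.0]])
full_overlap = orthogonality_penalty([[1.0], [1.0]], [[1.0], [1.0]])
```

In a training loop, a term like this would typically be added to the task loss with a small weight, so that the private encoders are discouraged from duplicating what the shared encoder already captures. The weighting scheme is likewise an assumption of this sketch.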

Abstract

Neural network models have shown promise for multi-task learning, which focuses on learning shared layers that extract common, task-invariant features. However, in most existing approaches, the extracted shared features are prone to contamination by task-specific features or by noise from other tasks. In this paper, we propose an adversarial multi-task learning framework that keeps the shared and private latent feature spaces from interfering with each other. We conduct extensive experiments on 16 different text classification tasks, which demonstrate the benefits of our approach. In addition, we show that the shared knowledge learned by our model can be regarded as off-the-shelf knowledge and easily transferred to new tasks. The datasets for all 16 tasks are publicly available at http://nlp.fudan.edu.cn/data/

Paper Structure

This paper contains 27 sections, 12 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Two sharing schemes for task A and task B. The overlap between two black circles denotes shared space. The blue triangles and boxes represent the task-specific features while the red circles denote the features which can be shared.
  • Figure 2: Two architectures for learning multiple tasks. Yellow and gray boxes represent shared and private LSTM layers respectively.
  • Figure 3: Adversarial shared-private model. Yellow and gray boxes represent shared and private LSTM layers respectively.
  • Figure 4: Two transfer strategies using a pre-trained shared LSTM layer. The yellow box denotes the shared feature extractor $E_{s}$ trained on 15 tasks.
  • Figure 5: (a) The change of the predicted sentiment score at different time steps. The Y-axis represents the sentiment score, while the X-axis represents the input words in chronological order. The darker grey horizontal line marks the border between positive and negative sentiment. (b) The purple heat map describes the behaviour of neuron $\mathbf{h}^{s}_{18}$ from the shared layer of SP-MTL, while the blue one shows the behaviour of neuron $\mathbf{h}^{s}_{21}$, which belongs to the shared layer of our model.