Multi-Task Cross-Lingual Sequence Tagging from Scratch

Zhilin Yang, Ruslan Salakhutdinov, William Cohen

TL;DR

The paper introduces a deep hierarchical GRU architecture with a CRF layer for end-to-end sequence tagging that operates without feature engineering. It extends the model to multi-task and cross-lingual joint training by sharing architecture and parameters, enabling learning across tasks and languages from scratch. The approach achieves state-of-the-art results on POS tagging, chunking, and NER across multiple languages and demonstrates performance gains through joint training, especially in low-resource settings. This work highlights the value of morphological and contextual sharing in multilingual sequence tagging and points to future work leveraging parallel data for cross-lingual semantics.

Abstract

We present a deep hierarchical recurrent neural network for sequence tagging. Given a sequence of words, our model employs deep gated recurrent units on both character and word levels to encode morphology and context information, and applies a conditional random field layer to predict the tags. Our model is task independent, language independent, and feature engineering free. We further extend our model to multi-task and cross-lingual joint training by sharing the architecture and parameters. Our model achieves state-of-the-art results in multiple languages on several benchmark tasks including POS tagging, chunking, and NER. We also demonstrate that multi-task and cross-lingual joint training can improve the performance in various cases.
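The abstract says a conditional random field layer predicts the tags from the recurrent encoder's output. As a rough illustration of what such a layer does at prediction time, the sketch below runs standard Viterbi decoding for a linear-chain CRF in plain Python. The names `viterbi_decode`, `emissions`, and `transitions` are assumptions made for this example and do not come from the paper. The emission scores stand in for whatever the word-level GRU would produce, and the paper's exact scoring function may differ.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][k]   : score of tag k at position t (hypothetical stand-in
                        for the word-level GRU output).
    transitions[j][k] : score of moving from tag j to tag k.
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])

    # best[k] holds the best score of any path ending in tag k so far.
    best = list(emissions[0])
    backpointers = []

    for t in range(1, len(emissions)):
        new_best = []
        pointers = []
        for k in range(num_tags):
            # Pick the previous tag that maximizes the score of reaching tag k.
            prev = max(range(num_tags),
                       key=lambda j: best[j] + transitions[j][k])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][k] + emissions[t][k])
        best = new_best
        backpointers.append(pointers)

    # Start from the best final tag and follow the backpointers to position 0.
    tag = max(range(num_tags), key=lambda k: best[k])
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    path.reverse()
    return path
```

With all-zero transition scores the decoder simply picks the best tag per position. A strong penalty on switching tags can override the per-position preference, which is the kind of tag-to-tag dependency a CRF layer is meant to capture.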

Paper Structure

This paper contains 16 sections, 6 equations, 3 figures, and 8 tables.

Figures (3)

  • Figure 1: The architecture of our hierarchical GRU network with CRF, when $L^c = L^w = 1$ (only one layer for word-level and character-level GRUs respectively). We only display the character-level GRU for the word Mike and omit others.
  • Figure 2: Network architectures for multi-task and cross-lingual joint training. Red boxes indicate shared architecture and parameters. Blue boxes are task/language specific components trained separately. Eng, Span, Char, and Emb refer to English, Spanish, Character and Embeddings.
  • Figure 3: 2-dimensional t-SNE visualization of the character-level GRU output for country names in English and Spanish. Black words are English and red ones are Spanish. Note that all corresponding pairs are nearest neighbors in the original embedding space.
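The sharing scheme in the Figure 2 caption (shared components versus task- or language-specific ones) can be pictured with a toy example in which two taggers hold a reference to the same parameter object. This is only a sketch: `Component`, `Tagger`, and the update rule are hypothetical, and the particular split used here (a shared character-level GRU with separate word-level GRUs and CRF layers) is an assumption made for illustration, not a restatement of the figure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """Hypothetical bundle of parameters (stand-in for a GRU or CRF layer)."""
    name: str
    weights: List[float] = field(default_factory=lambda: [0.0, 0.0])

    def update(self, grads: List[float], lr: float = 0.1) -> None:
        # Plain gradient step; the real training procedure may differ.
        self.weights = [w - lr * g for w, g in zip(self.weights, grads)]


@dataclass
class Tagger:
    char_gru: Component  # assumed shared across languages in this sketch
    word_gru: Component  # assumed language specific in this sketch
    crf: Component       # assumed language specific in this sketch


# One character-level component, referenced by both taggers.
shared_char_gru = Component("char_gru")
english = Tagger(shared_char_gru, Component("word_gru_eng"), Component("crf_eng"))
spanish = Tagger(shared_char_gru, Component("word_gru_span"), Component("crf_span"))

# A step taken while training on English also changes what Spanish sees,
# because both taggers point at the same object.
english.char_gru.update([1.0, -1.0])
```

After the update, `spanish.char_gru` reflects the change made through `english`, while the Spanish word-level and CRF components are untouched.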