Multi-Task Cross-Lingual Sequence Tagging from Scratch

Zhilin Yang; Ruslan Salakhutdinov; William Cohen

TL;DR

The paper introduces a deep hierarchical GRU architecture with a CRF output layer for end-to-end sequence tagging, requiring no feature engineering. GRUs at the character and word levels encode morphology and context. The model extends to multi-task and cross-lingual joint training by sharing architecture and parameters, so it can be trained across tasks and languages from scratch. It reports state-of-the-art results on POS tagging, chunking, and NER in multiple languages, with gains from joint training, especially in low-resource settings. The work highlights the value of sharing morphological and contextual representations in multilingual sequence tagging and points to future work that uses parallel data for cross-lingual semantics.

Abstract

We present a deep hierarchical recurrent neural network for sequence tagging. Given a sequence of words, our model employs deep gated recurrent units on both character and word levels to encode morphology and context information, and applies a conditional random field layer to predict the tags. Our model is task independent, language independent, and feature engineering free. We further extend our model to multi-task and cross-lingual joint training by sharing the architecture and parameters. Our model achieves state-of-the-art results in multiple languages on several benchmark tasks including POS tagging, chunking, and NER. We also demonstrate that multi-task and cross-lingual joint training can improve the performance in various cases.
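
The final step named in the abstract, a conditional random field layer that predicts the tags, can be illustrated with a small decoding sketch. This is not the authors' implementation: it is a minimal pure-Python Viterbi decoder for a linear-chain CRF, assuming that per-position tag scores (called `emissions` here, a hypothetical name) have already been produced by some encoder such as the character- and word-level GRUs. The tag set, the score values, and the `transitions` matrix are made up for illustration.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its total score.

    emissions[t][j]   : score of tag j at position t (assumed encoder output)
    transitions[i][j] : score of moving from tag i to tag j
    """
    if not emissions:
        return [], 0.0
    num_tags = len(emissions[0])
    # best[j] holds the best score of any path ending in tag j so far.
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best = []
        pointers = []
        for j in range(num_tags):
            # Pick the previous tag that maximizes the path score into tag j.
            prev = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
        best = new_best
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag.
    last = max(range(num_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Illustrative (made-up) tag set and scores for a three-token input.
tags = ["O", "B", "I"]
emissions = [
    [0.1, 2.0, 0.0],
    [0.5, 0.0, 1.5],
    [2.0, 0.0, 0.3],
]
transitions = [
    [0.5, 0.0, -10.0],  # from O: strongly discourage O -> I
    [0.0, -1.0, 1.0],   # from B
    [0.0, -1.0, 0.5],   # from I
]

path, score = viterbi_decode(emissions, transitions)
print([tags[i] for i in path], score)  # ['B', 'I', 'O'] 6.5
```

Decoding over the whole sequence, rather than picking the best tag at each position independently, lets the transition scores rule out inconsistent tag sequences such as an `I` tag directly after `O`.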
