Disentangling Singlish Discourse Particles with Task-Driven Representation

Linus Tze En Foo, Lynnette Hui Xian Ng

TL;DR

This work offers a preliminary effort to disentangle the Singlish discourse particles lah, meh and hor with task-driven representation learning, and provides a computational method for understanding these particles.

Abstract

Singlish, formally known as Colloquial Singapore English, is an English-based creole language originating from the Southeast Asian country of Singapore. The language contains influences from Sinitic languages (such as Chinese dialects), Malay, Tamil and so forth. A fundamental step in understanding Singlish is to first understand the pragmatic functions of its discourse particles, upon which Singlish relies heavily to convey meaning. This work offers a preliminary effort to disentangle the Singlish discourse particles (lah, meh and hor) with task-driven representation learning. After disentanglement, we cluster these discourse particles to differentiate their pragmatic functions, and perform Singlish-to-English machine translation. Our work provides a computational method for understanding Singlish discourse particles, and opens avenues towards a deeper comprehension of the language and its usage.

Paper Structure

This paper contains 27 sections, 1 equation, 4 figures, and 6 tables.

Figures (4)

  • Figure 1: Illustration of the model architecture for the P-Pred task. The output representation of the [CLS] token from SingBERT is passed into a fully connected classifier to predict a probability for each of the three discourse particle classes.
  • Figure 2: Architecture for obtaining word representations using LIR-sentence. LIR-word uses a similar architecture, except that it uses the particle word embedding instead of the sentence embedding.
  • Figure 3: Representation for baseline model SingBERT. Blue - lah, orange - meh, green - hor.
  • Figure 4: Clusters of particle representations. Blue - lah, orange - meh, green - hor.
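
The classifier head described in the Figure 1 caption can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the encoder has already produced a [CLS] vector, and replaces that vector and the classifier weights with small random stand-ins. The names classify_particle, hidden_size, weights and bias are hypothetical, and a softmax over the three particle classes is an assumption about how the probabilities are obtained.

```python
import math
import random

# The three discourse particle classes named in the paper.
PARTICLES = ["lah", "meh", "hor"]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_particle(cls_vector, weights, bias):
    """Apply one fully connected layer to a [CLS] vector, then softmax.

    weights holds one row per particle class; bias holds one value per class.
    Returns a dict mapping each particle to its predicted probability.
    """
    logits = [
        sum(w * x for w, x in zip(row, cls_vector)) + b
        for row, b in zip(weights, bias)
    ]
    return dict(zip(PARTICLES, softmax(logits)))


# Stand-in inputs (assumptions): a real encoder would supply cls_vector,
# and the weights would be learned during training on the P-Pred task.
rng = random.Random(0)
hidden_size = 8  # hypothetical; a BERT-style encoder uses a larger size
cls_vector = [rng.uniform(-1.0, 1.0) for _ in range(hidden_size)]
weights = [
    [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)] for _ in PARTICLES
]
bias = [0.0] * len(PARTICLES)

probs = classify_particle(cls_vector, weights, bias)
predicted = max(probs, key=probs.get)
print(probs, predicted)
```

In practice the [CLS] vector would come from the pretrained encoder and the layer would be trained with a classification loss; the sketch only shows the shape of the computation from representation to class probabilities.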