Examining and Adapting Time for Multilingual Classification via Mixture of Temporal Experts

Weisi Liu, Guangzeng Han, Xiaolei Huang

TL;DR

This work treats time as a set of evolving domains to study temporal shifts in multilingual text classification. It introduces Mixture of Temporal Experts (MoTE), comprising a clustering-based shift evaluator and a temporal router network to adapt classifiers across time domains. MoTE demonstrates improved generalization and fairness over baselines on both short, informal reviews and long legal documents across multiple languages, with notable gains in imbalanced and low-resource settings. The findings underscore the importance of time-aware, multilingual modeling and provide practical guidance for deploying classifiers in temporally shifting, diverse linguistic contexts.

Abstract

Time is implicitly embedded in the classification process: classifiers are usually built on existing data but applied to future data whose distributions (e.g., label and token) may change. However, existing state-of-the-art classification models rarely consider temporal variations and primarily focus on English corpora, which leaves temporal studies underexplored, especially in multilingual settings. In this study, we fill the gap by treating time as domains (e.g., 2024 vs. 2025), examining temporal effects, and developing a domain adaptation framework to generalize classifiers over time across multiple languages. Our framework proposes Mixture of Temporal Experts (MoTE), which leverages both semantic and data distributional shifts to learn temporal trends and adapt classification models to them. Our analysis shows that classification performance varies over time across different languages, and we experimentally demonstrate that MoTE can enhance classifier generalizability under temporal data shifts. Our study provides analytic insights and addresses the need for time-aware models that perform robustly in multilingual scenarios.
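To make the "time as domains" idea concrete, here is a minimal sketch of a cross-time-domain test: records are grouped by year, a model is fit on each source year, and it is scored on every year. Everything in it is an illustrative assumption rather than the paper's setup: the toy records, the helper names, the majority-label "classifier", and the use of accuracy (the paper reports macro-F1 and AUC).

```python
from collections import Counter, defaultdict


def split_by_time(records):
    """Group (year, text, label) records into time domains keyed by year."""
    domains = defaultdict(list)
    for year, text, label in records:
        domains[year].append((text, label))
    return dict(domains)


def train_majority(examples):
    """Toy stand-in for a classifier: always predict the most frequent label."""
    counts = Counter(label for _, label in examples)
    return counts.most_common(1)[0][0]


def accuracy(predicted_label, examples):
    """Fraction of examples whose label matches the constant prediction."""
    return sum(label == predicted_label for _, label in examples) / len(examples)


def cross_domain_matrix(domains):
    """Fit on each source year and evaluate on every target year."""
    years = sorted(domains)
    return {
        (src, tgt): accuracy(train_majority(domains[src]), domains[tgt])
        for src in years
        for tgt in years
    }


# Hypothetical data: the label distribution flips between 2024 and 2025.
records = [
    (2024, "great", 1), (2024, "good", 1), (2024, "bad", 0),
    (2025, "awful", 0), (2025, "poor", 0), (2025, "fine", 1),
]
domains = split_by_time(records)
matrix = cross_domain_matrix(domains)

for (src, tgt), score in sorted(matrix.items()):
    print(f"train {src} -> test {tgt}: accuracy {score:.2f}")
```

In this toy case the in-domain scores are higher than the cross-year scores, which is the kind of gap a temporal-shift analysis looks for.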

Paper Structure

This paper contains 45 sections, 4 equations, 2 figures, 6 tables.

Figures (2)

  • Figure 1: Visualizations of temporal effects (performance variations in macro-F1 and AUC) in the cross-domain tests on four languages. Darker blue indicates a larger performance decrease.
  • Figure 2: Overview of the MoTE method. $D^t$ is the source time-domain data with true labels, $D$ is the temporally ordered data excluding the target domain, and $D^{target}$ is the unlabeled target time-domain data. Blue and grey lines indicate the training process, and the pink line represents the prediction data flow in the target time domain.
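This excerpt does not specify how the temporal router combines experts, so the following is only a generic mixture-of-experts sketch under an assumed softmax gate: each expert (one per time domain, hypothetically) outputs class probabilities, and the router's scores decide how much each expert contributes to the final prediction. The function names and numbers are illustrative, not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw router scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_experts(router_scores, expert_probs):
    """Combine per-expert class probabilities using softmax router weights."""
    weights = softmax(router_scores)
    n_classes = len(expert_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, expert_probs))
        for c in range(n_classes)
    ]


# Hypothetical example: two temporal experts, two classes, equal router scores.
router_scores = [0.0, 0.0]
expert_probs = [[0.9, 0.1], [0.3, 0.7]]
mixed = mix_experts(router_scores, expert_probs)
print(mixed)
```

With equal router scores the result is the plain average of the experts. Raising one expert's score shifts the mixture toward that expert's prediction.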