TRepLiNa: Layer-wise CKA+REPINA Alignment Improves Low-Resource Machine Translation in Aya-23 8B

Toshiki Nakai, Ravi Kiran Chikkala, Lena Sophie Oberkircher, Nicholas Jennings, Natalia Skachkova, Tatiana Anikina, Jesujoba Oluwadara Alabi

TL;DR

The paper addresses translation from low-resource languages (LRLs) into high-resource languages (HRLs) when little parallel data is available. It introduces TRepLiNa, a layer-wise alignment method that combines Centered Kernel Alignment (CKA) with REPINA regularization to align mid-layer representations in a decoder-only LLM (Aya-23 8B), and evaluates it in zero-shot, few-shot, and small-data fine-tuning settings. Across Mundari, Santali, Bhili, and Gondi with Hindi/English pivots, TRepLiNa shows its most consistent gains when applied at mid-level layers (around 10–15), often outperforming both CKA-only and REPINA-only alignment, and reports competitive results on several language pairs from the MMLoSo shared task. The work offers practical guidance on when and where to apply layer-wise alignment in low-resource MT, and it discusses limitations and directions for extending the approach to other model families and data regimes.

Abstract

The 2025 Multimodal Models for Low-Resource Contexts and Social Impact (MMLoSo) Language Challenge addresses one of India's most pressing linguistic gaps: the lack of resources for its diverse low-resource languages (LRLs). In this study, we investigate whether enforcing cross-lingual similarity in specific internal layers of a decoder-only multilingual large language model (LLM) can improve translation quality from LRL to high-resource language (HRL). Specifically, we combine Centered Kernel Alignment (CKA), a similarity metric that encourages representations of different languages to align, with REPINA, a regularization method that constrains parameter updates to remain close to the pretrained model, into a joint method we call TRepLiNa. In this research project, we experiment with zero-shot, few-shot, and fine-tuning settings using Aya-23 8B with QLoRA across MMLoSo shared task language pairs (Mundari, Santali, Bhili) with Hindi/English pivots. Our results show that aligning mid-level layers using TRepLiNa (CKA+REPINA) is a low-cost, practical approach to improving LRL translation, especially in data-scarce settings.
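To make the combination described in the abstract more concrete, the following is a minimal sketch of a CKA-plus-REPINA style alignment term, not the authors' implementation. It assumes linear CKA, an identity projection for the REPINA-style penalty, and a simple weighted sum; the names `linear_cka`, `repina_penalty`, `treplina_style_loss`, and the weights `lam` and `mu` are hypothetical. Following the Figure 1 caption, the penalty is applied to the HRL representations so that they stay close to their pretrained values while the LRL representations are pulled toward them. Plain Python lists are used for self-containment; in training this would be written with a tensor library and added to the translation loss.

```python
import math

def center(rows):
    """Subtract the per-feature mean from a list of equal-length feature vectors."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def gram(rows):
    """Linear-kernel Gram matrix: K[i][j] = <rows[i], rows[j]>."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]

def frob_inner(a, b):
    """Frobenius inner product of two equally shaped matrices."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def linear_cka(x, y):
    """Linear CKA between two sets of n representations (one row per example)."""
    kx, ky = gram(center(x)), gram(center(y))
    return frob_inner(kx, ky) / math.sqrt(frob_inner(kx, kx) * frob_inner(ky, ky))

def repina_penalty(current, pretrained):
    """Mean squared distance between current and pretrained representations
    (assumption: identity projection, averaged over examples)."""
    n = len(current)
    return sum((c - p) ** 2
               for rc, rp in zip(current, pretrained)
               for c, p in zip(rc, rp)) / n

def treplina_style_loss(lrl, hrl, hrl_pretrained, lam=1.0, mu=1.0):
    """Hypothetical joint term: pull LRL toward HRL (1 - CKA) while
    keeping HRL close to its pretrained representations."""
    return lam * (1.0 - linear_cka(lrl, hrl)) + mu * repina_penalty(hrl, hrl_pretrained)
```

Because linear CKA is invariant to isotropic scaling and to orthogonal transformations of the features, the alignment term rewards matching geometry rather than identical coordinates, which is one reason a separate anchoring penalty on the HRL side is useful.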

Paper Structure

This paper contains 47 sections, 3 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Proposed alignment architecture. Under CKA-only, both HRL and LRL representations drift toward each other, potentially distorting HRL features. By contrast, TRepLiNa constrains HRL representations while guiding LRL representations toward them, achieving targeted alignment without degrading HRL quality. Here, $m$ and $n$ denote the number of transformer blocks before and after the target alignment layer, respectively.
  • Figure 2: Comparison of $(0.6 \times \mathrm{BLEU} + 0.4 \times \mathrm{ChrF})$ across layers for CKA, REPINA, NoAlign and TRepLiNa.
  • Figure 3: Zero-shot prompt
  • Figure 4: Few-shot prompt
  • Figure 5: Comparison of $(0.6 \times \mathrm{BLEU} + 0.4 \times \mathrm{ChrF})$ across layers for CKA and TRepLiNa on Santali$\rightarrow$English (1k rows, 1 epoch). Dashed lines indicate each method's baseline.
  • ...and 1 more figure
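The composite score plotted in Figures 2 and 5, $0.6 \times \mathrm{BLEU} + 0.4 \times \mathrm{ChrF}$, can be sketched as a small helper. The function names `composite_score` and `best_layer` are hypothetical, and the numbers in the usage example are illustrative placeholders, not results from the paper.

```python
def composite_score(bleu, chrf, w_bleu=0.6, w_chrf=0.4):
    """Weighted combination of BLEU and ChrF, as in the figure captions."""
    return w_bleu * bleu + w_chrf * chrf

def best_layer(scores_by_layer):
    """Return the layer whose (BLEU, ChrF) pair gives the highest composite score.

    scores_by_layer maps a layer index to a (bleu, chrf) tuple.
    """
    return max(scores_by_layer,
               key=lambda layer: composite_score(*scores_by_layer[layer]))

# Illustrative placeholder values only (not taken from the paper).
example = {5: (10.0, 30.0), 10: (11.0, 33.0), 15: (12.0, 35.0)}
print(best_layer(example))
```

A layer sweep like the one in the figures would compute this score once per alignment layer and per method, then compare each curve against its no-alignment baseline.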