Steering Large Language Models for Machine Translation Personalization

Daniel Scalena, Gabriele Sarti, Arianna Bisazza, Elisabetta Fersini, Malvina Nissim

TL;DR

This work addresses the personalization of LLM-generated translations in the literary domain when only a few examples are available. It compares prompting-based methods with steering approaches and introduces a contrastive sparse autoencoder (SAE) framework that identifies style-relevant latent features and upweights them during inference, achieving strong translator-style alignment without sacrificing translation quality. The results show that contrastive SAE steering (SAE Cont.) offers a robust balance between style conditioning and fluency and can be more efficient at inference time than multi-shot prompting, effectively behaving as an internalized summary of the demonstrations. The study also shows that style information is detectable in model activations and that SAE steering affects activations much as prompting does, providing a mechanistic link between prompting and latent-level steering with practical implications for scalable MT personalization.

Abstract

Large language models have simplified the production of personalized translations reflecting predefined stylistic constraints. However, these systems still struggle when stylistic requirements are implicitly represented by a set of examples, such as texts produced by a specific human translator. In this work, we explore various strategies for personalizing automatically generated translations when few examples are available, with a focus on the challenging domain of literary translation. We begin by determining the feasibility of the task and how style information is encoded within model representations. Then, we evaluate various prompting strategies and inference-time interventions for steering model generations towards a personalized style, with a particular focus on contrastive steering with sparse autoencoder (SAE) latents to identify salient personalization properties. We demonstrate that contrastive SAE steering yields robust style conditioning and translation quality, resulting in higher inference-time computational efficiency than prompting approaches. We further examine the impact of steering on model activations, finding that layers encoding personalization properties are impacted similarly by prompting and SAE steering, suggesting a similar mechanism at play.
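
To make the idea of contrastive steering over SAE latents more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that style-relevant latents can be picked by comparing mean SAE activations on target-style examples against mean activations on contrasting examples, and that steering adds the scaled difference to the selected latents. The function names, the mean-difference selection rule, the `top_k` cutoff, and the additive update with intensity `alpha` are all illustrative assumptions; the paper's actual selection criterion and update rule may differ.

```python
from typing import Dict, List, Sequence


def mean_activations(acts: Sequence[Sequence[float]]) -> List[float]:
    """Average each SAE latent over a set of examples (rows = examples)."""
    n = len(acts)
    return [sum(column) / n for column in zip(*acts)]


def select_style_latents(
    target_acts: Sequence[Sequence[float]],
    contrast_acts: Sequence[Sequence[float]],
    top_k: int,
) -> Dict[int, float]:
    """Hypothetical selection rule: keep up to top_k latents whose mean
    activation is higher on target-style examples than on contrast examples.
    Returns a mapping from latent index to the positive mean difference."""
    target_mean = mean_activations(target_acts)
    contrast_mean = mean_activations(contrast_acts)
    diffs = [t - c for t, c in zip(target_mean, contrast_mean)]
    ranked = sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)
    return {i: diffs[i] for i in ranked[:top_k] if diffs[i] > 0}


def steer_latents(
    latents: Sequence[float],
    style_diffs: Dict[int, float],
    alpha: float,
) -> List[float]:
    """Hypothetical update rule: upweight the selected latents by adding
    alpha times their contrastive difference; other latents are unchanged."""
    steered = list(latents)
    for index, diff in style_diffs.items():
        steered[index] += alpha * diff
    return steered


# Toy data: 2 examples per style, 3 SAE latents (values are made up).
target = [[0.0, 2.0, 1.0], [0.0, 4.0, 1.0]]
contrast = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

style_diffs = select_style_latents(target, contrast, top_k=2)
steered = steer_latents([0.5, 0.5, 0.5], style_diffs, alpha=2.0)
```

In a real setting the steered latents would be decoded back through the SAE and written into the model's residual stream at the chosen layer; that step is omitted here because it depends on the specific model and SAE tooling.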

Paper Structure

This paper contains 43 sections, 8 figures, 10 tables, 1 algorithm.

Figures (8)

  • Figure 1: We compare prompt-based approaches with steering techniques intervening on model internals for personalizing MT outputs in literary machine translation. We use MT quality metrics and style classifiers to quantify the impact of steering on output fluency and personalization accuracy.
  • Figure 2: Probing classifier performance on the human translation detection task across Gemma 2 2B (top) and 9B (bottom) layers. Activations in intermediate layers are found to capture translation style information with high precision.
  • Figure 3: Effect of varying the steering intensity $\alpha$ on style accuracy and translation quality for Gemma 2 2B. Top: P accuracy for SAE Cont. and prompting baselines (MS, ZS-Exp and ZS). Bottom: H accuracy for high $\alpha$, showing a steep drop in translation quality while style accuracy increases.
  • Figure 4: Personalization accuracy P (top) and inference speed (Tokens/s) (bottom) across in-context demonstration counts, using Gemma 2 9B for Russian $\rightarrow$ English translation. More results in the appendix.
  • Figure 5: Complete results when comparing the MS approach to our SAE Cont.$_{M\leftrightarrow H}$ for the Gemma models (2B and 9B) on the largest novels, evaluated at the paragraph level, in Russian (RU) and French (FR).
  • ...and 3 more figures