Steering Large Language Models for Machine Translation Personalization
Daniel Scalena, Gabriele Sarti, Arianna Bisazza, Elisabetta Fersini, Malvina Nissim
TL;DR
This work tackles the personalization of translations produced by large language models in the literary domain when only a few examples are available. It compares prompting-based methods with steering approaches and introduces a contrastive sparse autoencoder (SAE) framework that identifies style-relevant latent features and upweights them during inference, achieving strong alignment with a translator's style without sacrificing translation quality. The results show that the contrastive SAE method (SAE Cont) offers a robust balance between style conditioning and fluency, and that it can be more efficient than multi-shot prompting, effectively behaving as an internalized summary of the demonstrations. The study also shows that style information is detectable in model activations and that SAE steering closely mirrors the effect of prompting on those activations. This provides a mechanistic link between prompting and latent-level steering, with practical implications for scalable MT personalization.
Abstract
Large language models have simplified the production of personalized translations reflecting predefined stylistic constraints. However, these systems still struggle when stylistic requirements are implicitly represented by a set of examples, such as texts produced by a specific human translator. In this work, we explore various strategies for personalizing automatically generated translations when few examples are available, with a focus on the challenging domain of literary translation. We begin by determining the feasibility of the task and how style information is encoded within model representations. Then, we evaluate various prompting strategies and inference-time interventions for steering model generations towards a personalized style, with a particular focus on contrastive steering with sparse autoencoder (SAE) latents to identify salient personalization properties. We demonstrate that contrastive SAE steering yields robust style conditioning and translation quality while being more computationally efficient at inference time than prompting approaches. We further examine the impact of steering on model activations, finding that layers encoding personalization properties are impacted similarly by prompting and SAE steering, suggesting a similar mechanism at play.
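
To make the idea of contrastive SAE steering more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a ReLU sparse autoencoder with hypothetical weights `W_enc`, `b_enc`, and `W_dec`, selects the latents whose mean activation is highest on hidden states from translator-style examples relative to contrast examples, and adds their decoder directions to a hidden state with a hypothetical strength `alpha`. The function names, the top-k selection rule, and the toy identity weights are all assumptions made for illustration.

```python
import numpy as np

def sae_encode(h, W_enc, b_enc):
    """ReLU SAE encoder: hidden states (n, d) -> latent activations (n, m)."""
    return np.maximum(h @ W_enc + b_enc, 0.0)

def select_contrastive_latents(h_pos, h_neg, W_enc, b_enc, k):
    """Pick the k latents with the largest mean-activation gap (pos minus neg)."""
    gap = (sae_encode(h_pos, W_enc, b_enc).mean(axis=0)
           - sae_encode(h_neg, W_enc, b_enc).mean(axis=0))
    top = np.argsort(gap)[::-1][:k]  # indices sorted by descending gap
    return top, gap[top]

def steer(h, W_dec, latent_ids, alpha):
    """Add alpha times the summed decoder directions of the chosen latents."""
    direction = W_dec[latent_ids].sum(axis=0)
    return h + alpha * direction

# Toy setup (assumption): identity encoder/decoder over 3 dimensions.
W_enc = np.eye(3)
b_enc = np.zeros(3)
W_dec = np.eye(3)

# Hypothetical hidden states for styled (pos) and contrast (neg) examples.
h_pos = np.array([[1.0, 0.0, 2.0], [1.0, 0.0, 4.0]])
h_neg = np.array([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])

top, gaps = select_contrastive_latents(h_pos, h_neg, W_enc, b_enc, k=1)
steered = steer(np.zeros((1, 3)), W_dec, top, alpha=2.0)
```

In this toy setup only the third latent separates the two example sets, so it is the one selected and the steered state moves along its decoder direction. In a real model the hidden states would come from a chosen layer and the intervention would be applied during generation, details that this sketch leaves out.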
