Enhancing Entertainment Translation for Indian Languages using Adaptive Context, Style and LLMs
Pratik Rakesh Singh, Mohammadi Zaki, Pankaj Wasnik
TL;DR
The paper addresses entertainment translation, where a translation must be both accurate and engaging and therefore has to draw on running context and stylistic cues. It proposes CASAT, a context- and style-aware translation framework that combines a Context Retrieval-Advanced RAG module with a Domain Adaptation Module to produce a time-varying prompt $p_t$ for LLM-based translation. Content is segmented into adaptive sessions so that the prompt can track mood and genre. The approach is language- and LLM-agnostic: context is extracted offline from plot summaries and stored in a vector database, and the retrieved context is supplied to the LLM through in-context learning. Experiments on three Indian-language directions (En–Hi, En–Ben, En–Tel) show consistent improvements in COMET scores and win-ratios across multiple LLMs, and the method outperforms traditional MT baselines on entertainment-specific translation tasks. The work is directly relevant to dubbing and subtitling across languages and points toward online, context-sensitive, culturally aware translation systems.
Abstract
We address the challenging task of neural machine translation (NMT) in the entertainment domain, where the objective is to automatically translate a given dialogue from source-language content into a target language. This task has various applications, particularly in automatic dubbing, subtitling, and other content localization tasks, enabling source content to reach a wider audience. Traditional NMT systems typically translate individual sentences in isolation, without transferring knowledge of crucial elements such as the context and style of previously encountered sentences. In this work, we emphasize the significance of these two aspects in producing pertinent and captivating translations. We demonstrate their significance through several examples and propose a novel framework for entertainment translation, which, to our knowledge, is the first of its kind. Furthermore, we introduce an algorithm that estimates the context and style of the current session and uses these estimates to generate a prompt that guides a Large Language Model (LLM) toward high-quality translations. Our method is both language- and LLM-agnostic, making it a general-purpose tool. We demonstrate the effectiveness of our algorithm through various numerical studies and observe significant improvements in COMET scores across several state-of-the-art LLMs. Moreover, our proposed method consistently outperforms baseline LLMs in terms of win-ratio.
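The idea of turning estimated context and style into a prompt can be illustrated with a minimal sketch. This is not the authors' CASAT implementation: the function names, the prompt layout, and the example data are all hypothetical, and a simple bag-of-words cosine similarity stands in for the vector database and embedding model that the paper relies on.

```python
import math
import re
from collections import Counter


def bow(text):
    """Lowercased bag-of-words counts (a stand-in for a real embedding)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_context(dialogue, plot_chunks, k=1):
    """Return the k plot-summary chunks most similar to the dialogue."""
    query = bow(dialogue)
    ranked = sorted(plot_chunks, key=lambda c: cosine(query, bow(c)), reverse=True)
    return ranked[:k]


def build_prompt(dialogue, plot_chunks, style, target_lang):
    """Assemble a prompt from retrieved context and an estimated style.

    The layout below is an assumption made for illustration only.
    """
    context = " ".join(retrieve_context(dialogue, plot_chunks))
    return (
        f"Context: {context}\n"
        f"Style: {style}\n"
        f"Translate into {target_lang}: {dialogue}"
    )


# Hypothetical plot-summary chunks and dialogue line.
plot_chunks = [
    "The detective chases the thief across the city at night.",
    "The family celebrates a wedding in the village.",
]
dialogue = "Stop the thief before he leaves the city!"
prompt = build_prompt(dialogue, plot_chunks, style="tense thriller", target_lang="Hindi")
print(prompt)
```

In a session-based setting, `style` would be re-estimated as the mood or genre shifts, so the prompt changes over time while the translation call itself stays the same.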
