Evaluating Machine Translation Models for English-Hindi Language Pairs: A Comparative Analysis
Ahan Prasannakumar Shetty
TL;DR
This paper addresses the challenge of evaluating English-Hindi machine translation (MT) across general and domain-specific content. It combines lexical and machine-learning-based automatic metrics, applied to an English-Hindi parallel corpus of more than 18,000 entries and a 400-question FAQ dataset. Findings show that Google Translate generally achieves the best translation quality, with IndicTrans2 competitive, while resource-scarce models like NLLB-200 and OPUS-MT lag in some directions. The work provides guidance for deploying MT in government and public-services contexts and highlights evaluation strategies, including back-translation, for assessing robustness.
Abstract
Machine translation has become a critical tool in bridging linguistic gaps, especially between languages as different as English and Hindi. This paper comprehensively evaluates various machine translation models for translating between English and Hindi. We assess the performance of these models using a diverse set of automatic evaluation metrics, both lexical and machine-learning-based. Our evaluation leverages an English-Hindi parallel corpus of more than 18,000 entries and a custom FAQ dataset comprising questions from government websites. The study aims to provide insights into the effectiveness of different machine translation approaches in handling both general and specialized language domains. Results indicate that performance varies across metrics, highlighting strengths and areas for improvement in current translation systems.
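To make the notion of a lexical metric concrete, the following is a minimal sketch of clipped n-gram precision, the building block of BLEU-style scores. It is an illustration only, not necessarily one of the metrics used in this study; the function names `ngrams` and `clipped_precision` and the whitespace tokenization are assumptions made for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(hypothesis, reference, n=1):
    """Clipped n-gram precision of a hypothesis against a single reference.

    Assumption: both strings are tokenized on whitespace, which is a
    simplification for both English and Hindi text.
    """
    hyp = ngrams(hypothesis.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count per n-gram, so a word
    # repeated in the hypothesis is credited at most as often as it
    # appears in the reference (the "clipping" step).
    overlap = sum((hyp & ref).values())
    return overlap / total


if __name__ == "__main__":
    # Illustrative sentence pair, not taken from the paper's datasets.
    print(clipped_precision("यह एक किताब है", "यह एक पुस्तक है", n=1))
```

A purely lexical score like this penalizes a valid synonym (किताब versus पुस्तक) as a mismatch, which is one reason such metrics are typically paired with machine-learning-based ones.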
