Crossing Borders: A Multimodal Challenge for Indian Poetry Translation and Image Generation
Sofia Jamil, Kotla Sai Charan, Sriparna Saha, Koustava Goswami, Joseph K J
TL;DR
This work tackles two problems: translating morphologically rich Indian poetry into English and generating semantically meaningful images from it. It introduces the Translation and Image Generation (TAI) framework, which combines a translation module guided by Odds Ratio Preference Optimization (ORPO) with an image-prompt construction pipeline driven by a semantic graph, enabling diffusion-based image synthesis that respects metaphor, culture, and context. The Morphologically Rich Indian Language Poems (MorphoVerse) dataset, comprising 1,570 poems across 21 languages, supports this study and addresses resource scarcity in Indian poetry. Experimental results show that TAI surpasses strong baselines in both translation quality and image alignment, and human evaluations confirm cultural fidelity and semantic accuracy. By making Indian-language poetry more accessible to a global audience, the framework supports the goals of quality education and reduced inequality.
Abstract
Indian poetry, known for its linguistic complexity and deep cultural resonance, has a rich and varied heritage spanning thousands of years. However, its layered meanings, cultural allusions, and sophisticated grammatical constructions often make it hard to comprehend, especially for non-native speakers and readers unfamiliar with its context and language. Despite its cultural significance, existing work on poetry has largely overlooked Indian-language poems. In this paper, we propose the Translation and Image Generation (TAI) framework, which leverages Large Language Models (LLMs) and Latent Diffusion Models through appropriate prompt tuning. The framework supports the United Nations Sustainable Development Goals of Quality Education (SDG 4) and Reduced Inequalities (SDG 10) by making culturally rich Indian-language poetry more accessible to a global audience. It has two components: (1) a translation module that uses an Odds Ratio Preference Alignment Algorithm to accurately translate morphologically rich poetry into English, and (2) an image generation module that employs a semantic graph to capture tokens, dependencies, and semantic relationships between metaphors and their meanings, in order to create visually meaningful representations of Indian poems. Our comprehensive experimental evaluation, including both human and quantitative assessments, demonstrates the superiority of TAI Diffusion in poem image generation tasks, where it outperforms strong baselines. To further address the scarcity of resources for Indian-language poetry, we introduce the Morphologically Rich Indian Language Poems (MorphoVerse) dataset, comprising 1,570 poems across 21 low-resource Indian languages. By addressing the gap in poetry translation and visual comprehension, this work aims to broaden accessibility and enrich the reader's experience.
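The abstract names an odds-ratio preference objective for the translation module but does not give its form. As an illustration only, the sketch below assumes the commonly published ORPO formulation: a supervised negative log-likelihood term on the preferred translation plus a weighted odds-ratio penalty that favors the preferred translation over a rejected one. The function names, the weight `lam`, and the use of length-averaged log-probabilities are assumptions for this sketch, not details taken from the paper.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) given log p, for p strictly between 0 and 1."""
    # log1p(-exp(log p)) equals log(1 - p); avg_logp must be negative.
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(nll_chosen: float,
              avg_logp_chosen: float,
              avg_logp_rejected: float,
              lam: float = 0.1) -> float:
    """Hypothetical per-example ORPO-style loss.

    nll_chosen        : supervised NLL of the preferred translation
    avg_logp_chosen   : length-averaged log-probability of the preferred translation
    avg_logp_rejected : length-averaged log-probability of the rejected translation
    lam               : weight of the odds-ratio term (assumed value)
    """
    # Log odds ratio between the preferred and the rejected translation.
    log_or = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    # -log(sigmoid(z)) written as log(1 + exp(-z)).
    odds_ratio_term = math.log1p(math.exp(-log_or))
    return nll_chosen + lam * odds_ratio_term
```

When both translations are equally likely, the odds-ratio term equals log 2; it shrinks as the model assigns relatively more probability to the preferred translation, which is the behavior a preference-aligned translator would be trained toward.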
