Synthesizing the Virtual Advocate: A Multi-Persona Speech Generation Framework for Diverse Linguistic Jurisdictions in Indic Languages

Aniket Deroy

TL;DR

The paper tackles the generation of persuasive, persona-aligned legal speech in five Indic languages by coupling LLM-driven argument generation with multi-persona TTS using Gemini 2.5 Pro and Gemini 2.5 Flash. It introduces a formal framework in which each advocate profile $p_i \in P$ produces text via $T_i = f_{\text{LLM}}(p_i, C)$ and audio via $S_{\text{audio}} = \Phi_M(T_i, \theta_i, \sigma_i)$, enabling controlled prosody and language adaptation. Human evaluation across languages shows strong procedural clarity. Hindi, Tamil, and Telugu score best on Safety, Professionalism, and Directiveness, while Authenticity and Expressiveness lag, indicating an authenticity gap in persuasive advocacy. The results support the readiness of multilingual TTS for procedural legal tasks. They also highlight the need for stronger emotive modulation and finer phonological handling in Bengali and Gujarati, pointing to concrete avenues for future refinement.
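The two-stage composition above (persona to text, then text plus controls to audio) can be illustrated with a minimal offline sketch. It rests on several assumptions. `AdvocateProfile`, `SpeechRequest`, `phi_m` and `synthesize_all` are hypothetical names. $C$ is treated as a case-context string. $\theta_i$ is read as a natural-language style instruction and $\sigma_i$ as a voice identifier. Both $f_{\text{LLM}}$ and $\Phi_M$ are stubs rather than calls to Gemini.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class AdvocateProfile:
    """Hypothetical representation of one persona p_i in P."""
    name: str
    language: str
    theta: str  # assumed: natural-language prosody/style instruction (theta_i)
    sigma: str  # assumed: voice identifier (sigma_i)


@dataclass(frozen=True)
class SpeechRequest:
    """Inputs handed to Phi_M; a real system would return audio instead."""
    text: str
    style: str
    voice: str


# f_LLM maps (persona p_i, case context C) to argument text T_i.
LLMFn = Callable[[AdvocateProfile, str], str]


def phi_m(text: str, theta: str, sigma: str) -> SpeechRequest:
    # Placeholder for the TTS model Phi_M: it only bundles its three inputs
    # so the composition can run offline without calling any real API.
    return SpeechRequest(text=text, style=theta, voice=sigma)


def synthesize_all(profiles: List[AdvocateProfile], case_context: str,
                   f_llm: LLMFn) -> List[SpeechRequest]:
    requests = []
    for p in profiles:
        t_i = f_llm(p, case_context)                   # T_i = f_LLM(p_i, C)
        requests.append(phi_m(t_i, p.theta, p.sigma))  # Phi_M(T_i, theta_i, sigma_i)
    return requests


if __name__ == "__main__":
    personas = [
        AdvocateProfile("Senior Counsel", "Hindi", "slow, authoritative", "voice-A"),
        AdvocateProfile("Junior Counsel", "Tamil", "brisk, procedural", "voice-B"),
    ]
    # Toy f_LLM: a deterministic string instead of a real model call.
    toy_llm = lambda p, c: f"[{p.language}] {p.name}: {c}"
    for req in synthesize_all(personas, "bail application", toy_llm):
        print(req)
```

Passing $f_{\text{LLM}}$ in as a function keeps the persona-to-audio composition testable. A real LLM client or TTS client can then be swapped in without changing the loop.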

Abstract

Legal advocacy requires a unique combination of authoritative tone, rhythmic pausing for emphasis, and emotional intelligence. The evolution of Large Language Models (LLMs) has shifted the focus of Text-to-Speech (TTS) technology from basic intelligibility to context-aware, expressive synthesis. In the legal domain, synthetic speech must convey authority and a specific professional persona. This task becomes significantly more complex in the linguistically diverse landscape of India. This study investigates the performance of the Gemini 2.5 Flash TTS and Gemini 2.5 Pro TTS models in generating synthetic courtroom speeches across five Indic languages: Tamil, Telugu, Bengali, Hindi, and Gujarati. We propose a prompting framework that uses Gemini 2.5's native support for these five languages and its context-aware pacing to produce distinct advocate personas. The models exhibit a "monotone authority": they excel at procedural information delivery but struggle with the dynamic vocal modulation and emotive gravitas required for persuasive advocacy. Performance dips in Bengali and Gujarati further highlight phonological frontiers for future refinement. This research underscores the readiness of multilingual TTS for procedural legal tasks while identifying the remaining challenges in replicating the persuasive artistry of human legal discourse. The code is available at https://github.com/naturenurtureelite/Synthesizing-the-Virtual-Advocate/tree/main
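The persona-driven prompting idea can be sketched under one assumption: style and pacing are steered by a natural-language directive placed ahead of the script that the TTS model reads aloud. The `Persona` fields, the `build_tts_prompt` helper and the directive wording are hypothetical illustrations, not the paper's actual prompts. Only the list of five languages comes from the text.

```python
from dataclasses import dataclass

# The five languages studied in the paper.
SUPPORTED_LANGUAGES = ("Tamil", "Telugu", "Bengali", "Hindi", "Gujarati")


@dataclass(frozen=True)
class Persona:
    """Hypothetical persona fields; not the paper's actual prompt schema."""
    role: str
    tone: str
    pacing: str


def build_tts_prompt(persona: Persona, language: str, script: str) -> str:
    """Prepend a natural-language style directive to the courtroom script."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    directive = (
        f"Speak in {language} as {persona.role}. "
        f"Tone: {persona.tone}. Pacing: {persona.pacing}."
    )
    return f"{directive}\n\n{script}"
```

Placing the persona in a separate directive keeps the legal script unchanged across personas. Only the delivery instructions vary, which makes it easier to compare voices on the same argument.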

Paper Structure (13 sections, 4 equations, 9 tables)