VedicTHG: Symbolic Vedic Computation for Low-Resource Talking-Head Generation in Educational Avatars
Vineet Kumar Rakesh, Ahana Bhattacharjee, Soumya Mazumdar, Tapas Samanta, Hemendra Kumar Pandey, Amitabha Das, Sarbajit Pal
TL;DR
This work addresses the dependence of talking-head generation (THG) on GPU-heavy, data-hungry models by proposing Symbolic Vedic Computation, a deterministic, CPU-oriented pipeline that converts speech into a time-aligned phoneme stream $\mathcal{P}$, maps phonemes to a compact viseme set $\mathbb{V}$, and generates smooth mouth trajectories $\mathbf{y}(t)$ using the Vedic-inspired blending rule $\mathbf{y}(t) = (1-\alpha)\mathbf{a} + \alpha\mathbf{c} + \lambda\alpha(1-\alpha)(\mathbf{a} \odot \mathbf{c})$. A lightweight 2D ROI renderer performs landmark-based mouth ROI localization, mouth-bank compositing, and head-motion stabilization to achieve real-time synthesis on commodity CPUs. The approach is evaluated with a reproducible, CPU-focused protocol that reports synchronization accuracy within $\pm 40$ ms, temporal stability, and identity preservation, and benchmarks against CPU-feasible baselines such as Wav2Lip. Results indicate acceptable lip-sync quality at substantially lower computational load and latency, enabling practical educational avatars in low-resource or offline environments. The animation is interpretable and rule-based, with potential extensions to additional viseme controls and languages, offering a viable alternative to heavy neural THG pipelines for classroom use. All mathematical and algorithmic details are stated as explicit symbolic rules and arithmetic-inspired blending, which supports transparency and deployment in offline educational settings.
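As a numerical check on the blending rule above, the sketch below evaluates $\mathbf{y}(t)$ at a single time instant. It assumes that $\mathbf{a}$ and $\mathbf{c}$ are equal-length viseme parameter vectors (for example, mouth-shape coefficients), that $\alpha$ is a scalar blend weight in $[0, 1]$, and that $\lambda$ is a scalar gain; the function name `vedic_blend` and these interpretations are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence


def vedic_blend(a: Sequence[float], c: Sequence[float], alpha: float, lam: float) -> List[float]:
    """Evaluate y = (1 - alpha) * a + alpha * c + lam * alpha * (1 - alpha) * (a * c elementwise).

    Assumptions (illustrative, not taken from the paper's code):
      a, c:  viseme parameter vectors of equal length
      alpha: scalar blend weight, restricted here to [0, 1]
      lam:   scalar gain of the elementwise cross term
    """
    if len(a) != len(c):
        raise ValueError("a and c must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Weight of the cross term; it is zero at alpha = 0 and at alpha = 1.
    w = lam * alpha * (1.0 - alpha)
    # Linear interpolation plus the elementwise (Hadamard) product term.
    return [(1.0 - alpha) * ai + alpha * ci + w * ai * ci for ai, ci in zip(a, c)]


if __name__ == "__main__":
    a = [0.2, 0.8]  # hypothetical current viseme parameters
    c = [0.6, 0.4]  # hypothetical next viseme parameters
    y = vedic_blend(a, c, alpha=0.5, lam=0.5)
    print([round(v, 3) for v in y])  # [0.415, 0.64]
```

Because the cross term carries the factor $\alpha(1-\alpha)$, it vanishes at $\alpha = 0$ and $\alpha = 1$, so the output starts exactly at $\mathbf{a}$ and ends exactly at $\mathbf{c}$; the elementwise product $\mathbf{a} \odot \mathbf{c}$ only bends the path in between, with the largest effect at $\alpha = 0.5$.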
Abstract
Talking-head avatars are increasingly adopted in educational technology to deliver content with social presence and improved engagement. However, many recent talking-head generation (THG) methods rely on GPU-centric neural rendering, large training sets, or high-capacity diffusion models, which limits deployment in offline or resource-constrained learning environments. This work describes a deterministic, CPU-oriented THG framework, termed Symbolic Vedic Computation, which converts speech into a time-aligned phoneme stream, maps phonemes to a compact viseme inventory, and produces smooth viseme trajectories through symbolic coarticulation inspired by the Vedic sutra Urdhva Tiryakbhyam. A lightweight 2D renderer performs region-of-interest (ROI) warping and mouth compositing with stabilization to support real-time synthesis on commodity CPUs. Experiments under CPU-only execution report synchronization accuracy, temporal stability, and identity consistency, and benchmark the method against representative CPU-feasible baselines. Results indicate that acceptable lip-sync quality can be achieved while substantially reducing computational load and latency, supporting practical educational avatars on low-end hardware. GitHub: https://vineetkumarrakesh.github.io/vedicthg
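The first two stages named above (a time-aligned phoneme stream, then a compact viseme inventory) can be illustrated with a toy example. The sketch below assumes the phoneme stream arrives as `(label, start_s, end_s)` tuples, uses a small hypothetical viseme table `VISEME_OF` and a 25 fps output rate, merges consecutive phonemes that share a viseme, and snaps each viseme onset to the nearest video frame. The table, frame rate, and function names are assumptions for illustration; the paper's actual viseme inventory and its coarticulation step are not reproduced here.

```python
from typing import Dict, List, Tuple

Segment = Tuple[str, float, float]  # (label, start_s, end_s)

# Hypothetical compact viseme inventory; NOT the paper's actual table.
VISEME_OF: Dict[str, str] = {
    "p": "BMP", "b": "BMP", "m": "BMP",
    "f": "FV", "v": "FV",
    "aa": "OPEN", "ae": "OPEN",
    "uw": "ROUND", "ow": "ROUND",
}
REST = "REST"  # used for silence and any unmapped phoneme


def to_viseme_segments(phonemes: List[Segment]) -> List[Segment]:
    """Map each phoneme to a viseme, merging consecutive runs of the same viseme."""
    segments: List[Segment] = []
    for label, start, end in phonemes:
        vis = VISEME_OF.get(label, REST)
        if segments and segments[-1][0] == vis:
            # Same mouth shape as the previous segment: extend it instead of adding a new one.
            segments[-1] = (vis, segments[-1][1], end)
        else:
            segments.append((vis, start, end))
    return segments


def onset_frames(segments: List[Segment], fps: float = 25.0) -> List[int]:
    """Snap each viseme onset time to the nearest video frame index."""
    return [round(start * fps) for _, start, _ in segments]


def max_onset_error_ms(segments: List[Segment], fps: float = 25.0) -> float:
    """Largest absolute gap (ms) between a true onset and its snapped frame time."""
    frames = onset_frames(segments, fps)
    return max(abs(f / fps - start) * 1000.0 for f, (_, start, _) in zip(frames, segments))


if __name__ == "__main__":
    stream = [("sil", 0.00, 0.12), ("b", 0.12, 0.19), ("aa", 0.19, 0.35),
              ("m", 0.35, 0.43), ("p", 0.43, 0.51), ("uw", 0.51, 0.70)]
    segs = to_viseme_segments(stream)
    print([s[0] for s in segs])                # ['REST', 'BMP', 'OPEN', 'BMP', 'ROUND']
    print(onset_frames(segs))                  # [0, 3, 5, 9, 13]
    print(round(max_onset_error_ms(segs), 1))  # 10.0
```

Rounding onsets to the nearest frame bounds the quantization error by half a frame period (20 ms at 25 fps), so frame quantization alone stays inside a $\pm 40$ ms window; in this toy stream the largest onset error is about 10 ms.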
