VineetVC: Adaptive Video Conferencing Under Severe Bandwidth Constraints Using Audio-Driven Talking-Head Reconstruction
Vineet Kumar Rakesh, Soumya Mazumdar, Tapas Samanta, Hemendra Kumar Pandey, Amitabha Das, Sarbajit Pal
TL;DR
The paper tackles real-time video conferencing under severely constrained bandwidth by proposing VineetVC, an adaptive system that blends standard WebRTC transmission with an audio-driven talking-head reconstruction path. A telemetry-guided, hysteresis-based controller switches among Normal, Low-Bitrate, and AI modes, reassigning bitrate from pixel video to compact control and reference updates when needed. Key contributions include the three-mode bandwidth policy, a closed-loop capacity proxy driven by WebRTC statistics, and backend-agnostic talking-head synthesis, with extensive long-run logs demonstrating substantial bandwidth reduction and maintained conversational continuity. The work highlights practical benefits, privacy considerations, and deployment trade-offs, offering a path toward persistent conferencing in challenging networks and outlining future work to enhance robustness and multi-speaker scenarios.
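The hysteresis-based switching among the three modes can be illustrated with a minimal Python sketch. It is not the paper's controller: the threshold values, the `hold_samples` debounce, and the names `HysteresisController` and `update` are assumptions chosen for illustration. The idea shown is that the step-down threshold sits below the step-up threshold, so capacity estimates that fluctuate inside the gap do not cause the mode to flap.

```python
from dataclasses import dataclass

# Ordered from most to least bandwidth-hungry.
MODES = ("NORMAL", "LOW_BITRATE", "AI")


@dataclass
class HysteresisController:
    # All numbers below are hypothetical (kbps); they are not taken from the paper.
    down_kbps: tuple = (300.0, 80.0)  # fall below -> leave NORMAL / leave LOW_BITRATE
    up_kbps: tuple = (450.0, 150.0)   # rise above -> return to NORMAL / to LOW_BITRATE
    hold_samples: int = 3             # consecutive samples required before switching
    mode_index: int = 0
    _pending: int = 0                 # direction currently being counted (+1 down, -1 up)
    _streak: int = 0

    def update(self, capacity_kbps: float) -> str:
        """Feed one capacity estimate and return the mode after this sample."""
        i = self.mode_index
        if i < len(MODES) - 1 and capacity_kbps < self.down_kbps[i]:
            direction = 1    # step toward a cheaper mode
        elif i > 0 and capacity_kbps > self.up_kbps[i - 1]:
            direction = -1   # step back toward a richer mode
        else:
            direction = 0    # inside the hysteresis band: hold

        if direction != 0 and direction == self._pending:
            self._streak += 1
        else:
            self._pending = direction
            self._streak = 1 if direction != 0 else 0

        if direction != 0 and self._streak >= self.hold_samples:
            self.mode_index += direction
            self._pending = 0
            self._streak = 0
        return MODES[self.mode_index]
```

With these assumed thresholds, three consecutive estimates of 200 kbps move the controller from Normal to Low-Bitrate, while a later estimate of 350 kbps leaves it in Low-Bitrate because 350 lies between the 300 kbps step-down and 450 kbps step-up thresholds.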
Abstract
Severe bandwidth depletion in consumer and constrained networks can destabilize real-time video conferencing: encoder rate control saturates, packet loss rises, frame rates drop, and end-to-end latency increases significantly. This work presents an adaptive conferencing system that combines WebRTC media delivery with a supplementary audio-driven talking-head reconstruction pathway and telemetry-driven mode control. The system consists of a WebSocket signaling service, an optional SFU for multi-party transmission, a browser client that extracts WebRTC statistics in real time and exports telemetry as CSV, and an AI REST service that takes a reference face image and recorded audio and produces a synthesized MP4. The browser can replace its outbound camera track with the synthesized stream at a median bandwidth of 32.80 kbps. The system also includes a bandwidth-mode switching strategy and a client-side mode-state logger.
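The statistics extraction and CSV export can be illustrated with a short Python sketch. It assumes each sample carries a cumulative `bytesSent` counter and a millisecond `timestamp`, as WebRTC outbound RTP statistics do; the function names, the CSV column names, and the sample values are hypothetical and do not come from the paper's logs.

```python
import csv
import io
import statistics


def bitrate_kbps(prev: dict, curr: dict) -> float:
    """Bitrate between two cumulative samples; bits per millisecond equals kbps."""
    dt_ms = curr["timestamp"] - prev["timestamp"]
    if dt_ms <= 0:
        return 0.0  # guard against duplicate or out-of-order samples
    return (curr["bytesSent"] - prev["bytesSent"]) * 8 / dt_ms


def telemetry_csv(samples: list) -> tuple:
    """Return (csv_text, per-interval rates in kbps) for consecutive samples."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["timestamp_ms", "kbps"])  # hypothetical column names
    rates = []
    for prev, curr in zip(samples, samples[1:]):
        rate = bitrate_kbps(prev, curr)
        rates.append(rate)
        writer.writerow([curr["timestamp"], f"{rate:.2f}"])
    return buf.getvalue(), rates


# Synthetic samples for illustration only.
samples = [
    {"timestamp": 0, "bytesSent": 0},
    {"timestamp": 1000, "bytesSent": 5000},
    {"timestamp": 2000, "bytesSent": 10000},
    {"timestamp": 3000, "bytesSent": 20000},
]
csv_text, rates = telemetry_csv(samples)
median_kbps = statistics.median(rates)
```

A median over the per-interval rates is less sensitive to short bursts than a mean, which is one plausible reason to report bandwidth as a median.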
