A 3D Framework for Improving Low-Latency Multi-Channel Live Streaming

Aizierjiang Aiersilan, Zhiqiang Wang

TL;DR

This paper tackles the challenge of delivering low-latency, synchronized multi-channel live streaming under variable network conditions by leveraging a Unity 3D–based framework that maps multiple camera feeds onto virtual canvases in a shared 3D scene and captures them with an in-world camera to produce a single consolidated stream. The approach emphasizes modularity, low latency, and spatial awareness, enabling real-time user interaction and scalable multi-channel handling while supporting VR/AR/MR contexts. Key findings show a latency reduction of up to 68.7% compared with baselines, consistent latency across channel counts, zero synchronization offset due to consolidation, and robust scalability up to 50 devices, albeit with a trade-off in video quality as channels increase. Practically, the framework offers a flexible, open-source solution for low-latency multi-channel streaming applicable to virtual events, remote collaboration, and other immersive applications, with broad protocol compatibility and a concrete theoretical model for data transmission.

Abstract

The advent of 5G has driven demand for high-quality, low-latency live streaming. However, several challenges persist in real-time video streaming: managing the increased data volume, keeping multiple streams synchronized, and maintaining consistent quality under varying network conditions. To address these issues, we propose a novel framework that leverages 3D virtual environments within game engines (e.g., Unity 3D) to optimize multi-channel live streaming. Our approach consolidates multi-camera video data into a single stream using multiple virtual 3D canvases, which significantly increases the number of channels that can be carried while reducing latency and giving users more flexibility. To demonstrate the approach, we use the Unity 3D engine to integrate multiple video inputs into a single-channel stream that supports one-to-many broadcasting, one-to-one video calling, and real-time control of video channels. By mapping video data onto a world-space canvas and capturing it with an in-world camera, we minimize redundant data transmission and achieve efficient, low-latency streaming. Our results show that this method outperforms some existing multi-channel live streaming solutions in both latency and responsiveness to user interaction. The live video streaming system accompanying this paper is open-source and available at https://github.com/Aizierjiang/LiveStreaming.
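
To make the consolidation idea concrete, the following Python sketch estimates how much of a single consolidated frame each channel would receive if the canvases were arranged in a near-square grid. The grid arrangement, the 1920x1080 frame size, and the names `tile_layout` and `tile_origin` are assumptions for illustration only; the paper places canvases in a 3D scene and does not prescribe this layout.

```python
import math

def tile_layout(num_channels, frame_w=1920, frame_h=1080):
    """Return (cols, rows, tile_w, tile_h) for a near-square grid of canvases.

    Assumption: every channel gets an equal rectangular tile of one output frame.
    """
    if num_channels < 1:
        raise ValueError("num_channels must be at least 1")
    cols = math.ceil(math.sqrt(num_channels))
    rows = math.ceil(num_channels / cols)
    return cols, rows, frame_w // cols, frame_h // rows

def tile_origin(index, cols, tile_w, tile_h):
    """Top-left pixel of the tile for channel `index`, in row-major order."""
    row, col = divmod(index, cols)
    return col * tile_w, row * tile_h

for n in (1, 4, 9, 50):
    cols, rows, w, h = tile_layout(n)
    print(f"{n:>2} channels -> {cols}x{rows} grid, {w}x{h} px per channel")
```

Under these assumptions the per-channel pixel budget shrinks as channels are added while the size of the transmitted frame stays fixed, which is one plausible way to read the reported trade-off between channel count and video quality.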

Paper Structure

This paper contains 18 sections, 3 equations, 4 figures, 1 table, 1 algorithm.

Figures (4)

  • Figure 1: Our framework leverages multiple input devices to create channels for clients. Video content is seamlessly distributed across canvases in a virtual 3D environment, captured by sub-cameras and then consolidated by a main camera. Users' interactions with the cameras are mapped to an interactive control group, ensuring a responsive experience. The final stream is transmitted to clients via RTSP. To acquire video data from external cameras, we utilize the UVC standard (USB Video Class), which enables plug-and-play functionality across various operating systems.
  • Figure 2: Latency measured over a 30-minute period with different numbers of input camera devices functioning as separate channels for live video streaming.
  • Figure 3: Evaluation of system performance as the number of input devices increases to 50. The average latency is 230.29 ms. GPU consumption remains stable with minimal fluctuations, while CPU consumption rises progressively with more input devices. GPU consumption stays nearly constant because online rendering is performed only when the video is displayed. This indicates efficient GPU resource management alongside increasing CPU demands. Testing was conducted on machines with an NVIDIA GeForce GTX 1050 GPU and an Intel Core i7-7700HQ CPU.
  • Figure 4: Nine modules were selected for evaluation, with their response times predominantly ranging from 50 ms to 1300 ms. The mean response time, represented by a dashed line, is approximately 600 ms. Given the significant fluctuations observed in the recorded data, Gaussian smoothing [ito2000gaussian] with $\sigma = 1$ was applied for visualization purposes to reduce noise and variability.
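
The Gaussian smoothing mentioned in the Figure 4 caption can be sketched in plain Python as follows. This is a minimal illustration with $\sigma = 1$, not the authors' plotting code: the kernel truncation at three standard deviations, the edge clamping, the function names, and the sample response times are all assumptions made for the example.

```python
import math

def gaussian_kernel(sigma=1.0, radius=None):
    """Normalized 1D Gaussian weights, truncated at 3 sigma unless a radius is given."""
    if radius is None:
        radius = int(math.ceil(3 * sigma))
    weights = [math.exp(-(k * k) / (2 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def gaussian_smooth(samples, sigma=1.0):
    """Smooth a sequence by convolving it with a Gaussian kernel.

    Edge samples are repeated (clamped) so the output has the same length.
    """
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(samples)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index at the edges
            acc += w * samples[j]
        smoothed.append(acc)
    return smoothed

# Made-up response times in milliseconds, for illustration only.
response_ms = [50, 1300, 400, 900, 200, 750, 600]
print([round(v, 1) for v in gaussian_smooth(response_ms, sigma=1.0)])
```

Because the weights are positive and sum to one, each smoothed value is a weighted average of its neighbors, so spikes are damped while the overall level of the series is preserved.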