Startup Delay Aware Short Video Ordering: Problem, Model, and A Reinforcement Learning based Algorithm
Zhipeng Gao, Chunxi Li, Yongxiang Zhao, Baoxian Zhang
TL;DR
This work tackles startup delay in short-video delivery by modeling the delivery path between server and user as a token bucket and formulating a server-side video ordering problem that minimizes the maximum startup delay $D = \max_i d_{x_i}$ over the ordered list $X$. The authors prove the problem is NP-hard and propose the Partially Shared Actor-Critic (PSAC) reinforcement learning algorithm, which shares the embedding and encoder modules between the actor and the critic to reduce the parameter count and accelerate convergence. PSAC learns to output an ordered list $X$ with small $D$, using a token-bucket based delay calculation as the signal for policy-gradient optimization. Empirical results on a real dataset show reductions of up to about 56% in average maximum startup delay compared with baselines, along with robustness to bitrate variations and viewing-time prediction errors, which suggests practical value for server-side scheduling in short-video platforms.
Abstract
Short video applications have attracted billions of users on the Internet and fill users' fragmented spare time with content-rich, short-duration videos. To achieve fast playback at the user side, existing short video systems typically enforce burst transmission of the initial segment of each video upon request to improve the quality of user experience. However, such burst transmissions can cause unexpectedly large startup delays at the user side. This is because users may frequently switch videos when sequentially watching a list of short videos recommended by the server, which can trigger excessive burst transmissions of the initial segments of different short videos and thus quickly deplete the network transmission capacity. In this paper, we adopt a token bucket to characterize the video transmission path between the video server and each user, and accordingly study how to reduce the startup delay of short videos by arranging the viewing order of a video list at the server side. We formulate the optimal video ordering problem for minimizing the maximum video startup delay as a combinatorial optimization problem and prove its NP-hardness. We accordingly propose a Partially Shared Actor Critic reinforcement learning algorithm (PSAC) to learn an optimized video ordering strategy. Numerical results based on a real dataset provided by a large-scale short video service provider demonstrate that the proposed PSAC algorithm can significantly reduce the video startup delay compared to baseline algorithms.
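
To make the ordering effect concrete, the following is a minimal sketch of a token-bucket startup-delay calculation, not the paper's exact model. It assumes, purely for illustration, that only the initial-segment burst consumes tokens, that the bucket starts full, and that tokens refill at a constant rate while the user watches a video. All names (`startup_delays`, `max_startup_delay`, `best_order_bruteforce`, `seg_size`, `view_time`, `rate`, `capacity`) are hypothetical. The exhaustive search stands in for PSAC only on tiny lists; it does not scale, which is consistent with the NP-hardness result.

```python
from itertools import permutations


def startup_delays(order, seg_size, view_time, rate, capacity):
    """Per-video startup delay for one viewing order (simplified model).

    Assumptions (not from the paper): the bucket starts full, only the
    initial-segment burst consumes tokens, and tokens refill at `rate`
    during each video's viewing time, capped at `capacity`.
    """
    tokens = float(capacity)
    delays = []
    for v in order:
        need = seg_size[v]
        if tokens >= need:
            # Enough tokens: the burst is sent immediately.
            delay = 0.0
            tokens -= need
        else:
            # Wait until the missing tokens have been generated.
            delay = (need - tokens) / rate
            tokens = 0.0
        delays.append(delay)
        # Refill while the user watches video v.
        tokens = min(float(capacity), tokens + rate * view_time[v])
    return delays


def max_startup_delay(order, seg_size, view_time, rate, capacity):
    """Objective D: the largest startup delay in the ordered list."""
    return max(startup_delays(order, seg_size, view_time, rate, capacity))


def best_order_bruteforce(seg_size, view_time, rate, capacity):
    """Exhaustive search over all orders; feasible only for tiny lists."""
    n = len(seg_size)
    return min(
        permutations(range(n)),
        key=lambda p: max_startup_delay(p, seg_size, view_time, rate, capacity),
    )


if __name__ == "__main__":
    seg_size = [4.0, 4.0, 1.0, 1.0]   # initial-segment sizes (arbitrary units)
    view_time = [1.0, 1.0, 5.0, 5.0]  # predicted viewing times (seconds)
    rate, capacity = 1.0, 5.0         # token rate and bucket size
    print(max_startup_delay([0, 1, 2, 3], seg_size, view_time, rate, capacity))
    print(max_startup_delay([0, 2, 1, 3], seg_size, view_time, rate, capacity))
    print(best_order_bruteforce(seg_size, view_time, rate, capacity))
```

In this toy instance, placing two large, briefly watched videos back to back drains the bucket and delays the second one, whereas interleaving a small, long-watched video between them gives the bucket time to refill. A learned policy such as PSAC would replace the exhaustive search with a model that proposes an order directly.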
