Encoding Time and Energy Model for SVT-AV1 based on Video Complexity

Lena Eichermüller, Gaurang Chaudhari, Ioannis Katsavounidis, Zhijun Lei, Hassene Tmar, Christian Herglotz, André Kaup

TL;DR

The paper addresses the rising energy footprint of video encoding by modeling SVT-AV1 encoding time and energy with content-aware descriptors. It presents a high-level energy model, $E_{\text{enc,kpix}} = E_0 + P \cdot t_{\text{enc,kpix}} \cdot \frac{W H}{1000} \cdot n_{\text{frames}}$, and a time model, $\hat{t}_{\text{enc,kpix}} = \mathcal{C}^{\xi} n_{\text{intra}}^{\delta} \frac{1}{\text{CRF}} p^{\alpha} e^{\beta p + \gamma} + t_0$, in which the content factor $\mathcal{C}$ depends on spatial and temporal complexity: $\mathcal{C} = f_{n,s}(\mathcal{C}_S) f_{n,t}(\mathcal{C}_T)$. By evaluating various content descriptors (e.g., SI, VCA, TI, optical flow) and using ultrafast encoding as a separate factor, the authors demonstrate a robust energy-time correlation. Empirical results show energy prediction errors as low as $2.93\%$ when energy is fitted to measured time, and errors of $19.6\%$ for encoding time and $20.9\%$ for encoding energy when content information is incorporated into the predictions, underscoring the practical value for content-aware optimization. The findings provide a foundation for time-based energy planning in SVT-AV1 and outline future work on multicore extensions and broader energy components such as transmission and decoding.
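The two formulas in this TL;DR can be evaluated with a few lines of Python. The following is a minimal sketch rather than the authors' implementation: the names `TimeModelParams`, `encoding_time_per_kpix`, and `encoding_energy` are hypothetical, the content factor $\mathcal{C}$ is passed in as a precomputed number because the functions $f_{n,s}$ and $f_{n,t}$ are not specified here, and `p` is taken as given from the formula (the summary does not define it). Any parameter values passed in are placeholders, not fitted values from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeModelParams:
    """Fit parameters of the time model (hypothetical container).

    The summary does not report fitted values, so callers must supply them.
    """
    xi: float     # exponent of the content factor C
    delta: float  # exponent of n_intra
    alpha: float  # exponent of p
    beta: float   # linear coefficient of p inside the exponential
    gamma: float  # constant offset inside the exponential
    t0: float     # additive time offset


def encoding_time_per_kpix(c: float, n_intra: float, crf: float, p: float,
                           params: TimeModelParams) -> float:
    """Evaluate t_hat = C^xi * n_intra^delta * (1/CRF) * p^alpha * exp(beta*p + gamma) + t0."""
    return (
        c ** params.xi
        * n_intra ** params.delta
        * (1.0 / crf)
        * p ** params.alpha
        * math.exp(params.beta * p + params.gamma)
        + params.t0
    )


def encoding_energy(t_kpix: float, width: int, height: int, n_frames: int,
                    power: float, e0: float) -> float:
    """Evaluate E = E0 + P * t_kpix * (W*H/1000) * n_frames."""
    return e0 + power * t_kpix * (width * height / 1000.0) * n_frames
```

Keeping the two functions separate mirrors the structure described above: time is predicted first, and energy is then derived from that time through a linear relation with offset $E_0$ and slope $P$.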

Abstract

The share of online video traffic in global carbon dioxide emissions is growing steadily. To comply with the demand for video media, dedicated compression techniques are continuously optimized, but at the expense of increasingly higher computational demands and thus rising energy consumption at the video encoder side. In order to find the best trade-off between compression and energy consumption, modeling encoding energy for a wide range of encoding parameters is crucial. We propose an encoding time and energy model for SVT-AV1 based on empirical relations between the encoding time and video parameters as well as encoder configurations. Furthermore, we model the influence of video content by established content descriptors such as spatial and temporal information. We then use the predicted encoding time to estimate the required energy demand and achieve a prediction error of 19.6 % for encoding time and 20.9 % for encoding energy.
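As an illustration of how a prediction error such as the ones quoted above could be computed, the sketch below uses a mean relative error over a set of encodes. This is an assumption: the abstract does not state which error metric is used, and the function name `mean_relative_error` is hypothetical.

```python
from typing import Sequence


def mean_relative_error(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Mean of |predicted - measured| / measured over all samples.

    Assumed metric for illustration only; measured values must be non-zero.
    """
    if len(predicted) != len(measured) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - m) / m for p, m in zip(predicted, measured))
    return total / len(measured)
```

Multiplying the returned value by 100 expresses it as a percentage.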

Paper Structure

This paper contains 12 sections, 16 equations, 1 figure, 2 tables.

Figures (1)

  • Figure 1: Optimal content factors for evaluated sequences and their standard deviations indicated by black error bars. The higher the value, the more time is required for encoding the associated sequence. Larger deviations in content complexity factors are observed for high-complexity sequences.