USTC-TD: A Test Dataset and Benchmark for Image and Video Coding in 2020s

Zhuoyuan Li, Junqi Liao, Chuanbo Tang, Haotian Zhang, Yuqi Li, Yifan Bian, Xihua Sheng, Xinmin Feng, Yao Li, Changsheng Gao, Li Li, Dong Liu, Feng Wu

TL;DR

USTC-TD introduces a diverse image/video coding test dataset comprising 40 4K images and 10 1080p videos captured with two Nikon cameras to cover a wide range of content factors. The authors provide thorough feature analyses and compare against existing datasets, demonstrating improved coverage of spatial, colorfulness, lightness, and temporal factors. They benchmark both classic standardized codecs and learned compression methods using objective metrics (PSNR, MS-SSIM, VMAF) and subjective MOS, revealing content-dependent advantages for learned models and robust performance for traditional codecs in certain scenarios. The dataset and baselines are released online to enable robust benchmarking, guide standardization efforts, and stimulate further research in compression-aware dataset design and evaluation.

Abstract

Image/video coding has been a remarkable research area for both academia and industry for many years. Test datasets, especially high-quality image/video datasets, are desirable for the justified evaluation of coding-related research, practical applications, and standardization activities. We put forward a test dataset, namely USTC-TD, which has been adopted in the practical end-to-end image/video coding challenge of the IEEE International Conference on Visual Communications and Image Processing (VCIP) in 2022 and 2023. USTC-TD contains 40 images at 4K spatial resolution and 10 video sequences at 1080p spatial resolution, featuring varied content arising from diverse environmental factors (e.g., scene type, texture, motion, view) and designed imaging factors (e.g., illumination, lens, shadow). We quantitatively evaluate USTC-TD on different image/video features (spatial, temporal, color, lightness) and compare it with previous image/video test datasets, showing that it compensates for shortcomings of existing datasets. We also evaluate both classic standardized and recent learned image/video coding schemes on USTC-TD using objective quality metrics (PSNR, MS-SSIM, VMAF) and a subjective quality metric (MOS), providing an extensive benchmark for the evaluated schemes. Based on the characteristics and specific design of the proposed test dataset, we analyze the benchmark performance and shed light on the future research and development of image/video coding. All the data are released online: https://esakak.github.io/USTC-TD.

Paper Structure (31 sections, 1 equation, 8 figures, 13 tables)

Figures (8)

  • Figure 1: Illustration of the image dataset in USTC-TD 2022 and 2023.
  • Figure 2: Illustration of each video sequence in the USTC-TD 2023 video dataset. Frames 0$\sim$96 and 0$\sim$300 correspond to the short and long settings, respectively.
  • Figure 3: Visualization of the spatial information (SI) and colorfulness (CF) features on different image test datasets. The scatter diagram plots SI versus CF, and the corresponding convex hulls indicate the coverage of each dataset. The histogram shows the number of images at different SI scores.
  • Figure 6: Overall rate-distortion (RD) curves of advanced image compression schemes on different metrics. From left to right, the results are evaluated by the PSNR, MS-SSIM, VMAF (MSE model), and VMAF (MS-SSIM model) metrics on the USTC-TD 2022 and 2023 image datasets.
  • Figure 7: Overall rate-distortion (RD) curves of advanced video compression schemes on different metrics. From left to right, the results are evaluated by the PSNR, MS-SSIM, VMAF (MSE model), and VMAF (MS-SSIM model) metrics on different settings of the USTC-TD video dataset.
  • ...and 3 more figures
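
The SI and CF features named in the Figure 3 caption can be illustrated with a minimal sketch. It assumes two commonly used definitions that this page does not confirm the paper uses: SI as the standard deviation of the Sobel gradient magnitude of the luminance plane (in the spirit of ITU-T P.910), and CF as the Hasler-Süsstrunk colorfulness measure. The function names, the plain nested-list input format, and the choice of the population standard deviation are illustrative assumptions, not the authors' implementation.

```python
import math
from statistics import mean, pstdev


def spatial_information(luma):
    """SI of one frame: population std-dev of the Sobel gradient magnitude.

    `luma` is a 2-D list of luminance values (rows of equal length).
    Border pixels are skipped, so the frame must be at least 3x3.
    Assumed definition (P.910-style); the paper's exact formula may differ.
    """
    h, w = len(luma), len(luma[0])
    mags = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal Sobel response: right column minus left column.
            gx = (luma[y - 1][x + 1] + 2 * luma[y][x + 1] + luma[y + 1][x + 1]
                  - luma[y - 1][x - 1] - 2 * luma[y][x - 1] - luma[y + 1][x - 1])
            # Vertical Sobel response: bottom row minus top row.
            gy = (luma[y + 1][x - 1] + 2 * luma[y + 1][x] + luma[y + 1][x + 1]
                  - luma[y - 1][x - 1] - 2 * luma[y - 1][x] - luma[y - 1][x + 1])
            mags.append(math.hypot(gx, gy))
    return pstdev(mags)


def colorfulness(pixels):
    """Hasler-Suesstrunk colorfulness of a flat list of (R, G, B) tuples.

    Assumed definition; the paper's exact CF formula may differ.
    """
    rg = [r - g for r, g, b in pixels]               # red-green opponent axis
    yb = [0.5 * (r + g) - b for r, g, b in pixels]   # yellow-blue opponent axis
    sigma = math.hypot(pstdev(rg), pstdev(yb))       # spread of the opponent values
    mu = math.hypot(mean(rg), mean(yb))              # offset from neutral gray
    return sigma + 0.3 * mu
```

Under these assumed definitions, a flat gray image scores zero on both features, while strong edges raise SI and saturated colors raise CF. Computing one (SI, CF) pair per image gives the points of a scatter diagram like the one the caption describes.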