Optimized Memory System Architecture for VESA VDC-M Decoder with Multi-Slice Support
Hannah Yang, Sohyeon Kim, Saeyeon Kim, Jiyoung Lee, Huijin Roh, Ji-Hoon Kim
TL;DR
The paper addresses the memory bottleneck in the VDC-M decoder, which stems from the large on-chip buffers required for prediction and multi-slice operation. It analyzes three memory system architectures: Baseline, Type 1 (Half-Line Delay with Block Forwarding), and Type 2 (Line Buffer Bank Split). The two optimized designs center on line-buffer access scheduling and reconstruction-buffer minimization. Compared to Baseline, the line buffer shrinks by 33.3%, the reconstruction buffer by 77.3%, and the decoder backend gate count by 31.5%. The decoder sustains real-time operation at up to 96.45 fps for 4K UHD at 200 MHz with a throughput of 4 pixels per cycle, while supporting up to 4 slices per line. The result is a practical, hardware-efficient VDC-M decoder implementation compatible with VDC-M v1.2 and suitable for display-driver IC deployments.
Abstract
Video compression plays a pivotal role in managing and transmitting large-scale display data, particularly given the growing demand for higher resolutions and improved video quality. This paper proposes an optimized memory system architecture for the Video Electronics Standards Association (VESA) Display Compression-M (VDC-M) decoder, which is characterized by substantial on-chip buffer requirements. We design and analyze three architectures, categorized by optimization level and management complexity. Our strategy focuses on improving line buffer access scheduling and minimizing the reconstruction buffer, targeting prediction and multi-slice operation, which are the major resource consumers in the decoder. By adjusting the line delay and segmenting the SRAM banks, combined with reconstructed block forwarding, we achieve a 33.3% size reduction in the line buffer and a 77.3% reduction in the reconstruction buffer compared to the Baseline VDC-M decoder. Synthesized in a 28 nm CMOS process, the proposed architecture achieves a 31.5% reduction in the gate count of the decoder backend hardware and supports real-time performance of up to 96.45 fps for 4K UHD resolution at a 200 MHz operating frequency and a throughput of 4 pixels per cycle.
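The reported frame rate can be cross-checked with simple arithmetic. The sketch below is not taken from the paper: it assumes that 4K UHD means 3840 × 2160 active pixels and that the decoder outputs 4 pixels on every clock cycle with no blanking or stall overhead. The function name `peak_fps` is hypothetical.

```python
def peak_fps(clock_hz: int, pixels_per_cycle: int, width: int, height: int) -> float:
    """Upper bound on frame rate for a decoder with fixed per-cycle throughput.

    Assumes every clock cycle produces `pixels_per_cycle` pixels and ignores
    blanking intervals and pipeline stalls (an idealization, not the paper's model).
    """
    pixels_per_second = clock_hz * pixels_per_cycle
    pixels_per_frame = width * height
    return pixels_per_second / pixels_per_frame


# 200 MHz clock, 4 pixels per cycle, 3840 x 2160 frame (assumed 4K UHD size)
fps = peak_fps(200_000_000, 4, 3840, 2160)
print(f"{fps:.2f} fps")
```

Under these assumptions the bound rounds to the 96.45 fps stated above, which suggests the reported figure is the ideal throughput limit rather than a measurement that includes display timing overhead.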
