An FPGA Implementation of Displacement Vector Search for Intra Pattern Copy in JPEG XS

Qiyue Chen, Yao Li, Jie Tao, Song Chen, Li Li, Dong Liu

TL;DR

The paper proposes an efficient pipelined FPGA architecture for the DV search module, together with an optimized memory organization that exploits IPC's computational characteristics and inherent data reuse patterns to improve performance.

Abstract

Recently, progress has been made on the Intra Pattern Copy (IPC) tool for JPEG XS, an image compression standard designed for low-latency and low-complexity coding. IPC performs wavelet-domain intra compensation prediction to reduce spatial redundancy in screen content. A key module of IPC is the displacement vector (DV) search, which determines the optimal prediction reference offset. However, the DV search process is computationally intensive, posing challenges for practical hardware deployment. In this paper, we propose an efficient pipelined FPGA architecture for the DV search module to promote the practical deployment of IPC. An optimized memory organization, which leverages the computational characteristics of IPC and its inherent data reuse patterns, is further introduced to enhance performance. Experimental results show that the proposed architecture achieves a throughput of 38.3 Mpixels/s with a power consumption of 277 mW, demonstrating its feasibility for practical hardware implementation in IPC and other predictive coding tools, and providing a promising foundation for ASIC deployment.

Paper Structure (9 sections, 4 figures, 2 tables)

This paper contains 9 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Proposed DV search system architecture. The system is composed of a residual calculation engine and a DV comparison engine. The residual calculation engine retrieves IPC Unit data from DRAM, computes the residuals, and forwards them to the DV comparison engine, which selects the optimal DV based on the estimated bit plane count of the residuals and passes it to the subsequent pattern compensation module.
  • Figure 2: Proposed four-stage DV comparison hardware architecture. The pipeline consists of four processing stages with data processing and register synchronization.
  • Figure 3: (a) Relationship between the IPC Group and the IPC Unit, illustrated with five horizontal and two vertical decomposition levels. Blue blocks denote IPC Unit 0 across different IPC Groups and sub-bands. The notation $Ui\_j$ denotes the $i$th IPC Unit within the $j$th IPC Group. (b) Method 0: Precinct-aligned memory organization. Each precinct has a fixed size of $2560 \times 4$, where 2560 and 4 denote the precinct's width and height, respectively. (c) Method 1: IPC Group-aligned memory organization. The block sizes of the IPC Groups vary according to the five horizontal and two vertical decomposition levels.
  • Figure 4: External memory addressing.
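
The selection rule described for Figure 1 (compute residuals for each candidate DV, estimate their bit plane count, keep the cheapest candidate) can be illustrated with a minimal software sketch. This is not the authors' hardware design: the one-dimensional coefficient layout, the function names `bit_plane_count` and `search_dv`, the use of the largest residual magnitude as the bit plane estimate, and the rule that a reference block must lie entirely before the current block are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def bit_plane_count(residuals: Sequence[int]) -> int:
    """Estimate the number of magnitude bit planes for a block of residuals.

    Assumption: the cost is the bit length of the largest absolute value.
    """
    return max((abs(r) for r in residuals), default=0).bit_length()


def search_dv(
    coeffs: List[int], start: int, size: int, candidates: Sequence[int]
) -> Tuple[int, int]:
    """Return (best_dv, best_cost) for the block coeffs[start:start + size].

    Hypothetical 1-D model: a DV is a backward offset into `coeffs`.
    DV 0 stands for "no prediction" and serves as the baseline cost.
    """
    current = coeffs[start:start + size]
    best_dv, best_cost = 0, bit_plane_count(current)
    for dv in candidates:
        ref_start = start - dv
        # Assumption: the reference must be fully available, so it may not
        # overlap the current block or start before the buffer.
        if dv < size or ref_start < 0:
            continue
        reference = coeffs[ref_start:ref_start + size]
        residual = [c - r for c, r in zip(current, reference)]
        cost = bit_plane_count(residual)
        if cost < best_cost:  # strict comparison keeps the earliest best DV
            best_dv, best_cost = dv, cost
    return best_dv, best_cost


coeffs = [5, -3, 12, 7, 0, 0, 5, -3, 12, 8]
best_dv, best_cost = search_dv(coeffs, start=6, size=4, candidates=[4, 5, 6])
print(best_dv, best_cost)
```

In this toy input the block starting at index 6 nearly repeats the block at index 0, so the candidate offset 6 leaves a residual whose largest magnitude is 1. A hardware pipeline would evaluate candidates in parallel stages rather than in a sequential loop; the sketch only shows the comparison criterion.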