A High-Throughput Hardware Accelerator for Lempel-Ziv 4 Compression Algorithm

Tao Chen, Suwen Song, Zhongfeng Wang

TL;DR

This paper reviews recent hardware implementations of the Lempel-Ziv 4 (LZ4) algorithm, identifies two key factors that limit the throughput of single-kernel compressors, and proposes a novel scheme that restricts each parallelization window to a single match, thereby raising the level of actual parallelism.

Abstract

This paper examines recent hardware implementations of the Lempel-Ziv 4 (LZ4) algorithm and highlights two key factors that limit the throughput of single-kernel compressors. First, the actual parallelism achieved in single-kernel designs falls short of the theoretical potential. Second, the clock frequency is constrained by feedback loops. To tackle these challenges, we propose a novel scheme that restricts each parallelization window to a single match, thus raising the level of actual parallelism. Furthermore, by restricting the maximum match length, we eliminate the feedback loops within the architecture, enabling a significant boost in throughput. Finally, we present a high-speed hardware architecture. The implementation results demonstrate that the proposed architecture achieves a throughput of up to 16.10 Gb/s, a 2.648x improvement over the state-of-the-art. The new design incurs only an acceptable compression ratio reduction, ranging from 4.93% to 11.68% depending on the number of hash table entries, compared to the LZ4 compression ratio achieved by the official software implementation available on GitHub.
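The two restrictions described in the abstract can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the authors' architecture: the window size `WINDOW`, the cap `MAX_MATCH`, the hash table size `HASH_BITS`, the multiplicative hash, and the "first valid match in the window wins" rule are all hypothetical choices, since the abstract does not specify them. The output is a list of `(literals, offset, match_len)` tuples rather than the LZ4 byte format.

```python
MIN_MATCH = 4      # LZ4's minimum match length
WINDOW = 8         # hypothetical parallelization window size, in bytes
MAX_MATCH = 16     # hypothetical cap on match length
HASH_BITS = 12     # hypothetical hash table size: 2**12 entries
MAX_OFFSET = 0xFFFF  # LZ4 offsets are stored in 2 bytes


def _hash4(data: bytes, i: int) -> int:
    """Multiplicative hash of the 4 bytes starting at position i."""
    word = int.from_bytes(data[i:i + 4], "little")
    return ((word * 2654435761) & 0xFFFFFFFF) >> (32 - HASH_BITS)


def compress(data: bytes):
    """Return (literals, offset, match_len) tuples; the final tuple has offset 0."""
    table = {}                      # hash -> most recent position
    seqs = []
    anchor = 0                      # start of the pending literal run
    pos = 0
    last = len(data) - MIN_MATCH    # last position with 4 bytes available
    while pos <= last:
        end = min(pos + WINDOW, last + 1)
        best = None
        # A hardware kernel would examine all window positions in parallel.
        for i in range(pos, end):
            h = _hash4(data, i)
            cand = table.get(h)
            table[h] = i
            if (best is None and cand is not None
                    and i - cand <= MAX_OFFSET
                    and data[cand:cand + MIN_MATCH] == data[i:i + MIN_MATCH]):
                best = (i, cand)    # assumption: keep only the first match
        if best is None:
            pos = end
            continue
        start, cand = best
        length = MIN_MATCH
        # Bounded extension: never more than MAX_MATCH - MIN_MATCH extra steps.
        while (length < MAX_MATCH and start + length < len(data)
               and data[cand + length] == data[start + length]):
            length += 1
        seqs.append((data[anchor:start], start - cand, length))
        anchor = start + length
        pos = max(end, anchor)      # at most one match per window
    seqs.append((data[anchor:], 0, 0))
    return seqs


def decompress(seqs) -> bytes:
    """Rebuild the input; byte-wise copy handles overlapping matches."""
    out = bytearray()
    for literals, offset, length in seqs:
        out += literals
        for _ in range(length):
            out.append(out[-offset])
    return bytes(out)
```

In this model, bytes left in a window after its single match become literals, and a long repeat is split into several capped matches. Both effects cost compression ratio, which is consistent with the trade-off the abstract reports, although the exact mechanism in the paper may differ.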

Paper Structure (27 sections, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Data format of an LZ4 sequence.
  • Figure 2: Flow chart of the LZ4 algorithm.
  • Figure 3: Example of different schemes within the current parallelization window.
  • Figure 4: (a) Original vs (b) Modified extended match stage.
  • Figure 5: The detailed architecture of the parallel compression kernel.