
SFVInt: Simple, Fast and Generic Variable-Length Integer Decoding using Bit Manipulation Instructions

Gang Liao, Ye Liu, Yonghua Ding, Le Cai, Jianjun Chen

TL;DR

This work tackles the bottleneck in decoding LEB128 varints by introducing SFVInt, a simple, fast, and generic decoder that leverages BMI2 instructions to accelerate bit-level extraction. The method processes both 32- and 64-bit unsigned integers via a unified C++ template, achieving significant speedups with a compact (~500-line) implementation. Core contributions include a BMI2-based bulk decoding approach using PEXT with carefully chosen masks, cross-boundary handling via shift tracking, and case-driven decoding that reduces branches. Extensive evaluation across Intel and AMD architectures shows up to 2x speedups over established libraries like Facebook Folly and Google Protobuf, with performance variations tied to CPU capabilities and varint byte-length distributions, underscoring the practical impact on data-intensive systems.

Abstract

The ubiquity of variable-length integers in data storage and communication necessitates efficient decoding techniques. In this paper, we present SFVInt, a simple and fast approach to decode the prevalent Little Endian Base-128 (LEB128) varints. Our approach effectively utilizes the Bit Manipulation Instruction Set 2 (BMI2) in modern Intel and AMD processors, achieving significant performance improvement while maintaining simplicity and avoiding overengineering. SFVInt, with its generic design, effectively processes both 32-bit and 64-bit unsigned integers using a unified code template, marking a significant leap forward in varint decoding efficiency. We thoroughly evaluate SFVInt's performance across various datasets and scenarios, demonstrating that it achieves up to a 2x increase in decoding speed when compared to varint decoding methods used in established frameworks like Facebook Folly and Google Protobuf.
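
To make the role of PEXT concrete, the following is a minimal sketch of how a single LEB128 varint could be extracted with one masked bit-gather. It is not the authors' code: the names `pext_soft`, `decode_scalar`, and `decode_pext8` are hypothetical, the host is assumed to be little-endian, and `pext_soft` is a portable stand-in for the BMI2 intrinsic `_pext_u64` from `<immintrin.h>` so that the example compiles without `-mbmi2`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Portable emulation of PEXT: gathers the bits of `src` selected by `mask`
// into the low-order bits of the result. On BMI2 hardware, _pext_u64 does
// the same in a single instruction.
inline uint64_t pext_soft(uint64_t src, uint64_t mask) {
    uint64_t out = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1);  // isolate lowest set mask bit
        if (src & lowest) out |= bit;
        mask &= mask - 1;                      // clear that mask bit
    }
    return out;
}

// Reference byte-at-a-time LEB128 decoder for unsigned 64-bit values.
// Returns the number of bytes consumed, or 0 on truncated or overlong input.
inline size_t decode_scalar(const uint8_t* p, size_t n, uint64_t& value) {
    value = 0;
    for (size_t i = 0; i < n && i < 10; ++i) {
        value |= static_cast<uint64_t>(p[i] & 0x7F) << (7 * i);
        if ((p[i] & 0x80) == 0) return i + 1;  // continuation bit clear: done
    }
    return 0;
}

// Decodes one varint of at most 8 bytes with a single PEXT-style gather.
// Assumes at least 8 readable bytes at `p` and a little-endian host.
// Returns the number of bytes consumed, or 0 if the varint exceeds 8 bytes.
inline size_t decode_pext8(const uint8_t* p, uint64_t& value) {
    uint64_t word;
    std::memcpy(&word, p, 8);
    // Bit 7 of each byte is set in `stops` where the continuation bit is 0.
    uint64_t stops = ~word & 0x8080808080808080ULL;
    if (stops == 0) return 0;
    unsigned len = 1;
    while ((stops & 0x80) == 0) {  // find the first terminating byte
        stops >>= 8;
        ++len;
    }
    // Keep the 7 payload bits of each of the first `len` bytes.
    uint64_t payload_mask = 0x7F7F7F7F7F7F7F7FULL >> (8 * (8 - len));
    value = pext_soft(word, payload_mask);
    return len;
}
```

For example, the value 300 is encoded as the bytes `0xAC 0x02`; calling `decode_pext8` on a buffer that starts with those bytes yields a length of 2 and the value 300, matching `decode_scalar`. A bulk decoder along the lines described above would additionally have to handle varints that straddle word boundaries, which this sketch leaves out.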

Paper Structure

This paper contains 12 sections, 8 figures, 1 table.

Figures (8)

  • Figure 1: PDEP Example.
  • Figure 2: PEXT Example.
  • Figure 3: Varint Sizing Example.
  • Figure 4: BMI2-enhanced Bulk Decoding.
  • Figure 5: Workload 1 (W1): uniform distribution.
  • ...and 3 more figures