Fast and Lightweight Distributed Suffix Array Construction -- First Results

Manuel Haag, Florian Kurpicz, Peter Sanders, Matthias Schimek

TL;DR

The authors address scalable suffix array construction in distributed memory by adapting the DCX/DC3 framework into a space-efficient, bucketing-based algorithm. They introduce a randomized chunk redistribution technique to achieve provable load balancing and reduce in-memory footprint, while maintaining competitive runtime relative to PSAC on large datasets. Preliminary MPI-based evaluations show memory advantages and robust performance on Common Crawl and Wikipedia data, with some trade-offs on DNA data, and they outline extensions to external memory and multi-GPU environments. The work demonstrates a practical path to fast, low-memory distributed suffix sorting with broad applicability to large-scale text processing.

Abstract

We present first algorithmic ideas for a practical and lightweight adaptation of the DCX suffix array construction algorithm [Sanders et al., 2003] to the distributed-memory setting. Our approach relies on a bucketing technique that enables a lightweight implementation using less than half of the memory required by PSAC [Flick and Aluru, 2015], currently the fastest distributed-memory suffix array construction algorithm, while being competitive or even faster in terms of running time.
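
As background for the DCX family mentioned above, the following is a minimal sketch of the difference cover property that these algorithms build on: a set D of residues modulo X is a difference cover if every residue can be written as a difference of two members of D. The covers `{1, 2}` (X = 3) and `{1, 2, 4}` (X = 7) are standard textbook examples, and the function names are hypothetical; none of this is taken from the authors' implementation.

```python
def is_difference_cover(cover, x):
    """Check that every residue mod x is a difference of two cover elements."""
    diffs = {(a - b) % x for a in cover for b in cover}
    return diffs == set(range(x))


def common_shift(i, j, cover, x):
    """Find the smallest l < x such that i + l and j + l are both sample positions.

    A position k is a sample position if k mod x lies in the cover.
    Such a shift always exists when `cover` is a difference cover mod x.
    """
    for l in range(x):
        if (i + l) % x in cover and (j + l) % x in cover:
            return l
    raise ValueError("cover is not a difference cover modulo x")


if __name__ == "__main__":
    print(is_difference_cover({1, 2}, 3))      # DC3 sample residues
    print(is_difference_cover({1, 2, 4}, 7))   # a DC7 sample
    print(common_shift(0, 3, {1, 2}, 3))
```

The shift returned by `common_shift` is what lets two arbitrary suffixes be compared by looking at a bounded number of characters and then at the ranks of two sampled suffixes.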

Paper Structure

This paper contains 13 sections, 1 theorem, 2 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

When redistributing chunks of size $c$ uniformly at random across $p$ PEs, with $q$ buckets each containing $n/q$ elements, the expected number of elements from a single bucket received by a PE is $n/(pq)$. Furthermore, the probability that any PE receives $2n/(pq)$ or more elements from the same bu
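
The expectation stated in Theorem 1 can be checked empirically with a small simulation. This is a sketch under simplifying assumptions that are not taken from the paper: each bucket is cut into consecutive chunks of size $c$, every chunk is sent to a PE chosen independently and uniformly at random, and the function names and parameter values are hypothetical.

```python
import random


def redistribute_chunks(n, p, q, c, seed=0):
    """Return counts[pe][bucket]: elements of each bucket received by each PE.

    Assumption: each of the q buckets holds n // q elements and is split
    into chunks of size c (the last chunk of a bucket may be smaller).
    """
    rng = random.Random(seed)
    bucket_size = n // q
    counts = [[0] * q for _ in range(p)]
    for bucket in range(q):
        remaining = bucket_size
        while remaining > 0:
            chunk = min(c, remaining)
            pe = rng.randrange(p)  # uniformly random target PE
            counts[pe][bucket] += chunk
            remaining -= chunk
    return counts


def summarize(n, p, q, c, seed=0):
    """Return the expected load n/(pq) and the worst observed (PE, bucket) load."""
    counts = redistribute_chunks(n, p, q, c, seed)
    expected = n / (p * q)
    worst = max(max(row) for row in counts)
    return expected, worst


if __name__ == "__main__":
    expected, worst = summarize(n=1 << 20, p=16, q=8, c=64)
    print(f"expected per (PE, bucket): {expected:.0f}")
    print(f"worst observed:            {worst}")
    print(f"ratio worst/expected:      {worst / expected:.3f}")
```

Smaller chunks give more independent assignments per bucket and therefore tighter concentration around $n/(pq)$, while larger chunks mean fewer messages; the simulation makes it easy to explore this trade-off for different values of $c$.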

Figures (2)

  • Figure 1: Timeline of sequential suffix array construction algorithms; algorithms that share techniques are marked with an arrow. Figure based on [Puglisi et al., 2007; Bingmann, 2018; Kurpicz, 2020]. The three techniques are shown as columns, and algorithms that combine multiple techniques cross the column borders. Suffix array construction algorithms with linear running time are highlighted in dark gray. If an implementation is publicly available, the algorithm is also marked in brown.
  • Figure 2: Running times and blow-up of the SACAs in our weak scaling experiments with 20 MB per PE.

Theorems & Definitions (1)

  • Theorem 1: Random Chunk Redistribution