A fast implementation of the good-suffix array for the Boyer-Moore string matching algorithm

Thierry Lecroq

TL;DR

This work presents a fast implementation of the good-suffix table of the Boyer-Moore string matching algorithm, based on a tight analysis of the pattern, and shows that two versions of this new implementation are the fastest in almost all tested situations.

Abstract

String matching is the problem of finding all the occurrences of a pattern in a text. It has been intensively studied, and the Boyer-Moore string matching algorithm is probably one of the most famous solutions to this problem. This algorithm uses two precomputed shift tables called the good-suffix table and the bad-character table. The good-suffix table is tricky to compute in linear time, and textbook solutions perform redundant operations. Here we present a fast implementation of this good-suffix table based on a tight analysis of the pattern. Experimental results show that two versions of this new implementation are the fastest in almost all tested situations.

Paper Structure (8 sections, 4 theorems, 6 equations, 19 figures, 4 tables)

Key Result

lemma 1

Let $x=x_1b_1a^{k_1}$ for $x_1\in \Sigma^*$, $a,b_1\in \Sigma$, $b_1\ne a$ and an integer $k_1 \ge 1$. Let $\ell_1=|x_1b_1|$ and $r_1=|x_1b_1a^{k_1}|-2=m-2$, then for $\ell_1 \le i \le r_1$.
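
To make the notation of the lemma concrete, the following minimal sketch computes the decomposition $x = x_1 b_1 a^{k_1}$ together with $\ell_1$ and $r_1$ for a given pattern. The function name `trailing_run_decomposition` is hypothetical and is not taken from the paper; the sketch only illustrates the quantities named in the hypothesis and says nothing about the lemma's conclusion.

```python
def trailing_run_decomposition(x):
    """Split x as x1 + b1 + a * k1, where a is the last symbol of x,
    k1 >= 1 is the length of the maximal run of a ending x, and b1 != a.

    Returns (x1, b1, a, k1, l1, r1) with l1 = |x1 b1| and r1 = m - 2.
    Illustrative helper only (hypothetical name, not from the paper).
    """
    m = len(x)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    a = x[-1]
    # Count how many copies of a end the pattern.
    k1 = 1
    while k1 < m and x[m - 1 - k1] == a:
        k1 += 1
    if k1 == m:
        # The pattern is a^m: there is no symbol b1 != a in front of the run.
        raise ValueError("pattern needs at least two distinct symbols")
    l1 = m - k1          # length of x1 b1
    r1 = m - 2           # last index covered by the range l1 <= i <= r1
    return x[:l1 - 1], x[l1 - 1], a, k1, l1, r1


x1, b1, a, k1, l1, r1 = trailing_run_decomposition("abaaa")
print(x1, b1, a, k1, l1, r1)
```

For the pattern `abaaa` this gives $x_1 = \texttt{a}$, $b_1 = \texttt{b}$, $k_1 = 3$, $\ell_1 = 2$ and $r_1 = 3$. The range $\ell_1 \le i \le r_1$ contains $k_1 - 1$ positions, so it is empty when $k_1 = 1$.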

Figures (19)

  • Figure 1: Suffix $z = x[i+1\mathinner{\ldotp\ldotp} m-1]$ of $x$ is equal to factor $z = y[j-m+i+2\mathinner{\ldotp\ldotp} j]$ of $y$ and symbol $a = x[i]$ is different from symbol $b = y[j-m+i+1]$.
  • Figure 2: $\textit{good-suff}[i] = \min\{m-1-j \mid m-1 - \textit{suff}[j] = i\}$.
  • Figure 3: $\textit{good-suff}[i] = m-j-1$.
  • Figure 4: Suffixes$(x, m)$ returns the table $\textit{suff}$ of string $x$ of length $m$.
  • Figure 5: Good-Suffixes$(x, m, \textit{suff})$ returns the table $\textit{good-suff}$ of string $x$ of length $m$ given its table $\textit{suff}$.
  • ...and 14 more figures
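
The tables named in Figures 2 to 5 can be sketched in Python as follows. This is an assumption-laden illustration of the widely known textbook construction (the baseline that the abstract describes as performing redundant operations), not the new implementation proposed in the paper. The function names `suffixes` and `good_suffixes` are chosen here for illustration. In this sketch, `suff[i]` is the length of the longest suffix of $x$ that ends at position $i$, and positions of `good_suff` that no rule reaches keep the default shift $m$.

```python
def suffixes(x):
    """suff[i] = length of the longest suffix of x ending at position i.
    Textbook linear-time computation (illustrative sketch)."""
    m = len(x)
    suff = [0] * m
    if m == 0:
        return suff
    suff[m - 1] = m
    g = m - 1          # leftmost position reached by a previous comparison
    f = m - 1          # position whose comparison run reached g
    for i in range(m - 2, -1, -1):
        if i > g and suff[i + m - 1 - f] < i - g:
            # Value can be copied from the mirrored position.
            suff[i] = suff[i + m - 1 - f]
        else:
            if i < g:
                g = i
            f = i
            while g >= 0 and x[g] == x[g + m - 1 - f]:
                g -= 1
            suff[i] = f - g
    return suff


def good_suffixes(x, suff):
    """good_suff[i] = shift to apply after a mismatch at position i.
    Textbook construction from suff (illustrative sketch)."""
    m = len(x)
    good_suff = [m] * m
    # Case 2: only a border of x can be realigned with the matched suffix.
    j = 0
    for i in range(m - 1, -1, -1):
        if suff[i] == i + 1:               # x[0..i] is a border of x
            while j < m - 1 - i:
                if good_suff[j] == m:
                    good_suff[j] = m - 1 - i
                j += 1
    # Case 1: good_suff[i] = min{ m-1-j : m-1-suff[j] = i } (Figure 2).
    # Iterating j upward lets the largest j, hence the smallest shift, win.
    for j in range(m - 1):
        good_suff[m - 1 - suff[j]] = m - 1 - j
    return good_suff


pattern = "gcagagag"
suff = suffixes(pattern)
print(suff)                          # [1, 0, 0, 2, 0, 4, 0, 8]
print(good_suffixes(pattern, suff))  # [7, 7, 7, 2, 7, 4, 7, 1]
```

The second loop of `good_suffixes` may overwrite the same entry several times, which is one example of the kind of repeated work a tighter analysis of the pattern could aim to avoid.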

Theorems & Definitions (8)

  • lemma 1
  • proof
  • lemma 2
  • proof
  • lemma 3
  • proof
  • lemma 4
  • proof