Branch Prediction Analysis of Morris-Pratt and Knuth-Morris-Pratt Algorithms

Cyril Nicaud, Carine Pivoteau, Stéphane Vialette

TL;DR

The paper analyzes Morris–Pratt and Knuth–Morris–Pratt pattern matching under a simple local branch-prediction model with 2‑bit saturating counters. It employs automata and transducers to encode letter comparisons and failure functions, and uses Markov chains to characterize both the average number of letter comparisons and the asymptotic misprediction counts. The authors derive explicit expressions for mispredictions due to the counter update, per-letter comparisons, and the $i\ge 0$ test, including closed forms for small patterns and general formulations through stationary distributions. They provide numerical illustrations for small patterns and alphabets, reveal that certain branches (notably $i\ge 0$) can be highly mispredicted at larger alphabets, and discuss extensions to hybrid predictors and Markov-text models. Overall, the work offers a foundational theoretical framework for branch-prediction-aware analysis of classic text-search algorithms and suggests directions for more realistic architectures and probabilistic sources.

Abstract

We analyze the classical Morris-Pratt and Knuth-Morris-Pratt pattern matching algorithms through the lens of computer architecture, investigating the impact of incorporating a simple branch prediction mechanism into the model of computation. Assuming a fixed pattern and a random text, we derive precise estimates of the number of mispredictions these algorithms produce using local predictors. Our approach is based on automata theory and Markov chains, providing a foundation for the theoretical analysis of other text algorithms and more advanced branch prediction strategies.

Paper Structure

This paper contains 9 sections, 13 theorems, 9 equations, 15 figures, 2 tables, 1 algorithm.

Key Result

Lemma 1

The sequence of results of the comparisons $X[i]\neq W[j]$ when applying Algorithm Find to the pattern $X$ and text $W$ is equal to the output of the word $W$ in the transducer ${\mathcal{T}^\textrm{mp}_X}$ for Algorithm MP, and in the transducer ${\mathcal{T}^\textrm{kmp}_X}$ for Algorithm KMP.
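
To make the comparison sequence in Lemma 1 concrete, here is a minimal Python sketch. It assumes Algorithm Find has the textbook shape: an inner loop guarded by `i >= 0 and X[i] != W[j]` that follows a failure table, using the border table for MP and the strict border table for KMP. The function names `mp_table`, `kmp_table`, and `find` are illustrative choices, not taken from the paper, and the pattern is assumed to be non-empty.

```python
def mp_table(x):
    """MP failure table: b[i] is the length of the longest border of x[:i]; b[0] = -1."""
    m = len(x)
    b = [-1] * (m + 1)
    j = -1
    for i in range(m):
        while j >= 0 and x[i] != x[j]:
            j = b[j]
        j += 1
        b[i + 1] = j
    return b


def kmp_table(x):
    """KMP failure table: skip borders that are followed by the same letter as x[i]."""
    m = len(x)
    mp = mp_table(x)
    b = list(mp)
    for i in range(1, m):
        k = mp[i]
        if x[i] == x[k]:
            b[i] = b[k]  # b[k] is already final because k < i
    return b


def find(x, w, table):
    """Count occurrences of x in w and record each outcome of the test X[i] != W[j]."""
    m, i, nb = len(x), 0, 0
    outcomes = []
    for c in w:
        while i >= 0:
            mismatch = x[i] != c
            outcomes.append(mismatch)
            if not mismatch:
                break
            i = table[i]  # follow a failure transition
        i += 1
        if i == m:  # full match: count it and fall back
            nb += 1
            i = table[i]
    return nb, outcomes
```

Under these assumptions, running `find` on $X=ababb$ and the text `abaa` records six comparisons with the MP table and five with the KMP table. The extra MP comparison occurs when reading $a$ from state $aba$, which agrees with the caption of Figure 5.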

Figures (15)

  • Figure 1: The 2-bit saturated predictor consists of four states: $\underline\nu$ and $\nu$ predict that the branch will not be taken, while $\tau$ and $\underline\tau$ predict that it will. The predictor updates at each condition evaluation, transitioning via $T$ when the branch is taken (i.e., the condition is true) and via $N$ when it is not. Bold edges indicate mispredictions.
  • Figure 2: The deterministic and complete automaton $\mathcal{A}_X$ for $X=ababb$.
  • Figure 3: The automata ${\mathcal{F}^\textrm{mp}_X}$ and ${\mathcal{F}^\textrm{kmp}_X}$ for $X=ababb$, on the same picture; the failure transitions of ${\mathcal{F}^\textrm{mp}_X}$ are in dotted red lines and above, those of ${\mathcal{F}^\textrm{kmp}_X}$ are in dashed blue lines and below. To read the letter $a$ from state $aba$ in ${\mathcal{F}^\textrm{mp}_X}$, one follows the failure transition $aba\rightarrow a$ then $a\rightarrow\varepsilon$ until one can finally read $\varepsilon\xrightarrow{a}a$. In ${\mathcal{F}^\textrm{kmp}_X}$, only one failure transition $aba\rightarrow\varepsilon$ is needed, instead of two.
  • Figure 4: The automata ${\mathcal{F}^\textrm{mp}_X}$ and ${\mathcal{F}^\textrm{kmp}_X}$ transformed into transducers by adding the result of letter comparisons in Find as output of each transition.
  • Figure 5: The transducers ${\mathcal{T}^\textrm{mp}_X}$ and ${\mathcal{T}^\textrm{kmp}_X}$ for $X=ababb$. The only difference between them lies in the transition $aba\xrightarrow{a}a$, for which Algorithm MP uses one more letter comparison.
  • ...and 10 more figures
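
The predictor described in the caption of Figure 1 can be simulated in a few lines. The sketch below assumes the usual saturating transition structure, with states encoded as the integers 0 to 3 standing for $\underline\nu$, $\nu$, $\tau$, $\underline\tau$ in that order. The encoding, the function name `count_mispredictions`, and the default initial state are assumptions made for illustration and do not come from the paper.

```python
def count_mispredictions(outcomes, state=0):
    """Simulate a 2-bit saturating counter on a sequence of branch outcomes.

    outcomes: iterable of booleans, True when the branch is taken.
    state: initial counter value in 0..3 (assumed encoding: 0 and 1 predict
           not taken, 2 and 3 predict taken).
    Returns the number of mispredicted branches.
    """
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            mispredictions += 1
        # Move one step toward the observed outcome, saturating at 0 and 3.
        state = min(3, state + 1) if taken else max(0, state - 1)
    return mispredictions
```

With this encoding, a strictly alternating sequence of outcomes started in state 1 is mispredicted at every step, while the same sequence started in state 0 is mispredicted only half of the time. This illustrates why the analysis has to track the predictor state alongside the state of the algorithm.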

Theorems & Definitions (13)

  • Lemma 1
  • Lemma 1
  • Proposition 1
  • Proposition 1
  • Proposition 1
  • Lemma 2
  • Proposition 3
  • Proposition 3
  • Lemma 3
  • Proposition 3
  • ...and 3 more