Simple Linear-time Repetition Factorization

Yuki Yonemoto; Shunsuke Inenaga

TL;DR

This work tackles the problem of computing an arbitrary repetition factorization of a string $w$ of length $n$ in $O(n)$ time and $O(n)$ space, avoiding the Union-Find and interval stabbing structures used in prior approaches. It introduces the ARF-graph, a sparse variant of the repetition graph $G_w$, to compactly encode repetitions with $O(n\log n)$ nodes/edges, enabling linear-time processing. A reconstruction procedure leverages combinatorial reductions to prune redundant edges and nodes, yielding an arbitrary factorization without bit-operations. Building on Inoue et al.'s framework for smallest/largest factorizations, the method provides a simpler, practical linear-time solution for repetition factorizations rooted in runs theory, expanding the toolkit for efficient string factorization.

Abstract

A factorization $f_1, \ldots, f_m$ of a string $w$ of length $n$ is called a repetition factorization of $w$ if each $f_i$ is a repetition, i.e., $f_i$ is of the form $x^kx'$, where $x$ is a non-empty string, $x'$ is a (possibly empty) proper prefix of $x$, and $k \geq 2$. Dumitran et al. [SPIRE 2015] presented an $O(n)$-time and $O(n)$-space algorithm for computing an arbitrary repetition factorization of a given string of length $n$. Their algorithm relies heavily on the Union-Find data structure on trees proposed by Gabow and Tarjan [JCSS 1985], which works in linear time on the word RAM model, and on an interval stabbing data structure of Schmidt [ISAAC 2009]. In this paper, we explore further combinatorial insights into the problem and present a simple algorithm that computes an arbitrary repetition factorization of a given string of length $n$ in $O(n)$ time, without relying on data structures for Union-Find and interval stabbing. Our algorithm follows the approach of Inoue et al. [ToCS 2022], which computes the smallest/largest repetition factorization in $O(n \log n)$ time.
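To make the definition of a repetition concrete, the following minimal sketch tests whether a single string has the form $x^kx'$ with $k \geq 2$. It uses the standard fact that this holds exactly when the smallest period $p$ of the string satisfies $2p \leq |f_i|$. The function names `smallest_period` and `is_repetition` are illustrative assumptions and do not come from the paper.

```python
def smallest_period(s: str) -> int:
    """Return the smallest period of s, using the KMP border (prefix) function."""
    n = len(s)
    if n == 0:
        return 0
    border = [0] * n  # border[i] = length of the longest proper border of s[:i+1]
    k = 0
    for i in range(1, n):
        while k > 0 and s[i] != s[k]:
            k = border[k - 1]
        if s[i] == s[k]:
            k += 1
        border[i] = k
    return n - border[-1]


def is_repetition(s: str) -> bool:
    """True iff s = x^k x' with x non-empty, k >= 2, and x' a proper prefix of x."""
    p = smallest_period(s)
    return p > 0 and 2 * p <= len(s)
```

For instance, `ababa` is a repetition with $x = ab$, $k = 2$, $x' = a$, whereas `aba` is not, because its smallest period 2 exceeds half its length.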

Paper Structure (5 sections, 1 theorem)

This paper contains 5 sections, 1 theorem.

Introduction
Preliminary
Strings
Repetitive structures
Repetition factorizations

Key Result

Theorem

Given a string $w$ of length $n$, we can compute an arbitrary repetition factorization of $w$ in $O(n)$ time and $O(n)$ space without bit operations in the word RAM model.
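For orientation, the problem in the theorem can be solved by a straightforward quadratic-time dynamic program. The sketch below is only a naive baseline for checking small inputs; it is not the paper's linear-time algorithm and uses neither the ARF-graph nor runs. The name `repetition_factorization` and the helper `border_array` are assumptions introduced for illustration.

```python
def border_array(s: str) -> list:
    """KMP border function: b[i] = longest proper border length of s[:i+1]."""
    b = [0] * len(s)
    k = 0
    for i in range(1, len(s)):
        while k > 0 and s[i] != s[k]:
            k = b[k - 1]
        if s[i] == s[k]:
            k += 1
        b[i] = k
    return b


def repetition_factorization(w: str):
    """Return some repetition factorization of w as a list, or None if none exists.

    Naive O(n^2) baseline: position j is reachable if w[:j] can be split
    into repetitions; prev[j] stores where the last factor starts.
    """
    n = len(w)
    reachable = [False] * (n + 1)
    prev = [None] * (n + 1)
    reachable[0] = True
    for i in range(n):
        if not reachable[i]:
            continue
        b = border_array(w[i:])  # borders of every prefix of the suffix w[i:]
        for j in range(i + 2, n + 1):
            length = j - i
            period = length - b[length - 1]  # smallest period of w[i:j]
            if 2 * period <= length and not reachable[j]:
                reachable[j] = True
                prev[j] = i
    if not reachable[n]:
        return None
    factors = []
    j = n
    while j > 0:  # walk back through the stored factor boundaries
        factors.append(w[prev[j]:j])
        j = prev[j]
    return factors[::-1]
```

On `aaabab` this sketch returns the factorization `aa`, `abab`, while `aabab` admits no repetition factorization and yields `None`.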
