Walking Your Frog Fast in 4 LoC

Nis Meinert

TL;DR

The paper tackles the computational challenge of computing the discrete Fréchet distance between polygonal curves, whose classic recursive formulation needs $O(PQ)$ memory for curves with $P$ and $Q$ vertices. It introduces a recursion-free, iterative algorithm, the Fast Fréchet Distance, that computes the same distance with linear memory $O(Q)$ and admits a concise, vectorizable implementation. The core contribution is this algorithm, expressed through fold/reduce and scan/accumulate operations, together with its ability to run efficiently on SIMD and GPGPU architectures; the paper's benchmarks show substantial CPU speedups ($\sim$19–$22\times$) and competitive GPU performance. The work enables scalable, exact computation of the discrete distance on large trajectory datasets and in batch processing, and its open-source implementation can be reproduced and extended for applications such as clustering, movement analytics, and reasoning about uncertain curves.
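For orientation, the classic formulation can be sketched as a memoized recursion over $c_{ij} = \max(d_{ij}, \min(c_{i-1,j}, c_{i-1,j-1}, c_{i,j-1}))$, where $d_{ij}$ is the distance between the $i$-th vertex of one curve and the $j$-th vertex of the other, with the first row and column reducing to running maxima. This is a minimal Python sketch, not code from the paper: curves are assumed to be lists of coordinate tuples, the Euclidean metric (`math.dist`) is an assumed choice, and `frechet_recursive` is a hypothetical name. The cache holds up to $P \cdot Q$ entries, which is the $O(PQ)$ memory cost mentioned above.

```python
import math
from functools import lru_cache
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]

def frechet_recursive(
    P: Sequence[Point],
    Q: Sequence[Point],
    metric: Callable[[Point, Point], float] = math.dist,  # Euclidean metric assumed
) -> float:
    """Discrete Fréchet distance via the memoized recurrence.

    The cache stores up to len(P) * len(Q) values (O(PQ) memory) and the
    recursion depth grows with len(P) + len(Q).
    """

    @lru_cache(maxsize=None)
    def c(i: int, j: int) -> float:
        d = metric(P[i], Q[j])
        if i == 0 and j == 0:
            return d
        if i == 0:  # first row: only steps along Q
            return max(c(0, j - 1), d)
        if j == 0:  # first column: only steps along P
            return max(c(i - 1, 0), d)
        # General case: best of the three predecessor cells, then the current distance.
        return max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), d)

    return c(len(P) - 1, len(Q) - 1)
```

Because the recursion depth grows with $P + Q$, long curves can exceed CPython's default recursion limit of 1000 and raise `RecursionError`, one practical reason to prefer an iterative formulation.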

Abstract

Given two polygonal curves, there are many ways to define a notion of similarity between them. One popular measure is the Fréchet distance, which has many desirable properties but is notoriously expensive to calculate, especially for non-trivial metrics. In 1994, Eiter and Mannila introduced the discrete Fréchet distance, which is much easier to implement and approximates the continuous Fréchet distance with a quadratic runtime overhead. However, their algorithm relies on recursion and is not well suited for modern hardware. To address this, we introduce the Fast Fréchet Distance algorithm, a recursion-free algorithm that calculates the discrete Fréchet distance with a linear memory overhead and can utilize modern hardware more effectively. We showcase an implementation with only four lines of code and present benchmarks showing our algorithm running fast on modern CPUs and GPGPUs.
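The recursion-free, linear-memory idea can be illustrated by evaluating the discrete Fréchet recurrence $c_{ij} = \max(d_{ij}, \min(c_{i-1,j}, c_{i-1,j-1}, c_{i,j-1}))$ row by row and keeping only the previous row. The sketch below is an illustrative plain-Python loop, not the paper's four-line implementation; `distance_rows` and `frechet_linear` are hypothetical names, and the rows of the distance matrix are produced lazily from two point lists using the Euclidean metric (an assumption).

```python
import math
from typing import Iterable, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]

def distance_rows(P: Sequence[Point], Q: Sequence[Point]) -> Iterator[List[float]]:
    """Yield the rows of the distance matrix d one at a time (Euclidean metric assumed)."""
    for p in P:
        yield [math.dist(p, q) for q in Q]

def frechet_linear(rows: Iterable[Sequence[float]]) -> float:
    """Discrete Fréchet distance from the rows of d, keeping only O(Q) state."""
    it = iter(rows)
    first = next(it)
    # First row: only steps along Q are possible, so each entry is a running maximum.
    prev: List[float] = []
    running = -math.inf
    for x in first:
        running = max(running, x)
        prev.append(running)
    for row in it:
        # First column: only steps along P are possible.
        cur = [max(prev[0], row[0])]
        for j in range(1, len(row)):
            # Predecessor cells: (i-1, j), (i-1, j-1) and (i, j-1).
            cur.append(max(row[j], min(prev[j], prev[j - 1], cur[j - 1])))
        prev = cur  # the older row is no longer needed
    return prev[-1]
```

Usage: `frechet_linear(distance_rows(P, Q))` never materializes the full $P \times Q$ matrix, because each row is generated only when the loop asks for it.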

Paper Structure

This paper contains 9 sections, 1 theorem, 3 equations, 2 figures, and 9 algorithms.

Introduction
The Fréchet Distance
An Iterative, Linear-Memory Algorithm
Parallel Implementations
Summary
Reproducibility
Scan and Fold
DTW and the Levenshtein Distance
Additional Benchmarking Results

Key Result

Theorem 1

The Fast Fréchet Distance algorithm calculates the discrete Fréchet distance given a distance matrix $d \in \mathbb{R}^{P \times Q}$. The algorithm consumes the rows of $d$ iteratively, so each row, or even each element, can be computed lazily; the memory requirement is therefore reduced to $\mathcal{O}(Q)$. The co…
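The row-by-row structure in the theorem maps naturally onto a fold over rows with a scan inside each row, the fold/reduce and scan/accumulate building blocks named in the TL;DR. The sketch below expresses this with Python's `functools.reduce` and `itertools.accumulate`; it is one possible formulation, not claimed to be the paper's four-line code. It uses the identity $\max(a, \min(b, c)) = \min(\max(a, b), \max(a, c))$ to split each row update into an element-wise part $y_j = \max(d_{ij}, \min(c_{i-1,j}, c_{i-1,j-1}))$ and a sequential scan $c_{ij} = \min(y_j, \max(c_{i,j-1}, d_{ij}))$. All function names are hypothetical.

```python
import math
from functools import reduce
from itertools import accumulate
from typing import Iterable, List, Sequence

def _first_row(row: Sequence[float]) -> List[float]:
    # Scan with max: running maximum along the first row.
    return list(accumulate(row, max))

def _next_row(prev: List[float], row: Sequence[float]) -> List[float]:
    # Element-wise part (no dependency between columns, hence vectorizable):
    # y_j = max(d_ij, min(prev_j, prev_{j-1})), with prev_{-1} = +inf.
    shifted = [math.inf] + prev[:-1]
    y = [max(d, min(a, b)) for d, a, b in zip(row, prev, shifted)]
    # Sequential part as a scan: cur_j = min(y_j, max(cur_{j-1}, d_ij)), cur_{-1} = +inf.
    scan = accumulate(zip(y, row), lambda acc, yd: min(yd[0], max(acc, yd[1])), initial=math.inf)
    return list(scan)[1:]  # drop the leading +inf seed

def frechet_fold(rows: Iterable[Sequence[float]]) -> float:
    """Fold _next_row over the rows of d, starting from the first row's running maximum."""
    it = iter(rows)
    return reduce(_next_row, it, _first_row(next(it)))[-1]
```

Only the previous row is carried through the fold, so the state stays $\mathcal{O}(Q)$, and `rows` may be any lazy iterable. The `initial` keyword of `accumulate` requires Python 3.8 or later.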

Figures (2)

Figure 1: Comparison of four implementations of the Fast Fréchet Distance algorithm on a laptop (CPU: Intel Core i7-11800H, GPU: NVIDIA GeForce RTX 3080 Mobile (16 GB VRAM), CUDA version: 12.2) using 32-bit floating-point numbers. Vanilla and Linear refer to Alg. alg:no_recursion and Alg. alg:linear, respectively. The SIMD implementation uses a batch size of $B = 32$ (twice the 16 single-precision lanes of a 512-bit register, to improve instruction-level parallelism) and relies on the AVX-512 instruction set; the baseline implementation uses the same technique to calculate $\sum_{ij} d_{ij}$. The CUDA implementation uses a grid size of 128 and a block size of 64, which were measured to perform best for $N=2^{13}$ and $P=2^{10}$. All variants use only a single CPU core.
Figure 2: Comparison of four implementations of the Fast Fréchet Distance algorithm on a desktop machine (CPU: AMD Ryzen Threadripper 3960X, GPU: NVIDIA GeForce RTX 3090 (24 GB VRAM), CUDA version: 12.2) using 32-bit floating-point numbers. Vanilla and Linear refer to Alg. alg:no_recursion and Alg. alg:linear, respectively. The SIMD implementation uses a batch size of $B = 16$ (twice the 8 single-precision lanes of a 256-bit register, to improve instruction-level parallelism) and relies on the AVX2 instruction set; the baseline implementation uses the same technique to calculate $\sum_{ij} d_{ij}$. The CUDA implementation uses a grid size of 128 and a block size of 64, which were measured to perform best for $N=2^{13}$ and $P=2^{10}$. All variants use only a single CPU core.
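The SIMD variants in the figures process a batch of $B$ independent curve pairs at once. A NumPy sketch of that batching idea follows: the loops run over the matrix, while every operation is element-wise over the batch axis, so each batch element plays the role of one SIMD lane. It assumes all pairs in a batch share the same $P$ and $Q$ and that their distance matrices are stacked into an array of shape `(B, P, Q)`; `frechet_batched` is a hypothetical name, and this illustrates lane-wise batching only, not the paper's AVX or CUDA code.

```python
import numpy as np

def frechet_batched(d: np.ndarray) -> np.ndarray:
    """Discrete Fréchet distances for a batch of distance matrices.

    d has shape (B, P, Q). Every operation is element-wise over the batch
    axis, mirroring how B independent pairs map onto SIMD lanes. Extra
    memory beyond the input is O(B * Q).
    """
    _, P, Q = d.shape
    # First row: running maximum along Q for every pair in the batch.
    prev = np.maximum.accumulate(d[:, 0, :], axis=1)
    for i in range(1, P):
        cur = np.empty_like(prev)
        # First column: only steps along P are possible.
        cur[:, 0] = np.maximum(prev[:, 0], d[:, i, 0])
        for j in range(1, Q):
            # Minimum over the three predecessor cells, for all B pairs at once.
            best = np.minimum(np.minimum(prev[:, j], prev[:, j - 1]), cur[:, j - 1])
            cur[:, j] = np.maximum(d[:, i, j], best)
        prev = cur
    return prev[:, -1]
```

Using `dtype=np.float32` for `d` matches the 32-bit floating-point setting of the benchmarks.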

Theorems & Definitions (2)

Theorem 1: Fast Discrete Fréchet Distance Algorithm
Proof: Proof of Theorem 1
