Aggregating time-series and image data: functors and double functors
Joscha Diehl
TL;DR
The paper develops a unified, category-theoretic view of data aggregation for time series and images. Aggregations of time series are modeled as functors on a category of intervals, and aggregations of images as double functors on a double category of rectangles. Because these categories are free, local data lift uniquely to global aggregates, and the resulting aggregations can be computed in parallel with extensions of Blelloch's prefix scan. Key contributions include formalizing 1D aggregations as functors with a wide range of targets, introducing the free double category of rectangles for 2D data, and detailing row-wise then column-wise parallel scans for 2D data with explicit complexity bounds. The framework gives a principled, scalable basis for implementing time-series and image aggregations in parallel, and it points to future directions involving irregular geometries and non-abelian categorical structures for more complex data domains.
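To make the 1D picture concrete, here is a minimal Python sketch, not taken from the paper. It assumes the simplest target, the integers under addition, so that an interval [i, j] is sent to the sum of the samples with indices i, ..., j-1. The names `blelloch_exclusive_scan`, `prefixes`, and `interval_aggregate` are hypothetical. The scan follows the up-sweep/down-sweep structure of Blelloch's algorithm but runs its inner loops sequentially; in a parallel implementation the iterations of each inner loop are independent of one another.

```python
def blelloch_exclusive_scan(xs, op, identity):
    """Exclusive scan with an associative op, in Blelloch's two-phase layout."""
    n = len(xs)
    size = 1
    while size < n:          # pad to a power of two with identity elements
        size *= 2
    a = list(xs) + [identity] * (size - n)

    # Up-sweep: build partial aggregates bottom-up in a balanced tree.
    d = 1
    while d < size:
        for i in range(2 * d - 1, size, 2 * d):   # independent iterations
            a[i] = op(a[i - d], a[i])
        d *= 2

    # Down-sweep: push prefixes back down the tree.
    a[size - 1] = identity
    d = size // 2
    while d >= 1:
        for i in range(2 * d - 1, size, 2 * d):   # independent iterations
            left = a[i - d]
            a[i - d] = a[i]
            a[i] = op(a[i], left)                 # order matters if op is non-commutative
        d //= 2
    return a[:n]


def prefixes(xs, op, identity):
    """Return P with P[k] = xs[0] op ... op xs[k-1], for k = 0..len(xs)."""
    if not xs:
        return [identity]
    ex = blelloch_exclusive_scan(xs, op, identity)
    return ex + [op(ex[-1], xs[-1])]


def interval_aggregate(P, i, j):
    """Aggregate over [i, j] for integer addition (assumes inverses exist)."""
    return P[j] - P[i]


samples = [3, 1, 4, 1, 5]
P = prefixes(samples, lambda x, y: x + y, 0)
```

In this sketch, functoriality is the statement that composing adjacent intervals adds their aggregates: `interval_aggregate(P, 0, 2) + interval_aggregate(P, 2, 5)` equals `interval_aggregate(P, 0, 5)`. The scan itself only needs associativity, so it also works for non-commutative operations such as string concatenation; the subtraction in `interval_aggregate` is the part that relies on the group assumption.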
Abstract
Aggregation of time-series or image data over subsets of the domain is a fundamental task in data science. We show that many known aggregation operations can be interpreted as (double) functors on appropriate (double) categories. Such functorial aggregations are amenable to parallel implementation via straightforward extensions of Blelloch's parallel scan algorithm. In addition to providing a unified viewpoint on existing operations, this interpretation allows us to propose new aggregation operations for time-series and image data.
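As an illustration of the 2D case, the following Python sketch (not from the paper) aggregates an image by scanning each row and then each column. It assumes the most familiar target, integer sums, which yields the classical summed-area table. The function names `summed_area_table` and `rect_sum` are hypothetical, and `itertools.accumulate` is a sequential stand-in for a parallel scan. The row scans are independent of one another, as are the column updates within each row step, which is where a parallel implementation would gain.

```python
from itertools import accumulate


def summed_area_table(img):
    """S[r][c] = sum of img over rows 0..r-1 and columns 0..c-1."""
    cols = len(img[0]) if img else 0
    # Row-wise scans, each with a leading 0 so that indices mark cell boundaries.
    row_scanned = [[0] + list(accumulate(row)) for row in img]
    # Column-wise scan over the row-scanned data.
    table = [[0] * (cols + 1)]
    for row in row_scanned:
        prev = table[-1]
        table.append([p + v for p, v in zip(prev, row)])
    return table


def rect_sum(S, top, left, bottom, right):
    """Sum over the rectangle [top, bottom] x [left, right] by inclusion-exclusion.

    Relies on subtraction, i.e. on the abelian-group assumption stated above.
    """
    return S[bottom][right] - S[top][right] - S[bottom][left] + S[top][left]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
S = summed_area_table(img)
```

Under these assumptions, compatibility with horizontal and vertical composition of rectangles reduces to additivity: splitting a rectangle along a vertical or horizontal line and adding the two `rect_sum` values recovers the value on the whole rectangle.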
