A Unifying Perspective on Succinct Data Representations
Benny Kimelfeld, Wim Martens, Matthias Niewerth
TL;DR
We introduce unnamed factorized relations (uFRs) and establish a bijection $\beta(F)$ between uFRs and a class of context-free grammars for uniform-length languages, enabling CFG-style analysis of FRs. The work situates uFRs relative to traditional named FRs, showing potential exponential succinctness gaps and how disjointness constraints can align their sizes. It also connects uFRs to path multiset representations (PMRs) and discusses the tradeoffs with automata, highlighting when PMRs may be more expressive and when uFRs offer succinct finite representations. Finally, the paper discusses extensions to varying-arity relations and outlines a broad research agenda on complexity, complement operations, and practical tradeoffs for succinct data representations.
Abstract
Factorized representations (FRs) are a well-known tool for succinctly representing the results of join queries and were originally defined using the named database perspective. We define FRs in the unnamed database perspective and use them to establish several new connections. First, unnamed FRs can be exponentially more succinct than named FRs, but this difference can be alleviated by imposing a disjointness condition on columns. Conversely, named FRs can also be exponentially more succinct than unnamed FRs. Second, unnamed FRs are the same as (i.e., isomorphic to) context-free grammars for languages in which each word has the same length. This tight connection allows us to transfer a wide range of results on context-free grammars to database factorization, a selection of which we present in the paper. Third, when we generalize unnamed FRs to arbitrary sets of tuples, they become a generalization of \emph{path multiset representations}, a formalism that was recently introduced to succinctly represent sets of paths in the context of graph database query evaluation.
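To make the second connection concrete, the following is a minimal sketch of how a small acyclic context-free grammar can be read as a factorized relation: alternatives of a nonterminal act as a union, and the symbols within an alternative act as a product (tuple concatenation). The grammar, the names `GRAMMAR` and `tuples`, and the size measure are illustrative assumptions and are not taken from the paper's formal definitions.

```python
from itertools import product

# Hypothetical toy grammar (an assumption for illustration only).
# Each nonterminal maps to a list of alternatives (union); each alternative
# is a sequence of symbols (product, i.e. tuple concatenation).
# Symbols that are not keys of the dict are terminals, i.e. data values.
GRAMMAR = {
    "S": [["A", "B"]],
    "A": [["a1"], ["a2"]],
    "B": [["b1"], ["b2"], ["b3"]],
}

def tuples(symbol, grammar):
    """Return the set of tuples (words) derivable from `symbol`.

    Assumes the grammar is acyclic; a recursive grammar would not terminate.
    """
    if symbol not in grammar:  # terminal: a single data value
        return {(symbol,)}
    result = set()
    for alternative in grammar[symbol]:
        parts = [tuples(s, grammar) for s in alternative]
        for combo in product(*parts):
            # Concatenate the sub-tuples into one flat tuple.
            result.add(tuple(v for part in combo for v in part))
    return result

relation = tuples("S", GRAMMAR)

# Every derived word has the same length, so the language is a relation
# of fixed arity in the unnamed perspective.
arities = {len(t) for t in relation}

# One possible size measure: total number of symbols on right-hand sides.
grammar_size = sum(len(alt) for alts in GRAMMAR.values() for alt in alts)
flat_size = sum(len(t) for t in relation)
```

Here the grammar describes the Cartesian product of two columns, so `relation` contains six tuples of arity two, while the grammar itself lists each value only once. With more and larger columns, the gap between `grammar_size` and `flat_size` grows multiplicatively, which is the kind of succinctness factorization aims for.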
