Numbering Combinations for Compact Representation of Many-to-Many Relationship Sets
Savo Tomovic
TL;DR
The paper represents a many-to-many relationship between a group-entity set $G$ and an item set $I$ in a compact, lossless form: each group of items is treated as a $k$-combination of $n$ elements and encoded with the combinatorial number system as a pair $(h,k)$. The traditional bridge table can then be replaced by a single encoded column in $G$ (or by a compact bridge $B_{rankc}$). To reconstruct the original joins and keep query expressiveness, relational algebra is extended with Rank-Join and Rank-Inverse-Join. Core contributions are the combinatorial bridge table, the two schema representations $G_{rankc}$ and $B_{rankc}$, the RankGroup and RankGroupInverse algorithms with complexities $O(k^2)$ and $O(nk)$, respectively, and a formalized algebraic framework for querying compressed relations. A hospital data-warehouse case study reports large storage reductions (e.g., ~32x for diagnosis groups) while preserving information, which suggests practical value for data warehousing and for uses beyond multivalued dimensions.
Abstract
In this paper we propose an approach to implementing a specific relationship set between two entity sets, called a combinatorial relationship set. For a combinatorial relationship set B between entity sets G and I, the mapping cardinality is many-to-many. Additionally, each entity from G can be uniquely encoded with a pair of values (h, k) generated by a procedure for numbering combinations of entities from I. The encoding procedure is based on the combinatorial number system, which represents every possible k-combination of a set of n elements by a single number. In general, a many-to-many relationship set is represented by a relation (table), whereas a combinatorial relationship set is not physically stored as a separate table. Instead, all of its information is encapsulated in a single column added to G. The new column is a candidate key in G. We present an additional operation, named Rank-Join, that extends the fundamental relational algebra and combines information from G and I associated through a combinatorial relationship set. The motivation for the combinatorial relationship set originates from challenges in designing and implementing multivalued dimensions and bridge tables in data-warehouse models.
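To make the (h, k) encoding concrete, the following is a minimal sketch of ranking and unranking with the standard combinatorial number system. It is an illustration under assumptions, not the paper's RankGroup or RankGroupInverse procedures: the function names `rank_group` and `unrank_group`, the zero-based item indices, and the ordering convention are all chosen here for the example and may differ from the paper's definitions.

```python
from math import comb


def rank_group(items):
    """Encode a set of distinct non-negative item indices as a pair (h, k).

    Uses the standard combinatorial number system: for sorted indices
    c_1 < c_2 < ... < c_k, h = C(c_1, 1) + C(c_2, 2) + ... + C(c_k, k).
    (Hypothetical helper; zero-based indexing is an assumption.)
    """
    c = sorted(set(items))
    k = len(c)
    h = sum(comb(ci, i) for i, ci in enumerate(c, start=1))
    return h, k


def unrank_group(h, k):
    """Recover the sorted item indices from the pair (h, k).

    Greedy inverse: for i = k down to 1, pick the largest c with
    C(c, i) <= remaining h, then subtract C(c, i).
    """
    items = []
    for i in range(k, 0, -1):
        c = i - 1                      # C(i - 1, i) == 0, always admissible
        while comb(c + 1, i) <= h:     # advance to the largest admissible c
            c += 1
        items.append(c)
        h -= comb(c, i)
    return sorted(items)


if __name__ == "__main__":
    h, k = rank_group([1, 3, 4])
    print(h, k)                # 8 3
    print(unrank_group(h, k))  # [1, 3, 4]
```

Note that h alone does not identify a group: in this sketch `rank_group([0])` and `rank_group([0, 1])` both give h = 0 and differ only in k, which is consistent with the abstract's use of the pair (h, k) as the unique code.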
