Numbering Combinations for Compact Representation of Many-to-Many Relationship Sets

Savo Tomovic

TL;DR

The paper addresses the challenge of representing many-to-many relationships between a group-entity set $G$ and an item set $I$ in a compact, lossless form. Each group of items is encoded as a $k$-combination of $n$ elements using the combinatorial number system, yielding a pair $(h,k)$. This allows the traditional bridge table to be replaced with a single encoded column in $G$ (or with a compact bridge $B_{rankc}$), and motivates two extensions to relational algebra, Rank-Join and Rank-Inverse-Join, which reconstruct the original joins and preserve query expressiveness. Core contributions include the combinatorial bridge table, the two schema representations $G_{rankc}$ and $B_{rankc}$, the RankGroup and RankGroupInverse algorithms with complexities $O(k^2)$ and $O(nk)$, respectively, and a formalized algebraic framework for querying compressed relations. A hospital data-warehouse case study demonstrates large storage reductions (e.g., ~32x for diagnosis groups) while preserving information, which underscores the practical impact for data warehousing and for uses beyond multivalued dimensions.

Abstract

In this paper we propose an approach to implementing a specific kind of relationship set between two entity sets, called a combinatorial relationship set. For a combinatorial relationship set B between entity sets G and I, the mapping cardinality is many-to-many. Additionally, each entity in G can be uniquely encoded by a pair of values (h, k) generated by a procedure for numbering combinations of entities from I. The encoding procedure is based on the combinatorial number system, which represents each k-combination of a set of n elements by a single number. In general, a many-to-many relationship set is represented by a separate relation (table), whereas a combinatorial relationship set is not physically stored as a separate table; instead, all of its information is encapsulated in a single column added to G. The new column is a candidate key of G. An additional relational-algebra operation, named Rank-Join, is presented to combine information from g and i associated through a combinatorial relationship set. The motivation for combinatorial relationship sets originates from the challenges of designing and implementing multivalued dimensions and bridge tables in data-warehouse models.
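To make the encoding idea concrete, here is a minimal sketch of the standard combinatorial number system, which maps a $k$-combination $c_1 < c_2 < \dots < c_k$ of 0-based item indices to the single number $h = \sum_{i=1}^{k} \binom{c_i}{i}$. The function names `rank_group` and `rank_group_inverse` are hypothetical and only echo the paper's RankGroup and RankGroupInverse; the 0-based indexing and this particular ordering are assumptions, and the paper's own algorithms may use a different convention.

```python
from math import comb


def rank_group(items):
    """Encode a set of distinct, non-negative item indices as a pair (h, k).

    Assumption: items are 0-based indices into the item set I.
    """
    c = sorted(set(items))
    k = len(c)
    # Combinatorial number system: h = C(c_1, 1) + C(c_2, 2) + ... + C(c_k, k)
    h = sum(comb(ci, i) for i, ci in enumerate(c, start=1))
    return h, k


def rank_group_inverse(h, k):
    """Decode a pair (h, k) back into the sorted list of item indices."""
    items = []
    for i in range(k, 0, -1):
        # Greedily pick the largest c with C(c, i) <= h.
        # Starting at c = i - 1 is safe because C(i - 1, i) == 0.
        c = i - 1
        while comb(c + 1, i) <= h:
            c += 1
        items.append(c)
        h -= comb(c, i)
    return sorted(items)


if __name__ == "__main__":
    h, k = rank_group([5, 0, 2])
    print(h, k)
    print(rank_group_inverse(h, k))
```

Because the pair $(h, k)$ determines the group of items uniquely, it can stand in for all the bridge-table rows of that group, which is the property the abstract relies on when it calls the new column a candidate key.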

Paper Structure

This paper contains 8 sections, 9 equations, 10 figures, 1 table, 5 algorithms.

Figures (10)

  • Figure 1: Star schema of a data warehouse
  • Figure 2: Compression of many-to-many relationship set
  • Figure 3: Name-value pair dimension
  • Figure 4: Roles played by the Diagnosis dimension
  • Figure 5: Multivalued diagnosis dimension table
  • ...and 5 more figures