ChARLES: Change-Aware Recovery of Latent Evolution Semantics in Relational Data
Shiyi He, Alexandra Meliou, Anna Fariha
TL;DR
ChARLES addresses the challenge of interpreting data evolution by generating semantic summaries of changes between two relational database snapshots. It represents changes as conditional transformations within a partitioned view of the data and produces a linear model tree in which each path defines a partition and its associated transformation. The method optimizes a score that balances accuracy and interpretability, Score(S) = α × Accuracy(S) + (1−α) × Interpretability(S), where Accuracy is defined as the inverse $L_1$ distance between the transformed source and the target, and Interpretability favors concise, high-coverage conditions and simpler transformations. A setup assistant and a diff discovery engine coordinate partitioning and transformation discovery under user-tunable parameters, and a proof-of-concept demonstration on real datasets shows how latent evolution semantics can be recovered and interpreted to support data-driven decision making.
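The weighted score can be illustrated with a minimal Python sketch. The function names are hypothetical, and the mapping of "inverse $L_1$ distance" to 1 / (1 + d) is an assumption made here so that the value stays in (0, 1]; the summary above does not give the exact normalization. Interpretability is passed in as a precomputed number because its formula is not specified here.

```python
def l1_distance(predicted, target):
    """Sum of absolute differences between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(predicted, target))


def accuracy(predicted, target):
    # Assumption: "inverse L1 distance" is taken as 1 / (1 + d), which
    # equals 1.0 for a perfect match and decreases as the distance grows.
    return 1.0 / (1.0 + l1_distance(predicted, target))


def score(acc, interp, alpha):
    """Score(S) = alpha * Accuracy(S) + (1 - alpha) * Interpretability(S)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * acc + (1.0 - alpha) * interp
```

Under these assumptions, raising `alpha` toward 1 favors summaries that reproduce the target snapshot closely, while lowering it favors simpler summaries.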
Abstract
Data-driven decision-making is at the core of many modern applications, and understanding the data is critical in supporting trust in these decisions. However, data is dynamic and evolving, just like the real-world entities it represents. Thus, an important component of understanding data is analyzing and drawing insights from the changes it undergoes. Existing methods for exploring data change list differences exhaustively; such lists are not interpretable by humans and lack salient insights regarding change trends. For example, an explanation that semantically summarizes changes to highlight gender disparities in performance rewards is more human-consumable than a long list of employee salary changes. We demonstrate ChARLES, a system that derives semantic summaries of changes between two snapshots of an evolving database in an effective, concise, and interpretable way. Our key observation is that, while datasets often evolve through point and other small-batch updates, rich data features can reveal latent semantics that intuitively summarize the changes. Under the hood, ChARLES compares database versions, infers feasible transformations by fitting multiple regression lines over different data partitions to derive change summaries, and ranks them. ChARLES allows users to customize it to obtain their preferred explanation by navigating the accuracy-interpretability tradeoff, and offers a proof of concept for reasoning about data evolution over real-world datasets.
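The idea of fitting one regression line per data partition can be sketched as follows. This is an illustration rather than the ChARLES implementation: the function name is hypothetical, the partitions are supplied by the caller instead of being discovered, and a single numeric attribute is assumed. Each partition needs at least two rows with distinct source values for the fit to be well defined.

```python
import numpy as np


def fit_partition_transformations(source, target, partition_keys):
    """Fit target ≈ slope * source + intercept separately for each partition.

    Returns a dict mapping each partition key to (slope, intercept).
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    keys = np.asarray(partition_keys)

    summary = {}
    for key in np.unique(keys):
        mask = keys == key  # rows belonging to this partition
        # Degree-1 least-squares fit; coefficients come back highest power first.
        slope, intercept = np.polyfit(source[mask], target[mask], deg=1)
        summary[str(key)] = (float(slope), float(intercept))
    return summary


# Toy snapshots: group "A" received a 10% raise, group "B" a flat +50.
old_salary = [100, 200, 300, 100, 200, 300]
new_salary = [110, 220, 330, 150, 250, 350]
group = ["A", "A", "A", "B", "B", "B"]

transformations = fit_partition_transformations(old_salary, new_salary, group)
```

On this toy input, the recovered coefficients are approximately (1.1, 0) for group "A" and (1.0, 50) for group "B", which is a two-line summary of six individual salary changes.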
