SCHENO: Measuring Schema vs. Noise in Graphs
Justus Isaiah Hibshman, Adnan Hoq, Tim Weninger
TL;DR
SCHENO is a principled, goal-agnostic metric for scoring a decomposition of a graph into a schema (pattern) and noise: it rewards schemas with strong symmetry and noise that looks random. It formalizes a two-stage generative process in which a schema graph is drawn from a symmetry-rich distribution and noise is then added via an Erdős-Rényi-like process; the resulting log-score serves as the objective for optimization. The authors derive a principled method to set the noise probability p, evaluate several graph-mining models with SCHENO, and demonstrate that SCHENO-guided decompositions can uncover diverse, meaningful patterns in both synthetic and real-world data. The framework provides a general tool for pattern discovery in graphs and suggests directions for more powerful, scalable algorithms beyond traditional graph-mining tasks.
Abstract
Real-world data is typically a noisy manifestation of a core pattern (schema), and the purpose of data mining algorithms is to uncover that pattern, thereby splitting (i.e., decomposing) the data into schema and noise. We introduce SCHENO, a principled evaluation metric for the goodness of a schema-noise decomposition of a graph. SCHENO captures how schematic the schema is, how noisy the noise is, and how well the combination of the two represents the original graph data. We visually demonstrate what this metric prioritizes in small graphs, then show that if SCHENO is used as the fitness function for a simple optimization strategy, we can uncover a wide variety of patterns. Finally, we evaluate several well-known graph mining algorithms with this metric; we find that although they produce patterns, those patterns are not always the best representation of the input data.
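To make the schema-noise split concrete, the sketch below treats a decomposition as a pair of edge sets, with the noise being the node pairs whose edge status differs between the observed graph and the schema. It then scores only the noise part under an assumed independent per-pair flip model with probability p, in the spirit of the Erdős-Rényi-like process mentioned in the TL;DR. The function names (`edge_set`, `noise_edges`, `noise_log2_prob`) and the value p = 0.1 are illustrative assumptions. The schema term of SCHENO, which rewards symmetry, is not reproduced here, so this is not the full metric.

```python
import math


def edge_set(pairs):
    """Normalize an iterable of (u, v) pairs into a set of undirected edges."""
    return {frozenset(e) for e in pairs}


def noise_edges(graph, schema):
    """Noise = node pairs whose edge status differs between graph and schema."""
    return graph ^ schema


def noise_log2_prob(num_nodes, noise, p):
    """Log2-probability that independent per-pair flips with probability p
    flip exactly the pairs in `noise` (assumed model, undirected, no loops)."""
    total_pairs = num_nodes * (num_nodes - 1) // 2
    k = len(noise)
    return k * math.log2(p) + (total_pairs - k) * math.log2(1 - p)


# Observed graph: a 6-cycle plus one extra chord.
n = 6
cycle = edge_set((i, (i + 1) % n) for i in range(n))
observed = cycle | edge_set([(0, 3)])

# Candidate decomposition A: schema = the clean 6-cycle, noise = the chord.
noise_a = noise_edges(observed, cycle)
score_a = noise_log2_prob(n, noise_a, p=0.1)

# Candidate decomposition B: schema = empty graph, noise = every observed edge.
noise_b = noise_edges(observed, set())
score_b = noise_log2_prob(n, noise_b, p=0.1)

print(sorted(sorted(e) for e in noise_a), score_a, score_b)
```

Under this assumed noise model, decomposition A explains the data with a single flipped pair and therefore receives a higher noise log-probability than decomposition B, which must account for all seven observed edges as noise. A full SCHENO score would additionally weigh how symmetric each candidate schema is.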
