Learning Exceptional Subgroups by End-to-End Maximizing KL-divergence
Sascha Xu, Nils Philipp Walter, Janis Kalofolias, Jilles Vreeken
TL;DR
Syflow casts the discovery of exceptional subgroups as a differentiable optimization problem: it maximizes the KL-divergence between the target distribution inside a subgroup and that of the full dataset. It models target distributions with normalizing flows and pairs them with a differentiable neuro-symbolic rule learner that yields interpretable subgroup descriptions, while encouraging diverse subgroups of meaningful size. The approach scales to large datasets and handles complex target distributions, as demonstrated on synthetic data, real-world regression tasks, and a materials-science case study on gold nano-clusters. Unlike traditional methods, it does not require features to be discretized beforehand, and the subgroups it finds in the case study are physically plausible and come with human-readable descriptions.
Abstract
Finding and describing sub-populations that are exceptional regarding a target property has important applications in many scientific disciplines, from identifying disadvantaged demographic groups in census data to finding conductive molecules within gold nanoparticles. Current approaches to finding such subgroups require pre-discretized predictive variables, do not permit non-trivial target distributions, do not scale to large datasets, and struggle to find diverse results. To address these limitations, we propose Syflow, an end-to-end optimizable approach in which we leverage normalizing flows to model arbitrary target distributions, and introduce a novel neural layer that results in easily interpretable subgroup descriptions. We demonstrate on synthetic and real-world data, including a case study, that Syflow reliably finds highly exceptional subgroups accompanied by insightful descriptions.
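To make the objective concrete, the following is a minimal sketch of the quantity being maximized, not the Syflow implementation. It makes several simplifying assumptions: the target densities are estimated with weighted histograms instead of normalizing flows, the subgroup is a single soft interval on one feature with hand-set bounds instead of a learned rule, and no size or diversity terms are included. The function names `soft_interval_membership` and `weighted_kl_to_population` are hypothetical.

```python
import numpy as np

def soft_interval_membership(x, lower, upper, temperature=0.05):
    """Soft indicator of lower < x < upper (assumed form of a relaxed rule).

    As temperature approaches 0 this approaches a hard 0/1 rule.
    """
    left = 1.0 / (1.0 + np.exp(-(x - lower) / temperature))
    right = 1.0 / (1.0 + np.exp(-(upper - x) / temperature))
    return left * right

def weighted_kl_to_population(y, weights, bins=30, eps=1e-12):
    """Histogram estimate of KL(P(Y | S=1) || P(Y)).

    Membership values act as sample weights for the subgroup density.
    Histograms stand in for the normalizing flows used in the paper.
    """
    edges = np.histogram_bin_edges(y, bins=bins)
    p_all, _ = np.histogram(y, bins=edges)
    p_sub, _ = np.histogram(y, bins=edges, weights=weights)
    p_all = p_all / p_all.sum()
    p_sub = p_sub / max(p_sub.sum(), eps)
    mask = p_sub > 0  # bins with no subgroup mass contribute nothing
    return float(np.sum(p_sub[mask] * np.log(p_sub[mask] / (p_all[mask] + eps))))

# Synthetic data with a planted subgroup: the target is shifted where x > 0.7.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=5000)
y = rng.normal(0.0, 1.0, size=5000)
y[x > 0.7] += 3.0

# Score a rule covering the planted region and a rule covering an ordinary region.
kl_planted = weighted_kl_to_population(y, soft_interval_membership(x, 0.7, 1.0))
kl_other = weighted_kl_to_population(y, soft_interval_membership(x, 0.0, 0.3))
print(f"KL for 0.7 < x < 1.0: {kl_planted:.3f}")
print(f"KL for 0.0 < x < 0.3: {kl_other:.3f}")
```

The rule covering the planted region scores a higher divergence than the rule covering an ordinary region. Because the soft membership varies smoothly with the interval bounds, the same score could in principle be maximized by gradient ascent in an automatic-differentiation framework, which is the idea behind the end-to-end formulation.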
