Learning to Understand: Identifying Interactions via the Möbius Transform
Justin S. Kang, Yigit E. Erginbas, Landon Butler, Ramtin Pedarsani, Kannan Ramchandran
TL;DR
This work proposes the Sparse Möbius Transform (SMT), which identifies high-order interactions in functions by exploiting sparsity in their Möbius coefficients. By combining subsampling (aliasing) with non-adaptive group testing and a peeling message-passing algorithm, SMT exactly recovers the Möbius transform of a function of $n$ inputs with $K$ non-zero coefficients. Under uniform-interaction assumptions, it uses near-linear sample complexity $O(Kn)$ and near-quadratic time $O(Kn^2)$. In the low-degree setting, where every interaction involves at most $t$ inputs, it needs only $O(Kt\log n)$ samples, even in the presence of noise. Given the same number of terms, the approach yields more faithful explanations than Shapley or Banzhaf values on several real-model tasks (e.g., breast cancer, sentiment analysis, and QA), which highlights its practical impact for model interpretability and data valuation. These results integrate ideas from sparse signal processing, coding theory, and group testing into a scalable, non-adaptive framework for uncovering meaningful input interactions in complex models.
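The sketch below makes the objects concrete. It computes the exact Möbius transform of a small set function by brute-force subset-sum inversion in $O(n2^n)$ time. This is the exponential baseline that a sparse algorithm avoids, not the SMT algorithm itself. It assumes the convention $f(m) = \sum_{k \subseteq m} F(k)$, with inputs encoded as bitmasks. It then recovers Shapley and Banzhaf values from the coefficients through the standard identities $\phi_i = \sum_{k \ni i} F(k)/|k|$ and $\beta_i = \sum_{k \ni i} F(k)/2^{|k|-1}$. The function names and the toy function are illustrative assumptions.

```python
def mobius_transform(f_vals, n):
    """Exact Möbius transform by subset-sum inversion, O(n * 2^n) time.

    f_vals[m] is f evaluated with the inputs in bitmask m switched on; the
    returned F satisfies f(m) = sum of F(k) over all subsets k of m.
    """
    F = list(f_vals)
    for i in range(n):
        bit = 1 << i
        for m in range(1 << n):
            if m & bit:
                F[m] -= F[m ^ bit]  # finite difference along input i
    return F


def shapley_banzhaf(F, n):
    """Shapley and Banzhaf values from Möbius coefficients.

    Each coefficient F(k) is split equally among the |k| inputs in k for the
    Shapley value, and credited as F(k) / 2^(|k|-1) to each input in k for
    the Banzhaf value.
    """
    shapley, banzhaf = [0.0] * n, [0.0] * n
    for k in range(1, 1 << n):
        size = bin(k).count("1")
        for i in range(n):
            if (k >> i) & 1:
                shapley[i] += F[k] / size
                banzhaf[i] += F[k] / 2 ** (size - 1)
    return shapley, banzhaf


# Toy function with three inputs: f(x) = 2*x0 + x1 + 3*x0*x1*x2.
n = 3


def f(m):
    x = [(m >> i) & 1 for i in range(n)]
    return 2 * x[0] + x[1] + 3 * x[0] * x[1] * x[2]


F = mobius_transform([f(m) for m in range(1 << n)], n)
print(F)                      # [0, 2, 1, 0, 0, 0, 0, 3]: 3 of 8 are non-zero
print(shapley_banzhaf(F, n))  # ([3.0, 2.0, 1.0], [2.75, 1.75, 0.75])
```

The toy function has only 3 non-zero coefficients out of 8. Both Shapley and Banzhaf values spread its single third-order interaction $F(\{0,1,2\}) = 3$ across the three inputs, so the per-input scores no longer show that the interaction exists. The Möbius coefficients keep it explicit.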
Abstract
One of the key challenges in machine learning is to find interpretable representations of learned functions. The Möbius transform is essential for this purpose, as its coefficients correspond to unique importance scores for sets of input variables. This transform is closely related to widely used game-theoretic notions of importance, such as the Shapley and Banzhaf values, but it also captures crucial higher-order interactions. Although the Möbius transform of a function with $n$ inputs involves $2^n$ coefficients, computing it becomes tractable when the function is sparse and low-degree, which we show is the case for many real-world functions. Under these conditions, the complexity of the transform computation is significantly reduced. When there are $K$ non-zero coefficients, our algorithm recovers the Möbius transform in $O(Kn)$ samples and $O(Kn^2)$ time asymptotically under certain assumptions. It is the first non-adaptive algorithm to do so. We also uncover a surprising connection between group testing and the Möbius transform. For functions where all interactions involve at most $t$ inputs, we use group testing results to compute the Möbius transform with $O(Kt\log n)$ sample complexity and $O(K\mathrm{poly}(n))$ time. A robust version of this algorithm withstands noise and maintains this complexity. This is the first noise-tolerant algorithm for the Möbius transform whose query complexity is sub-linear in $n$. In several examples, we observe that representations generated via the sparse Möbius transform are up to twice as faithful to the original function as Shapley and Banzhaf values, while using the same number of terms.
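The group-testing connection is easiest to see in the simplest case. The following minimal sketch makes several assumptions. It uses the convention $f(m) = \sum_{k \subseteq m} F(k)$ and a noiseless function with a single non-zero coefficient $F(k^*)$, where $|k^*| \le t$. The paper's algorithm handles $K$ coefficients and noise through aliasing and peeling, and this sketch omits both. Evaluating $f$ with the inputs of a set $d$ switched off returns $F(k^*)$ if $k^*$ and $d$ are disjoint and $0$ otherwise, so each query acts as a Boolean OR group test of $k^*$ on $d$. The random Bernoulli test design and the COMP decoder are standard group-testing choices swapped in for illustration, not necessarily the ones used in the paper. The helper names `make_f` and `recover_singleton` are hypothetical.

```python
import random


def make_f(coeffs):
    """Build f(m) = sum of F(k) over all k that are subsets of m.

    `coeffs` (hypothetical helper input) maps a bitmask k to F(k).
    """
    return lambda m: sum(v for k, v in coeffs.items() if (k & m) == k)


def recover_singleton(f, n, num_tests, p, seed=0):
    """Recover (support, value) of f's single non-zero Möbius coefficient.

    Assumes f is noiseless with exactly one non-zero coefficient F(k*).
    Querying f at m = complement(d) returns F(k*) if k* and d are disjoint
    and 0 otherwise: a Boolean OR group test of k* on the set d. Tests are
    drawn non-adaptively (each input joins d with probability p) and decoded
    with COMP: any input that appears in a negative test is not in k*.
    """
    rng = random.Random(seed)
    full = (1 << n) - 1
    candidates = full  # inputs not yet ruled out of k*
    for _ in range(num_tests):
        d = 0
        for i in range(n):
            if rng.random() < p:
                d |= 1 << i
        if f(full & ~d) != 0:  # negative test: k* does not intersect d
            candidates &= ~d
    support = [i for i in range(n) if (candidates >> i) & 1]
    return support, f(full)  # f(all inputs on) = F(k*) in the 1-sparse case


n, t = 1000, 3
true_support = [2, 417, 903]
k_star = sum(1 << i for i in true_support)
f = make_f({k_star: 1.5})
support, value = recover_singleton(f, n, num_tests=300, p=1 / (t + 1))
print(support, value)
```

In this example, 300 non-adaptive queries (fewer than the $n = 1000$ inputs) identify the three interacting inputs with overwhelming probability. COMP never discards a true member of $k^*$, so its only failure mode is a spurious extra index. The chance of such an index becomes negligible once the number of tests grows on the order of $t\log n$.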
