Bayesian Inference of Minimally Complex Models with Interactions of Arbitrary Order

Clélia de Mulatier, Matteo Marsili

TL;DR

The paper addresses the problem of learning high-order dependencies in high-dimensional binary data by introducing Minimally Complex Models (MCMs), a family of maximum-entropy spin models built from independent complete components (ICCs). By combining Bayesian model selection with the minimum description length (MDL) framework and exploiting gauge invariance, the authors derive efficient, representation-independent methods to compute the model evidence and to sample, which makes it practical to explore models with interactions of arbitrary order. They propose a two-step search for the best MCM: first identify the best basis, that is, the best independent model (IM), then find the best partition of its operators into ICCs. Exhaustive search is feasible for small systems (up to 15 variables), and heuristics are proposed for larger ones. Applications to real data (US Supreme Court voting, the Big Five personality test, and MNIST) show that MCMs capture meaningful high-order dependencies while remaining interpretable and compressive, often outperforming traditional pairwise models. The work also provides a binary linear algebra framework and open-source tools for gauge transformations (GTs), basis transformations, and MCM sampling, suggesting broad utility for fast, principled structure discovery in complex systems.
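As a rough illustration of the binary linear algebra mentioned in the summary, the sketch below encodes each spin operator (a product of spins) as an integer bitmask and checks whether a set of operators is independent by computing its rank over GF(2). The function names `gf2_rank` and `operator_value` and the bitmask encoding are assumptions made for this example; they are not taken from the authors' tools.

```python
def gf2_rank(operators):
    """Rank over GF(2) of spin operators given as integer bitmasks.

    Bit i of a mask is set when spin i enters the product defining the
    operator (hypothetical encoding chosen for this sketch).
    """
    pivots = {}  # leading bit position -> reduced mask with that leading bit
    for mask in operators:
        while mask:
            top = mask.bit_length() - 1
            if top not in pivots:
                pivots[top] = mask
                break
            mask ^= pivots[top]  # XOR is addition over GF(2)
    return len(pivots)


def operator_value(mask, state):
    """Value (+1 or -1) of a product operator on one configuration.

    `state` is a bitmask in which a set bit means that spin equals -1,
    so the product is -1 exactly when an odd number of selected spins are -1.
    """
    return -1 if bin(mask & state).count("1") % 2 else 1


# s1, s2 and s1*s2 are not independent: the third is the product of the others.
print(gf2_rank([0b01, 0b10, 0b11]))
# s1, s1*s2, s1*s2*s3 are independent and can serve as a basis for 3 spins.
print(gf2_rank([0b001, 0b011, 0b111]))
```

A set of n operators with rank n over GF(2) is a valid basis in this encoding, which is the kind of object the first step of the two-step search selects.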

Abstract

Finding the model that best describes a high-dimensional dataset is a daunting task, even more so if one aims to consider all possible high-order patterns of the data, going beyond pairwise models. For binary data, we show that this task becomes feasible when restricting the search to a family of simple models, that we call Minimally Complex Models (MCMs). MCMs are maximum entropy models that have interactions of arbitrarily high order grouped into independent components of minimal complexity. They are simple in information-theoretic terms, which means they can only fit well certain types of data patterns and are therefore easy to falsify. We show that Bayesian model selection restricted to these models is computationally feasible and has many advantages. First, the model evidence, which balances goodness-of-fit against complexity, can be computed efficiently without any parameter fitting, enabling very fast explorations of the space of MCMs. Second, the family of MCMs is invariant under gauge transformations, which can be used to develop a representation-independent approach to statistical modeling. For small systems (up to 15 variables), combining these two results allows us to select the best MCM among all, even though the number of models is already extremely large. For larger systems, we propose simple heuristics to find optimal MCMs in reasonable times. Besides, inference and sampling can be performed without any computational effort. Finally, because MCMs have interactions of any order, they can reveal the presence of important high-order dependencies in the data, providing a new approach to explore high-order dependencies in complex systems. We apply our method to synthetic data and real-world examples, illustrating how MCMs portray the structure of dependencies among variables in a simple manner, extracting falsifiable predictions on symmetries and invariance from the data.
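The claim that the evidence needs no parameter fitting can be illustrated with a minimal sketch. It assumes a Jeffreys prior on the parameters (an assumption of this example, not stated in the abstract), under which each complete component on r spins is a categorical distribution over 2^r states with a Dirichlet(1/2) prior, so its evidence is a closed-form Dirichlet-multinomial marginal likelihood that depends only on state counts. The sketch handles only models whose components are groups of the original variables; the names `log_evidence_icc` and `log_evidence_mcm` and the bitmask data encoding are hypothetical.

```python
from collections import Counter
from math import lgamma, log, pi


def log_evidence_icc(data, mask):
    """Log-evidence of one complete component on the spins selected by `mask`.

    `data` is a list of integers, each an n-bit spin configuration.
    Assumes a Jeffreys (Dirichlet(1/2)) prior over the 2^r states.
    """
    r = bin(mask).count("1")
    n_samples = len(data)
    counts = Counter(state & mask for state in data)  # project onto the component
    half_states = 2 ** (r - 1)  # sum of the 2^r Dirichlet parameters, each 1/2
    result = lgamma(half_states) - lgamma(n_samples + half_states)
    for k in counts.values():
        # Unobserved states contribute Gamma(1/2)/sqrt(pi) = 1, i.e. zero in log.
        result += lgamma(k + 0.5) - 0.5 * log(pi)
    return result


def log_evidence_mcm(data, parts, n):
    """Log-evidence of a model whose components are disjoint groups of spins.

    Spins not covered by any part are treated as unmodelled (uniform),
    each contributing -N*log(2).
    """
    covered = 0
    total = 0.0
    for mask in parts:
        if covered & mask:
            raise ValueError("parts must be disjoint")
        covered |= mask
        total += log_evidence_icc(data, mask)
    unmodelled = n - bin(covered).count("1")
    return total - len(data) * unmodelled * log(2)


# Two perfectly correlated spins: one joint component beats two separate ones.
data = [0b00, 0b11] * 50
print(log_evidence_mcm(data, [0b11], n=2) > log_evidence_mcm(data, [0b01, 0b10], n=2))
```

Because the score factorizes over components and uses only counts, comparing two candidate partitions requires recomputing only the components that differ, which is what makes a fast search over partitions plausible.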

Paper Structure

This paper contains 26 sections, 46 equations, 13 figures.

Figures (13)

  • Figure 1: (colors online) Examples of Minimally Complex Models for a $6$-spin system. Models are represented by diagrams: single-spin variables are dots, full in the presence of a local field and empty otherwise; pairwise interactions are blue lines; 3-spin interactions are orange triangles; and 4-spin interactions are light blue polygons enclosing four spins. Note that in this figure we show special examples of MCMs that correspond to partitions of the variables into independent parts, with the variables being completely connected within each part and not connected at all between the parts. Models of this type form the category MCM$^{*}$ in Fig. 2.
  • Figure 2: (colors online) Number of spin models as a function of the system size $n$ for different families (see App. \ref{app:enum} for proofs of the counts): all spin models (green), all Minimally Complex Models (MCM, violet), all Independent Models (IM, orange), and all Sub-Complete Models (SCM, dark blue). MCM$^*$ and IM$^*$ indicate respectively the number of MCM and IM that share the same preferred basis. For comparison, we also report the number of Pairwise Models (PM$^*$). The number of IM and of MCM grows exponentially with $n$, roughly as $2^{n^2}$, whereas the number of PM$^*$ grows as $2^{n^{2}/2}$. The number of MCM$^*$ grows slower than $n^n$. Note that the $y$-axis reports the logarithm base 10 of the number of models. For example, at $n=9$, there are of the order of $10^{153}$ models, but only $10^{20}$ MCM, which include $10^{18}$ IM and $10^5$ MCM$^*$; there are $10^{13}$ PM$^*$.
  • Figure 3: (colors online) Examples of MCMs and basis transformations. All these models are MCMs and are represented with the same notations as in Fig. 1. In the first column, the first model is an independent model (IM) composed of two independent operators, the second is a sub-complete model (SCM) composed of a single independent complete component (ICC), and the last one is an MCM composed of two ICCs. In each column, models of each row are obtained by the same basis transformation $\mathcal{T}_i$ of the models of the first column, and therefore have the same properties as their respective original model (i.e., they are respectively IM, SCM, and MCM with two ICCs; see Fig. \ref{fig:SI:MCM_GT} in Appendix for details on the transformations $\mathcal{T}_i$ used). An alternative way of thinking about this is that each row displays the same abstract spin model represented in different bases.
  • Figure 4: Analysis of the US Supreme Court Data. Justices are represented by circles labeled by their initials: Ruth Bader Ginsburg (RG), John P. Stevens (JS), David Souter (DS), Stephen Breyer (SB), Sandra Day O'Connor (SO), William Rehnquist (WR), Anthony Kennedy (AK), Clarence Thomas (CT), Antonin Scalia (AS). The colors represent their political orientation taken from Ref. [lee2015statistical]: dark red indicates the most conservative-oriented justices and dark blue the most liberal-oriented justices. a) Best MCM, represented in the original basis variables (i.e., the justices' votes). The best IM is composed of $8$ pairwise interactions, represented by links between nodes with width proportional to their respective strength, and $1$ single-body interaction represented by a square on CT. The strongest interaction has $\langle s_{\rm CT}s_{\rm AS}\rangle\simeq 0.86$, whereas the weakest has $\langle s_{\rm CT}\rangle\simeq -0.45$. The three large circles identify the partition of these interactions into the ICCs of the best MCM (among all). The red and blue dotted squares indicate the partition of the justices that corresponds to the best MCM among those that have the original basis of the data as a preferred basis. b) Factor graph representation of the best MCM. Spin variables $\boldsymbol{s}$ are represented by circles. The model selection procedure first identifies the best basis $\boldsymbol{b}^*$, whose independent operators are denoted by squares, and then the best clustering of these operators into ICCs $\mathcal{M}_a$, denoted by triangles. Each square in the second layer corresponds to one of the interactions of the best IM represented in Panel a, numbered from the strongest interaction (1) to the weakest (9).
  • Figure 5: Best MCM found for the Big Five Personality Test dataset. The numbered circles are the 50 questions ordered as in the original dataset of Ref. [Big5data], in which questions were grouped by traits. This is not a factor graph, as the best MCM found in the best basis happens to be the same model as the one displayed here in the original basis (see Fig. \ref{fig:big5:Best_MCM} for the factor graph representation in the best basis). According to this result, statement 30, "I make people feel at ease", is better associated with Extraversion than with Agreeableness.
  • ...and 8 more figures
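
Some of the orders of magnitude quoted in the Figure 2 caption for $n=9$ can be checked with a short sketch. Two counts follow from elementary counting: a system of $n$ spins has $2^n-1$ product operators, so there are $2^{2^n-1}$ spin models, and a pairwise model chooses among $n$ fields and $n(n-1)/2$ couplings, giving $2^{n(n+1)/2}$. The count used here for MCM$^*$ is an assumption of this example: it takes MCM$^*$ to be the partitions of any subset of the $n$ basis operators, which gives the Bell number $B_{n+1}$. The function names are hypothetical.

```python
from math import log10


def log10_all_spin_models(n):
    """log10 of the number of subsets of the 2^n - 1 product operators."""
    return (2**n - 1) * log10(2)


def log10_pairwise_models(n):
    """log10 of the number of subsets of n fields and n(n-1)/2 couplings."""
    return (n * (n + 1) // 2) * log10(2)


def bell(m):
    """Bell number B_m computed with the Bell triangle."""
    row = [1]
    for _ in range(m):
        new_row = [row[-1]]  # each row starts with the last entry of the previous row
        for x in row:
            new_row.append(new_row[-1] + x)
        row = new_row
    return row[0]


n = 9
print(log10_all_spin_models(n))   # about 153.8
print(log10_pairwise_models(n))   # about 13.5
print(log10(bell(n + 1)))         # about 5.06, assuming MCM* are counted by B_{n+1}
```

These values agree with the $10^{153}$, $10^{13}$, and $10^{5}$ figures in the caption; the counts of all MCMs and IMs depend on the enumeration in the paper's appendix and are not reproduced here.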