Behavior Learning (BL): Learning Hierarchical Optimization Structures from Data

Zhenyao Ma; Yue Liang; Dongxu Li

TL;DR

Theoretically, the paper establishes the universal approximation property of BL and analyzes the M-estimation properties of IBL; empirically, BL demonstrates strong predictive performance, intrinsic interpretability, and scalability to high-dimensional data.

Abstract

Inspired by behavioral science, we propose Behavior Learning (BL), a novel general-purpose machine learning framework that learns interpretable and identifiable optimization structures from data, ranging from single optimization problems to hierarchical compositions. It unifies predictive performance, intrinsic interpretability, and identifiability, with broad applicability to scientific domains involving optimization. BL parameterizes a compositional utility function built from intrinsically interpretable modular blocks, which induces a data distribution for prediction and generation. Each block represents and can be written in symbolic form as a utility maximization problem (UMP), a foundational paradigm in behavioral science and a universal framework of optimization. BL supports architectures ranging from a single UMP to hierarchical compositions, the latter modeling hierarchical optimization structures. Its smooth and monotone variant (IBL) guarantees identifiability. Theoretically, we establish the universal approximation property of BL, and analyze the M-estimation properties of IBL. Empirically, BL demonstrates strong predictive performance, intrinsic interpretability and scalability to high-dimensional data. Code: https://github.com/MoonYLiang/Behavior-Learning ; install via pip install blnetwork.
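To make "a utility function induces a data distribution" concrete, here is a minimal sketch. It assumes a Gibbs (softmax) form $p(y\mid x)\propto\exp(\mathrm{BL}(x,y)/\tau)$ over a finite set of candidate outputs; the abstract does not state the exact form, and the function name `gibbs_distribution` and the toy utilities are hypothetical, not part of the `blnetwork` package.

```python
import numpy as np

def gibbs_distribution(utilities, tau=1.0):
    """Map utility scores over candidate outputs to probabilities.

    Assumed form (not confirmed by the abstract): p(y | x) ∝ exp(u(x, y) / tau).
    """
    scaled = np.asarray(utilities, dtype=float) / tau
    scaled = scaled - scaled.max()      # subtract the max for numerical stability
    weights = np.exp(scaled)
    return weights / weights.sum()

# Hypothetical utilities of three candidate outputs y for one input x.
utilities = np.array([1.0, 2.0, 0.5])

probs = gibbs_distribution(utilities, tau=1.0)   # moderate temperature
sharp = gibbs_distribution(utilities, tau=0.1)   # low temperature: mass concentrates on the maximizer
```

Under this assumption, prediction amounts to picking the highest-utility candidate, and lowering `tau` moves the distribution toward that maximizer.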

Paper Structure

This paper contains 97 sections, 19 theorems, 162 equations, 6 figures, and 15 tables.

Introduction
Behavior Learning (BL)
Utility Maximization Problem (UMP)
BL Architecture
Model Structure of $\mathrm{BL}(\mathbf{x}, \mathbf{y})$.
Learning Objective.
Implementation Details.
Theoretical Guarantees.
Interpretability.
Identifiable Behavior Learning (IBL)
Theoretical Foundation.
Experiments
Standard Prediction Tasks
Predictive Performance.
Interpreting BL: A Case Study
...and 82 more sections

Key Result

Theorem 2.1

Let $\mathcal{X}\subset\mathbb{R}^{d_x}$ and $\mathcal{Y}\subset\mathbb{R}^{d_y}$ be nonempty compact sets, and let $U:\mathcal{X}\times\mathcal{Y}\to\mathbb{R}$, $\mathcal{C}:\mathcal{X}\times\mathcal{Y}\to\mathbb{R}^m$, and $\mathcal{T}:\mathcal{X}\times\mathcal{Y}\to\mathbb{R}^p$ be $C^1$. Assume [...] Here $\phi:\mathbb{R}\to\mathbb{R}$ is strictly increasing and $C^1$, $\rho(z):=\max\{z,0\}$, and [...]

Figures (6)

Figure 2: (a) Visualization and symbolic form of BL(Single) trained on the Boston Housing dataset, modeling the UMP ($\max U \;\; \text{s.t.}\; \mathcal{C}\leq 0,\; \mathcal{T}=0$) of a representative buyer in Boston housing (details in Section \ref{expr:case-study}). Top: computational graphs of the polynomials inside the three penalty functions: $\tanh$ (preference), $\mathrm{ReLU}$ (budget), and $|\cdot|$ (belief). From left to right, the graphs are centered on $\tanh^{-1}(U)$, $\mathcal{C}$, and $\mathcal{T}$, respectively, with surrounding nodes representing input features. Directed edges (shown only if coefficient $\geq 0.3$) indicate how each feature contributes to the corresponding term. Bottom: approximate symbolic formulation of the trained BL model as a UMP. (b) The BL[2,1] architecture. Layer 1 identifies two key micro-level preference types: the Economic-sensitive Buyer and the Location-sensitive Buyer. Layer 2 aggregates these two components into an effective representative buyer. (c) The BL(Deep) [5,3,1] architecture. Layer 1 recovers five distinct micro-level housing preference types. Layer 2 identifies three macro-level trade-off types capturing different ways these primitive preferences interact. Layer 3 aggregates them into the overall representative buyer. Table \ref{app:senmicofblock} provides detailed descriptions of each type. BL(Deep) provides a hierarchical explanation consistent with the coarse-graining principle (Kadanoff, 1966) in statistical physics, reconstructing the full micro-to-macro optimization hierarchy. In addition, the preference and trade-off patterns uncovered by BL(Deep) are well documented in the classical economics literature (see Table \ref{app:sci-recove}). (d) BL can be applied to a broad class of hierarchical optimization structures in science, including hierarchical need structures, hierarchical social–organizational structures, and renormalization-style coarse-grained structures in physics.
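The three penalty functions named in the Figure 2 caption can be sketched as a single block. This is an illustrative sketch only: it assumes the block value is a $\tanh$ preference term minus a $\mathrm{ReLU}$ penalty on $\mathcal{C}$ and an absolute-value penalty on $\mathcal{T}$, uses degree-1 polynomials for brevity, and treats the weights `lam` as hypothetical parameters. The name `block_utility` and all numeric values are made up for the example.

```python
import numpy as np

def block_utility(x, y, w_u, w_c, w_t, lam=(1.0, 1.0, 1.0)):
    """Assumed form of one block: tanh(preference) - ReLU(budget) - |belief|.

    Each inner term is a linear (degree-1) polynomial of [x, y, 1]; the paper's
    blocks use polynomials, whose degree is not specified in this excerpt.
    """
    z = np.concatenate([x, y, [1.0]])            # features plus a bias entry
    preference = np.tanh(w_u @ z)                # utility U, squashed into (-1, 1)
    budget_violation = np.maximum(w_c @ z, 0.0)  # positive only when C(x, y) > 0
    belief_violation = np.abs(w_t @ z)           # positive whenever T(x, y) != 0
    return lam[0] * preference - lam[1] * budget_violation - lam[2] * belief_violation

# Hypothetical weights: inequality constraint y - 1 <= 0, equality constraint x - y = 0.
w_u = np.array([0.5, 0.5, 0.0])
w_c = np.array([0.0, 1.0, -1.0])
w_t = np.array([1.0, -1.0, 0.0])
x = np.array([1.0])

feasible = block_utility(x, np.array([1.0]), w_u, w_c, w_t)    # both constraints hold
infeasible = block_utility(x, np.array([2.0]), w_u, w_c, w_t)  # both constraints violated
```

In this toy setup the infeasible output has the larger raw preference term, yet its block value is lower because both penalties are active.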
Figure 3: Predictive performance of BL and baselines. Left/Middle: relative AUC and F1-Macro gains over DT, sorted by mean (excluding BL). Right: mean F1-Macro ranks (↓ better). BL achieves first-tier performance in both metrics. Its variants rank second and third in mean F1-Macro rank, with BL(Shallow) showing no statistically significant difference from state-of-the-art models.
Figure 4: Interpreting deeper BL architectures as hierarchical structures of interacting agents. Each block $\mathcal{B}$ represents an interpretable agent solving its own UMP, while a layer corresponds to a set of heterogeneous agents operating in parallel. The next layer then aggregates and reallocates the negative energies from the previous layer, thereby performing higher-level coordination across agents. This layered organization provides a natural compositional interpretation of deep BL: bottom-layer modules encode local objectives, while upper layers synthesize these into collective outcomes. Analogous structures arise in biological and social systems—for example, in ant colonies, individual ants (first-layer agents) follow simple local rules, yet their collective behavior is coordinated through higher-level interactions (second-layer aggregation), yielding globally efficient resource allocation and task division.
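The layered reading in Figure 4 can be sketched as a stack of blocks in which each layer consumes the previous layer's outputs. This is a rough illustration, assuming every block has the same $\tanh$ / $\mathrm{ReLU}$ / $|\cdot|$ form with linear inner terms and that an upper-layer block simply takes the lower layer's values as its inputs; the helper names (`block`, `layer`, `deep_bl`, `random_layer`) and the random weights are hypothetical.

```python
import numpy as np

def block(z, w_u, w_c, w_t):
    """One agent: preference term minus inequality and equality penalties (assumed form)."""
    return np.tanh(w_u @ z) - np.maximum(w_c @ z, 0.0) - np.abs(w_t @ z)

def layer(z, params):
    """A set of heterogeneous agents evaluated in parallel on the same input."""
    return np.array([block(z, *p) for p in params])

def deep_bl(x, y, layers):
    """Feed (x, y) through successive layers; the last layer has a single block."""
    h = np.concatenate([x, y])
    for params in layers:
        h = layer(h, params)
    return h

def random_layer(rng, n_blocks, n_inputs):
    """Hypothetical random weights: three weight vectors per block."""
    return [tuple(rng.normal(size=n_inputs) for _ in range(3)) for _ in range(n_blocks)]

rng = np.random.default_rng(0)
x, y = np.array([0.2, -0.1]), np.array([0.5])

# A [2, 1] architecture: two first-layer agents, one aggregating agent on top.
layers = [random_layer(rng, 2, x.size + y.size), random_layer(rng, 1, 2)]
hidden = layer(np.concatenate([x, y]), layers[0])
output = deep_bl(x, y, layers)
```

Because the preference term is bounded by 1 and both penalties are non-negative, every block value in this sketch is at most 1, whatever the weights.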
Figure 5: Comparison of BL and E-MLP on image and text datasets; $d$ denotes model depth.
Figure 6: Constraint enforcement test of the BL penalty block on an energy-conservation constraint. The figure reports violation statistics $|T(x,y)|$ when varying the temperature $\tau$ (left side of panel) and the penalty weight $\lambda$ (right side of panel).
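A toy version of such a constraint enforcement test can be sketched as follows. It does not reproduce the paper's energy-conservation experiment: the one-dimensional utility $U(y)=-(y-2)^2$, the constraint $T(y)=y-1=0$, the grid, and the Gibbs sampling assumption are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical toy problem: maximize U(y) = -(y - 2)^2 subject to T(y) = y - 1 = 0.
grid = np.linspace(-1.0, 3.0, 4001)

def penalized_utility(y, lam):
    """Utility minus an absolute-value penalty on the equality constraint."""
    return -(y - 2.0) ** 2 - lam * np.abs(y - 1.0)

def violation_at_mode(lam):
    """|T| at the grid maximizer of the penalized utility."""
    y_star = grid[np.argmax(penalized_utility(grid, lam))]
    return abs(y_star - 1.0)

def expected_violation(lam, tau):
    """Mean |T| under an assumed Gibbs distribution p(y) ∝ exp(utility / tau)."""
    scores = penalized_utility(grid, lam) / tau
    weights = np.exp(scores - scores.max())   # stabilized exponentials
    probs = weights / weights.sum()
    return float(np.sum(probs * np.abs(grid - 1.0)))

# Sweep the penalty weight, then the temperature at a fixed penalty weight.
mode_violations = [violation_at_mode(lam) for lam in (0.0, 1.0, 2.0, 4.0)]
gibbs_violations = [expected_violation(4.0, tau) for tau in (1.0, 0.1, 0.01)]
```

In this toy problem the maximizer is $y^\ast = 2-\lambda/2$ for $\lambda<2$ and sits exactly on the constraint for $\lambda\geq 2$, which illustrates how a non-smooth penalty can enforce a constraint exactly at a finite weight; lowering $\tau$ then concentrates the sampled outputs around that point.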
...and 1 more figure

Theorems & Definitions (36)

Theorem 2.1: Local Exact Penalty Reformulation for UMP
Theorem 2.2: Universality of UMP
Theorem 2.3: Universal Approximation of BL
Theorem 2.4: Identifiability of IBL
Theorem 2.5: Loss Identifiability of IBL
Theorem 2.6: Consistency of IBL
Theorem 2.7: Universal Consistency of IBL
proof
proof
proof
...and 26 more
