
LimiX: Unleashing Structured-Data Modeling Capability for Generalist Intelligence

Xingxuan Zhang, Gang Ren, Han Yu, Hao Yuan, Hui Wang, Jiansheng Li, Jiayun Wu, Lang Mo, Li Mao, Mingchao Hao, Ningbo Dai, Renzhe Xu, Shuyang Li, Tianyang Zhang, Yue He, Yuanrui Wang, Yunjia Zhang, Zijing Xu, Dongzhe Li, Fang Gao, Hao Zou, Jiandong Liu, Jiashuo Liu, Jiawei Xu, Kaijie Cheng, Kehan Li, Linjun Zhou, Qing Li, Shaohua Fan, Xiaoyu Lin, Xinyan Han, Xuanyue Li, Yan Lu, Yuan Xue, Yuanyuan Jiang, Zimu Wang, Zhenlei Wang, Peng Cui

TL;DR

LimiX introduces a unified large structured-data model (LDM) for tabular data, treating inputs as a joint distribution over variables and missingness. It combines Context-Conditional Masked Modeling (CCMM) with a lightweight Transformer architecture and a discriminative feature encoding to support diverse tasks (classification, regression, imputation, generation) without task-specific architectures. Pretraining data are synthesized from hierarchical structural causal models (SCMs), and inference can employ retrieval-based ensembles for efficient, calibrated predictions. The paper also presents the first scaling laws for LDMs, showing predictable power-law relationships between model/data scale and downstream performance, which guide design choices for tabular foundation models. Overall, LimiX achieves state-of-the-art performance across 11 benchmarks, with strong robustness and efficient variants under tight compute budgets, suggesting practical impact for generalist tabular intelligence.
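To make the masked, context-conditional setup more concrete, here is a minimal sketch of how one training episode could be assembled from a table: some rows serve as fully observed context, and random cells of the remaining rows are hidden for the model to recover. The helper `make_masked_episode`, the uniform cell masking, and the use of NaN as the mask marker are all assumptions made for illustration; the summary does not specify the actual CCMM masking scheme.

```python
import numpy as np

def make_masked_episode(table, n_context, mask_rate, rng):
    """Split a table into context and query rows, then hide random query cells.

    Hypothetical helper for illustration only; not the LimiX training code.
    """
    table = np.asarray(table, dtype=float)
    perm = rng.permutation(table.shape[0])
    context = table[perm[:n_context]]                   # fully observed in-context rows
    query_full = table[perm[n_context:]]                # rows whose cells may be masked
    mask = rng.random(query_full.shape) < mask_rate     # True = cell is hidden
    query_visible = np.where(mask, np.nan, query_full)  # NaN marks a masked cell
    targets = query_full[mask]                          # values a model would have to recover
    return context, query_visible, mask, targets

rng = np.random.default_rng(0)
table = rng.normal(size=(10, 4))  # toy data, not from the paper
context, query_visible, mask, targets = make_masked_episode(
    table, n_context=6, mask_rate=0.5, rng=rng
)
```

Under this framing, classification or regression corresponds to masking only the label column of the query rows, while imputation corresponds to masking arbitrary feature cells.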

Abstract

We argue that progress toward general intelligence requires complementary foundation models grounded in language, the physical world, and structured data. This report presents LimiX-16M and LimiX-2M, two instantiations of our large structured-data models (LDMs). Both models treat structured data as a joint distribution over variables and missingness, and are thus capable of addressing a wide range of tabular tasks through query-based conditional prediction with a single model. They are pretrained using masked joint-distribution modeling with an episodic, context-conditional objective, supporting rapid, training-free adaptation at inference. We evaluate LimiX models across 11 large structured-data benchmarks covering broad regimes of sample size, feature dimensionality, class number, categorical-to-numerical feature ratio, missingness, and sample-to-feature ratio. LimiX-16M consistently surpasses strong baselines, as shown in Figure 1 and Figure 2. The superiority holds across a wide range of tasks, such as classification, regression, missing value imputation, and data generation, often by substantial margins, while avoiding task-specific architectures or bespoke training per task. Notably, LimiX-2M delivers strong results under tight compute and memory budgets. We also present the first scaling law study for LDMs, revealing how data and model scaling jointly influence downstream performance and offering quantitative guidance for tabular foundation modeling. All LimiX models are publicly accessible under Apache 2.0.
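As a rough illustration of what a scaling-law fit involves, the sketch below fits a single-variable power law, loss ≈ a · scale^(−b), by linear regression in log-log space. The functional form, the helper name `fit_power_law`, and the numbers are assumptions for illustration; the abstract does not state the exact form the paper uses, and the data here are synthetic rather than results from the paper.

```python
import numpy as np

def fit_power_law(scale, loss):
    """Fit loss ~= a * scale**(-b) by least squares on log(loss) vs. log(scale)."""
    slope, intercept = np.polyfit(np.log(scale), np.log(loss), deg=1)
    return float(np.exp(intercept)), float(-slope)

# Synthetic, made-up values used only to exercise the fit.
scale = np.array([1e6, 2e6, 4e6, 8e6, 16e6])
loss = 3.0 * scale ** -0.25
a, b = fit_power_law(scale, loss)
```

A straight line in log-log space is what makes such a relationship "predictable": the fitted exponent `b` can be used to extrapolate from smaller models or datasets to larger ones.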

Paper Structure

This paper contains 54 sections, 11 theorems, 45 equations, 27 figures, 30 tables.

Key Result

Proposition 6.1

Under mild assumptions, for any $k \in [d]$, there is a one-to-one correspondence between the distribution $p(\mathbf{X}^{\text{te}}|\mathbf{X}^{\text{ct}})$ and the family of conditionals $\{p(\mathbf{X}^{\text{te}}_{\pi}|\mathbf{X}^{\text{te}}_{-\pi}, \mathbf{X}^{\text{ct}}): \forall \pi \in \Pi_k\}$.
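For intuition, consider the smallest case $d = 2$, $k = 1$, and assume (as an illustration, not necessarily the paper's exact conditions) that $\Pi_1$ consists of the single-coordinate index sets and that the density is strictly positive. Dropping the conditioning on $\mathbf{X}^{\text{ct}}$ for brevity and fixing any reference point $(x_1', x_2')$, the two full conditionals determine every ratio of joint densities:

$$
\frac{p(x_1, x_2)}{p(x_1', x_2')} = \frac{p(x_1 \mid x_2)}{p(x_1' \mid x_2)} \cdot \frac{p(x_2 \mid x_1')}{p(x_2' \mid x_1')}
$$

The first factor equals $p(x_1, x_2)/p(x_1', x_2)$ and the second equals $p(x_1', x_2)/p(x_1', x_2')$, so their product is the ratio on the left. Normalizing then recovers the joint, which is the sense in which a family of masked conditionals can pin down the full distribution.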

Figures (27)

  • Figure 1: Performance comparison on the averaged reciprocal of the ranks, where the rank is that of the corresponding model on ROC AUC. Higher values indicate stronger average ranking performance across all the classification benchmarks.
  • Figure 2: Performance comparison on the averaged reciprocal of the ranks, where the rank is that of the corresponding model on $R^2$ across all the regression benchmarks.
  • Figure 3: The overall model structure of LimiX-16M.
  • Figure 4: An example of the generated DAG, where $g_{i,j}$ is an edge function defining the relationship between a parent node $X_i$ and its child node $X_j$ in the local causal structures.
  • Figure 5: Toy example of sample-level attention. In-context samples that share the same category as the query sample are assigned higher scores through the attention module of LimiX-16M.
  • ...and 22 more figures
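The metric in Figures 1 and 2, the averaged reciprocal of the ranks, can be sketched as follows: on each benchmark, models are ranked by the metric (ROC AUC or $R^2$), and each model's score is the mean of 1/rank across benchmarks. The function name `mean_reciprocal_rank`, the tie-breaking rule, and the numbers are assumptions for illustration; the captions do not say how ties are handled.

```python
import numpy as np

def mean_reciprocal_rank(scores):
    """scores[i, j]: metric of model j on benchmark i (higher is better).

    Returns the per-model average of 1/rank, where rank 1 is the best model
    on a benchmark. Ties are broken by column order (a simplifying assumption).
    """
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, axis=1, kind="stable")      # best model first
    ranks = np.empty_like(order)
    rows = np.arange(scores.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, scores.shape[1] + 1)  # invert the ordering into ranks
    return (1.0 / ranks).mean(axis=0)

# Made-up ROC AUC values for three hypothetical models on two benchmarks.
auc = [[0.90, 0.85, 0.80],
       [0.70, 0.75, 0.60]]
mrr = mean_reciprocal_rank(auc)
```

In this toy case the first two models each win one benchmark and place second on the other, so both average 0.75, while the third model, always ranked last, averages 1/3.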

Theorems & Definitions (19)

  • Proposition 6.1: Informal; see Proposition B.1
  • Theorem 6.2: Informal; see Theorem B.2
  • Theorem 6.3: Informal; see \ref{theo:joint_dist-full}
  • Proposition B.1: Formal version of Proposition 6.1
  • proof
  • Example B.1
  • Theorem B.2: Formal version of Theorem 6.2
  • Lemma B.3: van2000asymptotic, Theorem 5.23; statement adapted from qin2024fit, li2024promises
  • Lemma B.4
  • proof
  • ...and 9 more