Orion-Bix: Bi-Axial Attention for Tabular In-Context Learning

Mohamed Bouadi, Pratinav Seth, Aditya Tanna, Vinay Kumar Sankarapu

TL;DR

Orion-Bix tackles the difficulty of few-shot tabular learning by introducing a biaxial attention-based row encoder that captures local-group, coarse, and global feature interactions, paired with an episodic meta-learning regime that generates explicit support/query tasks from synthetic tables. The model preserves TabICL’s column-wise SetTransformer embeddings and a label-aware in-context learner while adding a scalable hierarchical classifier and a masked, support-focused attention scheme. Through synthetic episodic data and careful pretraining, Orion-Bix achieves strong domain-specific performance, excels in very low-shot regimes, and demonstrates robustness to support-set quality. The practical pipeline, including preprocessing and an ensemble of transformed views, enables seamless deployment in real-world tabular workflows and shows that biaxial attention with episodic meta-training can yield robust, few-shot-ready tabular learning.
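As a rough illustration of what an explicit support/query task looks like, the sketch below splits one toy labelled table into a support set and a query set. It is a minimal example under stated assumptions, not the authors' data pipeline: the function name `make_episode`, the uniform random split, and the toy Gaussian table are all hypothetical stand-ins for the synthetic generator the paper describes.

```python
import random


def make_episode(rows, labels, n_support, seed=0):
    """Split one labelled table into (support, query) lists of (row, label) pairs.

    Hypothetical helper: a uniform random split is an assumption made here
    for illustration, not the sampling rule used by Orion-Bix.
    """
    if not 0 < n_support < len(rows):
        raise ValueError("n_support must leave at least one query row")
    rng = random.Random(seed)
    order = list(range(len(rows)))
    rng.shuffle(order)
    support_idx, query_idx = order[:n_support], order[n_support:]
    support = [(rows[i], labels[i]) for i in support_idx]
    query = [(rows[i], labels[i]) for i in query_idx]
    return support, query


# Toy "synthetic table": 10 rows, 4 numeric features, a binary label
# derived from the first two features.
table_rng = random.Random(42)
rows = [[table_rng.gauss(0, 1) for _ in range(4)] for _ in range(10)]
labels = [int(r[0] + r[1] > 0) for r in rows]

support, query = make_episode(rows, labels, n_support=6)
```

During meta-training, the model would see the support rows with their labels and be asked to predict the labels of the query rows; repeating this over many generated tables yields the episodic regime.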

Abstract

Tabular data drive most real-world machine learning applications, yet building general-purpose models for them remains difficult. Mixed numeric and categorical fields, weak feature structure, and limited labeled data make scaling and generalization challenging. To this end, we introduce Orion-Bix, a tabular foundation model that combines biaxial attention with meta-learned in-context reasoning for few-shot tabular learning. Its encoder alternates standard, grouped, hierarchical, and relational attention, fusing their outputs through multi-CLS summarization to capture both local and global dependencies efficiently. A label-aware ICL head adapts on the fly and scales to large label spaces via hierarchical decision routing. Meta-trained on synthetically generated, structurally diverse tables with causal priors, Orion-Bix learns transferable inductive biases across heterogeneous data. Delivered as a scikit-learn-compatible foundation model, it outperforms gradient-boosting baselines and remains competitive with state-of-the-art tabular foundation models on public benchmarks, showing that biaxial attention with episodic meta-training enables robust, few-shot-ready tabular learning. The model is publicly available at https://github.com/Lexsi-Labs/Orion-BiX.
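To make the idea of hierarchical decision routing more concrete, the sketch below groups a large label set into a tree in which no node has more than `max_classes` children, so each decision involves only a bounded number of options. This is one plausible reading, offered as an assumption: the abstract does not specify how Orion-Bix builds or traverses its hierarchy, and the names `build_label_tree`, `path_to`, and `max_classes` are hypothetical.

```python
def build_label_tree(classes, max_classes):
    """Group classes recursively so that every node has <= max_classes children.

    Hypothetical helper: contiguous chunking is an illustrative choice,
    not the grouping rule used by Orion-Bix.
    """
    if max_classes < 2:
        raise ValueError("max_classes must be at least 2")
    classes = list(classes)
    if len(classes) <= max_classes:
        return classes  # leaf: one bounded decision separates these labels
    chunk = -(-len(classes) // max_classes)  # ceiling division
    return [
        build_label_tree(classes[i:i + chunk], max_classes)
        for i in range(0, len(classes), chunk)
    ]


def path_to(tree, label):
    """Return the child index chosen at each level to reach `label`, or None."""
    if not tree or not isinstance(tree[0], list):  # leaf node
        return [tree.index(label)] if label in tree else None
    for i, child in enumerate(tree):
        sub = path_to(child, label)
        if sub is not None:
            return [i] + sub
    return None


tree = build_label_tree(range(25), max_classes=10)
route = path_to(tree, 7)  # first pick a group, then a label inside it
```

With 25 labels and at most 10 options per decision, a prediction becomes two bounded choices (a group, then a label within it) instead of a single 25-way choice.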

Paper Structure

This paper contains 34 sections, 16 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: An overview of the Orion-Bix architecture. A column-wise SetTransformer-based embedder maps the input table $X \in \mathbb{R}^{B \times n \times m}$ to column embeddings $E \in \mathbb{R}^{B \times n \times m' \times d}$, where $m' = m + N_{\mathrm{CLS}}$ includes reserved CLS slots. The biaxial row encoder reshapes $E$ into per-row sequences $X' \in \mathbb{R}^{(B \cdot n) \times m' \times d}$ and applies a stack of BiAxialAttentionBlocks, each combining full cross-feature attention ($X' \rightarrow X_1$), local grouped attention ($X_1 \rightarrow X_2$), hierarchical attention across coarse feature partitions ($X_2 \rightarrow X'_2$), and structured relational attention ($X'_2 \rightarrow X_3$). A multi-CLS attention layer $\mathrm{CLSAttn}(\mathrm{CLS}, X_3)$ aggregates each row into a multi-aspect representation $R \in \mathbb{R}^{B \times n \times (N_{\mathrm{CLS}} \cdot d)}$. The label-aware ICL head adds projected support labels to the support row embeddings and uses a masked cross-attention Transformer to predict labels for query rows.
  • Figure 2: Accuracy of Orion-Bix and TabICL across different support sizes in few-shot experiments.
  • Figure 3: Accuracy of Orion-Bix and TabICL on OpenML and TALENT benchmarks. Bars indicate support selection strategies, averaged over datasets for each model.
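
The tensor shapes in the Figure 1 caption can be traced with a few lines of bookkeeping, and the masked attention of the ICL stage can be pictured as a boolean matrix. The sketch below is illustrative only: `trace_shapes` and `support_only_mask` are hypothetical helpers, and the mask rule (every row may attend to support rows only, with support rows ordered first) is an assumption about what "masked" means here rather than something the caption states.

```python
def trace_shapes(B, n, m, d, n_cls):
    """Return the shapes named in the Figure 1 caption for given sizes."""
    m_prime = m + n_cls  # m' = m + N_CLS (features plus reserved CLS slots)
    return {
        "X": (B, n, m),                  # input table
        "E": (B, n, m_prime, d),         # column embeddings
        "X_prime": (B * n, m_prime, d),  # per-row sequences for the row encoder
        "R": (B, n, n_cls * d),          # multi-CLS row representations
    }


def support_only_mask(n_support, n_query):
    """allowed[i][j] is True when row i may attend to row j.

    Assumption: support rows come first and are the only valid keys, so
    query rows cannot influence each other or the support rows.
    """
    total = n_support + n_query
    return [[j < n_support for j in range(total)] for _ in range(total)]


shapes = trace_shapes(B=2, n=5, m=3, d=8, n_cls=4)
mask = support_only_mask(n_support=3, n_query=2)
```

For example, with $m = 3$ features and $N_{\mathrm{CLS}} = 4$, each row becomes a sequence of $m' = 7$ tokens, and its final representation has $N_{\mathrm{CLS}} \cdot d = 32$ dimensions when $d = 8$.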