Towards Universal Neural Likelihood Inference

Shreyas Bhat Brahmavar, Yang Li, Qiyang Liu, Shashank Srivastava, Junier Oliva

TL;DR

The paper tackles universal likelihood inference over heterogeneous tabular data by introducing ASPIRE, a single model that outputs data-grounded conditional likelihoods for arbitrary targets. ASPIRE performs permutation-invariant, set-based reasoning over feature-value atoms, combining intra- and inter-instance processing with semantic grounding through dataset descriptions. Trained on over 1,400 real-world datasets, it reports state-of-the-art zero-, few-, and many-shot performance, and it enables open-world active feature acquisition, which selects informative features to observe at inference time. The work demonstrates the practical value of combining semantic grounding, permutation-invariant inference, and probabilistic conditioning for cross-domain, open-world inference and adaptive data acquisition.

Abstract

We introduce universal neural likelihood inference (UNLI): enabling a single model to provide data-grounded, conditional likelihood predictions for arbitrary targets given any collection of observed features, across diverse domains and tasks. To achieve UNLI over heterogeneous tabular data, we develop the Arbitrary Set-based Permutation-Invariant Reasoning Engine (ASPIRE) model. Our design addresses critical gaps in existing approaches to merge semantic-understanding capabilities and generalised numerical feature reasoning within a zero-shot capable framework. Trained on over 1,400 real diverse datasets spanning various domains, ASPIRE achieves 15% higher F1 scores and 85% lower RMSE than existing tabular foundation models in zero-shot and few-shot settings. Lastly, this work introduces open-world active feature acquisition, where we leverage the UNLI capabilities of ASPIRE to adeptly determine next feature-values to observe to improve inference time prediction accuracies.
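
To make the permutation-invariance property concrete, the following is a minimal sketch of a set-based encoder over feature-value atoms. It is not ASPIRE's architecture: `embed_atom`, `encode_set`, the seeded pseudo-random embedding, and the mean pooling are all hypothetical stand-ins chosen for illustration. The only point is that pooling over a set of (feature name, value) pairs yields the same representation regardless of the order in which features are listed.

```python
import math
import random


def embed_atom(name: str, value: float, dim: int = 8) -> list[float]:
    """Hypothetical atom embedding (an assumption, not the paper's method).

    A deterministic pseudo-random direction is derived from the feature
    name and scaled by the numeric value.
    """
    rng = random.Random(name)  # string seeds are deterministic across runs
    return [rng.uniform(-1.0, 1.0) * value for _ in range(dim)]


def encode_set(atoms: list[tuple[str, float]], dim: int = 8) -> list[float]:
    """Mean-pool atom embeddings into one fixed-size vector.

    math.fsum returns the correctly rounded exact sum, so the result does
    not depend on the order of the atoms, even in floating point.
    """
    vectors = [embed_atom(name, value, dim) for name, value in atoms]
    return [math.fsum(column) / len(vectors) for column in zip(*vectors)]


if __name__ == "__main__":
    observed = [("age", 42.0), ("bmi", 27.5), ("glucose", 5.6)]
    shuffled = [observed[2], observed[0], observed[1]]
    # Same set of atoms, different order: the encodings are identical.
    print(encode_set(observed) == encode_set(shuffled))
```

A learned model would replace the seeded embedding with trained parameters and typically use attention rather than a plain mean, but the invariance argument is the same: the pooling operation must be symmetric in its inputs.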

Paper Structure

This paper contains 27 sections, 10 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Illustration of UNLI capabilities, where a single ASPIRE model is able to output a likelihood in the domain of the specified target over a wide universe of conditioning information.
  • Figure 2: ASPIRE architecture. The model processes query instances, optional support sets, and dataset context through set-based mappings, maintaining permutation invariance at both feature and instance levels for UNLI.
  • Figure 3: Five-shot F1 scores ($\uparrow$) averaged over 15 held-out classification tasks.
  • Figure 4: Few-shot classification performance (Average F1) vs. number of shots.
  • Figure 5: Few-shot average RMSE ($\downarrow$) across held-out regression datasets.
  • ...and 4 more figures