On Evaluating LLMs' Capabilities as Functional Approximators: A Bayesian Perspective

Shoaib Ahmed Siddiqui; Yanzhi Chen; Juyeon Heo; Menglin Xia; Adrian Weller

TL;DR

This work proposes a new evaluation framework to comprehensively assess Large Language Models' function modeling abilities and discovers that LLMs are relatively weak in understanding patterns in raw data, but excel at utilizing prior knowledge about the domain to develop a strong understanding of the underlying function.

Abstract

Recent works have successfully applied Large Language Models (LLMs) to function modeling tasks. However, the reasons behind this success remain unclear. In this work, we propose a new evaluation framework to comprehensively assess LLMs' function modeling abilities. By adopting a Bayesian perspective of function modeling, we discover that LLMs are relatively weak in understanding patterns in raw data, but excel at utilizing prior knowledge about the domain to develop a strong understanding of the underlying function. Our findings offer new insights about the strengths and limitations of LLMs in the context of function modeling.
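
To make the Bayesian perspective concrete, the relation between the two quantities named in the Figure 2 caption, the likelihood $p(\mathcal{D}|f)$ and the posterior $p(f|\mathcal{D})$, can be written with Bayes' rule. The prior symbol $p(f)$ and the evidence $p(\mathcal{D})$ are notation assumed here for illustration; the paper's own equations may use different symbols.

```latex
% Bayes' rule for function modeling (notation for the prior p(f) is assumed):
%   likelihood p(D | f): how well a candidate function f explains the raw data D
%   prior      p(f)    : domain knowledge about plausible functions
\[
  p(f \mid \mathcal{D})
  = \frac{p(\mathcal{D} \mid f)\, p(f)}{p(\mathcal{D})}
  \;\propto\; p(\mathcal{D} \mid f)\, p(f)
\]
```

Read this way, "understanding patterns in raw data" corresponds to the likelihood term, while "utilizing prior knowledge about the domain" corresponds to the prior term.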

Paper Structure (25 sections, 10 equations, 4 figures, 4 tables)

Introduction
Background and Related Work
Large language models
LLMs as functional predictors
A Bayesian Evaluation Framework
Function modeling as Bayesian inference.
Evaluation objectives.
Evaluating the ability to understand raw data patterns
Decontextualizing task description.
Evaluating the ability to incorporate domain knowledge
Verbalizing data.
Amplifying the impact of prior.
Experiments
Synthetic data
Setup.
...and 10 more sections

Figures (4)

Figure 1: A motivating example. When making predictions, a model that focuses only on raw data may interpret the underlying function as linear. However, when domain information is specified (i.e., the data describe the trajectory of a cannonball), the model can take physical laws into account to model the trajectory more accurately. Given the vast amount of knowledge gathered during pretraining, LLMs can draw on the domain knowledge they possess to generate more accurate predictions. We are interested in separately evaluating LLMs' ability to understand raw data patterns and their ability to utilize domain knowledge in function modeling tasks.
Figure 2: Example prompt configurations for evaluating the quality of the likelihood $p(\mathcal{D}|f)$ and the posterior $p(f|\mathcal{D})$ encoded by the LLM in a function modeling task. When evaluating the posterior, a prompt (highlighted in gray) is used to explicitly encourage the LLM to make use of domain knowledge regarding the task.
Figure 3: Basic evaluations of function modeling using 25 training points, where we compare LLM performance (in particular GPT-4) with a 4-layer MLP with 64 hidden units. The MSE indicates direct prediction performance.
Figure 4: CO$_2$ level modeling: (a) Predictions made by GPT-4 with and without domain knowledge. (b-c) Predictions made by Gaussian processes with various kernels. The expert kernel is taken from Rasmussen and Williams (2006).
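
The motivating example in Figure 1 can be illustrated with a small numerical sketch. This is not the paper's experimental setup: the launch speed, noise level, time ranges, and the helper names `simulate_height`, `fit_linear`, and `fit_with_physics` are all assumptions chosen for illustration. The sketch fits a data-only straight line and a domain-informed model (gravity term fixed by physics) to early, nearly linear observations of a cannonball, then compares how well each extrapolates.

```python
import numpy as np

G = 9.81  # gravitational acceleration in m/s^2, treated as known domain knowledge


def simulate_height(t, v0=30.0, h0=0.0):
    """Hypothetical cannonball height over time, ignoring air resistance."""
    return h0 + v0 * t - 0.5 * G * t**2


def fit_linear(t, y):
    """Data-only model: ordinary least-squares straight line."""
    slope, intercept = np.polyfit(t, y, deg=1)
    return lambda t_new: slope * t_new + intercept


def fit_with_physics(t, y):
    """Domain-informed model: fix the -0.5*G*t^2 term, fit only v0 and h0."""
    residual = y + 0.5 * G * t**2  # what remains is linear in t
    v0, h0 = np.polyfit(t, residual, deg=1)
    return lambda t_new: h0 + v0 * t_new - 0.5 * G * t_new**2


rng = np.random.default_rng(0)

# Early observations: over this short window the trajectory looks almost linear.
t_train = np.linspace(0.0, 0.5, 10)
y_train = simulate_height(t_train) + rng.normal(0.0, 0.05, t_train.shape)

# Later times, where the curvature due to gravity dominates.
t_test = np.linspace(3.0, 5.0, 20)
y_test = simulate_height(t_test)

linear_model = fit_linear(t_train, y_train)
physics_model = fit_with_physics(t_train, y_train)

mse_linear = float(np.mean((linear_model(t_test) - y_test) ** 2))
mse_physics = float(np.mean((physics_model(t_test) - y_test) ** 2))

print(f"MSE, data-only linear fit:   {mse_linear:.2f}")
print(f"MSE, domain-informed fit:    {mse_physics:.2f}")
```

Both models see the same ten points; only the domain-informed one encodes the prior that the data come from projectile motion, which is the distinction the caption draws between raw data patterns and domain knowledge.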
