Can a Large Language Model Learn Matrix Functions In Context?

Paimon Goulart, Evangelos E. Papalexakis

TL;DR

This paper explores the capacity of LLMs to solve non-linear numerical computations, with specific emphasis on functions of the Singular Value Decomposition, and finds that LLMs can achieve high accuracy with minimal prior examples, converging quickly and avoiding the overfitting seen in classical models.

Abstract

Large Language Models (LLMs) have demonstrated the ability to solve complex tasks through In-Context Learning (ICL), where models learn from a few input-output pairs without explicit fine-tuning. In this paper, we explore the capacity of LLMs to solve non-linear numerical computations, with specific emphasis on functions of the Singular Value Decomposition. Our experiments show that while LLMs perform comparably to traditional models such as Stochastic Gradient Descent (SGD) based Linear Regression and Neural Networks (NN) for simpler tasks, they outperform these models on more complex tasks, particularly in the case of top-k Singular Values. Furthermore, LLMs demonstrate strong scalability, maintaining high accuracy even as the matrix size increases. Additionally, we found that LLMs can achieve high accuracy with minimal prior examples, converging quickly and avoiding the overfitting seen in classical models. These results suggest that LLMs could provide an efficient alternative to classical methods for solving high-dimensional problems. Future work will focus on extending these findings to larger matrices and more complex matrix operations while exploring the effect of using different numerical representations in ICL.
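To make the target quantities concrete, the following is a minimal NumPy sketch of the functions named in the abstract and figure captions (vector norm, nuclear norm, top-k singular values) together with RMSE. It is an illustration, not the authors' code: the function names, the choice of the Euclidean norm for the vector task, and the uniform sampling of the 5x5 test matrix are all assumptions.

```python
import numpy as np


def vector_norm(x):
    """Euclidean (L2) norm of a vector (the norm type is an assumption)."""
    return float(np.linalg.norm(np.asarray(x, dtype=float)))


def top_k_singular_values(a, k):
    """Largest k singular values of a matrix, in descending order."""
    # np.linalg.svd returns singular values already sorted in descending order.
    s = np.linalg.svd(np.asarray(a, dtype=float), compute_uv=False)
    return s[:k]


def nuclear_norm(a):
    """Nuclear norm: the sum of all singular values of a matrix."""
    return float(np.linalg.svd(np.asarray(a, dtype=float), compute_uv=False).sum())


def rmse(pred, actual):
    """Root mean squared error between predictions and ground truth."""
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((pred - actual) ** 2)))


# Hypothetical 5x5 input; the sampling distribution is an assumption.
rng = np.random.default_rng(0)
a = rng.uniform(0.0, 1.0, size=(5, 5))
targets = {
    "nuclear_norm": nuclear_norm(a),
    "top_2": top_k_singular_values(a, 2),
}
```

These functions are non-linear in the matrix entries, which is why a linear regression baseline is expected to struggle on them while still handling simpler targets.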

Paper Structure

This paper contains 17 sections, 3 equations, 8 figures.

Figures (8)

  • Figure 1: Example prompt with a 5x5 matrix; the input can also be a vector or a matrix of any size. Example outputs, denoted $\lambda$, are either a single scalar or a vector (adapted from Coda-Forno et al., 2023).
  • Figure 2: Average predictions compared to average actual values for the vector norm learning task.
  • Figure 3: RMSE of the vector norm predictions for every model in the vector norm learning task.
  • Figure 4: Average predictions compared to average actual values for the nuclear norm learning task at randomly selected experiments for 5x5 matrix inputs.
  • Figure 5: RMSE for the nuclear norm and SVD learning tasks when given 5x5 matrices as inputs.
  • ...and 3 more figures
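
The prompt described in the Figure 1 caption pairs example inputs with their outputs and ends with an unanswered query. The sketch below shows one way such a few-shot prompt could be assembled. The `Input:`/`Output:` layout, the two-decimal rounding, and the helper names `format_matrix` and `build_prompt` are hypothetical; the actual prompt template is only given in the figure itself.

```python
def format_matrix(m, decimals=2):
    """Render a matrix (list of rows) as text with fixed decimal precision."""
    rows = ["[" + ", ".join(f"{v:.{decimals}f}" for v in row) + "]" for row in m]
    return "[" + ", ".join(rows) + "]"


def build_prompt(examples, query, decimals=2):
    """Build a few-shot prompt from (matrix, scalar output) pairs plus a query.

    The final "Output:" line is left empty for the model to complete.
    """
    lines = []
    for matrix, output in examples:
        lines.append(f"Input: {format_matrix(matrix, decimals)}")
        lines.append(f"Output: {output:.{decimals}f}")
    lines.append(f"Input: {format_matrix(query, decimals)}")
    lines.append("Output:")
    return "\n".join(lines)


# Toy 2x2 examples: for a diagonal matrix with non-negative entries,
# the nuclear norm equals the sum of the diagonal.
examples = [
    ([[1.0, 0.0], [0.0, 2.0]], 3.0),
    ([[0.5, 0.0], [0.0, 0.5]], 1.0),
]
prompt = build_prompt(examples, [[2.0, 0.0], [0.0, 2.0]])
```

The number of example pairs placed before the query is the quantity varied when measuring how quickly in-context predictions converge.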