Fast Analysis of the OpenAI O1-Preview Model in Solving Random K-SAT Problem: Does the LLM Solve the Problem Itself or Call an External SAT Solver?

Raffaele Marino

TL;DR

This manuscript proposes and presents an analysis to quantify whether the OpenAI O1-preview model demonstrates a spark of intelligence or merely makes random guesses when outputting an assignment for a Boolean satisfiability problem.

Abstract

In this manuscript, I present an analysis of the performance of the OpenAI O1-preview model in solving random K-SAT instances for $K \in \{2, 3, 4\}$ as a function of $\alpha = M/N$, where $M$ is the number of clauses and $N$ is the number of variables of the satisfiability problem. I show that the model can call an external SAT solver to solve the instances, rather than solving them directly. Despite using external solvers, the model reports incorrect assignments as output. Moreover, I propose and present an analysis to quantify whether the OpenAI O1-preview model demonstrates a spark of intelligence or merely makes random guesses when outputting an assignment for a Boolean satisfiability problem.
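The random K-SAT instances and the check of a returned assignment can be sketched as follows. This is a minimal sketch, assuming the standard random K-SAT ensemble (each of the $M = \alpha N$ clauses picks $K$ distinct variables uniformly at random and negates each with probability 1/2); the function names `random_ksat`, `count_unsatisfied`, and `is_satisfying` are hypothetical and are not taken from the manuscript. Verifying an assignment only requires scanning every clause once, so it takes $O(MK)$ time.

```python
import random


def random_ksat(n_vars, k, alpha, seed=None):
    """Sample a random K-SAT formula with N = n_vars and M = round(alpha * N) clauses.

    Assumed ensemble: each clause uses K distinct variables chosen uniformly at
    random, and each literal is negated with probability 1/2. Literals follow
    the DIMACS convention: +v is x_v, -v is NOT x_v, variables numbered 1..N.
    """
    rng = random.Random(seed)
    n_clauses = round(alpha * n_vars)
    formula = []
    for _ in range(n_clauses):
        variables = rng.sample(range(1, n_vars + 1), k)  # K distinct variables
        formula.append([v if rng.random() < 0.5 else -v for v in variables])
    return formula


def count_unsatisfied(formula, assignment):
    """Count clauses violated by `assignment` (dict: variable -> bool).

    A clause is satisfied if at least one of its literals is true, so a single
    pass over all clauses suffices (O(M * K) time).
    """
    return sum(
        1
        for clause in formula
        if not any(assignment[abs(lit)] == (lit > 0) for lit in clause)
    )


def is_satisfying(formula, assignment):
    """Return True iff `assignment` satisfies every clause of `formula`."""
    return count_unsatisfied(formula, assignment) == 0
```

An assignment reported by the model can be parsed into a dict and passed to `is_satisfying`. A solution returned by Pycosat's `pycosat.solve(formula)`, a list of signed integers, can be converted with `{abs(l): l > 0 for l in solution}` before the same check.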

Paper Structure

This paper contains 6 sections and 3 figures.

Figures (3)

  • Figure 1: Fraction of satisfying assignments, P(SAT. ASSIG.), as a function of $\alpha$. Each point is an average over 10 samples, and error bars are standard errors. Green points show the fraction of satisfying assignments returned and checked by the OpenAI O1-preview model; blue points show the fraction of assignments returned by the OpenAI O1-preview model that I verified as satisfying using the polynomial-time algorithm in [Braunsteincodeverify]; orange points show the fraction of satisfying assignments obtained by Pycosat.
  • Figure 2: Number of times the OpenAI O1-preview model actually calls a SAT solver to find an assignment for a random K-SAT instance. The left panel shows the histogram for the random 3-SAT problem; the right panel shows the histogram for the random 4-SAT problem.
  • Figure 3: Number of unsatisfied clauses for a given assignment, divided by $M$. Each point is an average over 10 samples, and error bars are standard errors. The black line marks $1/2^K$, the expected fraction of clauses violated by a uniformly random assignment, for random 2-SAT (left panel), random 3-SAT (middle panel), and random 4-SAT (right panel). If a point lies below the black line, the model does better than a random guess and thus demonstrates a form of intelligence.
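The random-guess baseline $1/2^K$ used in Figure 3 follows from the fact that a clause on $K$ distinct variables is violated only when all $K$ literals are false, and under a uniformly random assignment each literal is false independently with probability 1/2. The sketch below checks this numerically and computes the mean and standard error over 10 samples, as in the figure error bars. It is illustrative only: the function names are hypothetical, the standard error is assumed to be the sample standard deviation divided by $\sqrt{n}$, and the values $N = 200$ and $\alpha = 2.0$ are arbitrary choices, not the manuscript's settings.

```python
import math
import random
import statistics


def mean_and_stderr(values):
    """Sample mean and standard error of the mean (stdev / sqrt(n))."""
    mean = statistics.fmean(values)
    if len(values) < 2:
        return mean, 0.0
    return mean, statistics.stdev(values) / math.sqrt(len(values))


def random_guess_unsat_fraction(n_vars, k, alpha, rng):
    """Draw one random K-SAT formula and one uniformly random assignment,
    then return the fraction of clauses the assignment violates."""
    n_clauses = round(alpha * n_vars)
    # The "random guess": each variable is True with probability 1/2.
    assignment = [rng.random() < 0.5 for _ in range(n_vars)]
    violated = 0
    for _ in range(n_clauses):
        variables = rng.sample(range(n_vars), k)        # K distinct variables
        signs = [rng.random() < 0.5 for _ in range(k)]  # True = positive literal
        # A literal is false when the variable's value differs from its sign;
        # the clause is violated only if every literal is false.
        if all(assignment[v] != s for v, s in zip(variables, signs)):
            violated += 1
    return violated / n_clauses


if __name__ == "__main__":
    rng = random.Random(0)
    for k in (2, 3, 4):
        fracs = [random_guess_unsat_fraction(200, k, 2.0, rng) for _ in range(10)]
        mean, err = mean_and_stderr(fracs)
        print(f"K={k}: {mean:.4f} +/- {err:.4f} (baseline 1/2^K = {2 ** -k:.4f})")
```

Under these assumptions the estimated fractions should land close to 0.25, 0.125, and 0.0625 for $K = 2, 3, 4$; a model whose points sit clearly below these values is doing better than random guessing.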