A Little Confidence Goes a Long Way

John Scoville, Shang Gao, Devanshu Agrawal, Javed Qadrud-Din

TL;DR

The approach involves translating class labels into semantically rich descriptions, spontaneous symmetry breaking of multilayer perceptron probes for unsupervised learning and inference, and training probes to generate confidence scores from hidden state activations, subject to known constraints, via entropy maximization.

Abstract

We introduce a group of related methods for binary classification tasks using probes of the hidden state activations in large language models (LLMs). Performance is on par with the largest and most advanced LLMs currently available, while requiring orders of magnitude fewer computational resources and no labeled data. The approach involves translating class labels into semantically rich descriptions, spontaneous symmetry breaking of multilayer perceptron probes for unsupervised learning and inference, training probes to generate confidence scores (prior probabilities) from hidden state activations, subject to known constraints, via entropy maximization, and selecting the most confident probe model from an ensemble for prediction. These techniques are evaluated on four datasets using five base LLMs.

Paper Structure (22 sections, 3 equations, 2 figures, 4 tables)

Figures (2)

  • Figure 1: An illustration of the Glia framework using a single example query, "Is the sky blue?" This represents one example from a batch that we wish to classify into one of two binary classes. First, given a task prompt (e.g. 'Is (example) blue?'), the dataset labels for the binary equivalence classes ('Yes', 'No') are translated into descriptive representations, either by a human or by an instruction-tuned LLM, answering the question 'What is the meaning of a (label) response to this query?' Once the descriptive labels are produced, they undergo a forward pass through an LLM, and the activations of the LLM's final hidden layer are extracted. Each of the two descriptive labels is then appended to the prompt and passed through the LLM to obtain an example of each answer, again extracting the activations of the final hidden layer. Numerical labels are chosen and paired with the translated label activations to create a synthetic dataset for symmetry-breaking cross-entropy pretraining. The two answers of each example become inputs to the symmetry-broken pretrained model, which is trained using the maximum entropy principle (MEP) to create a probe model. Probe pretraining and training are iterated several times to produce an ensemble of probe models. The most confident probe model from the ensemble is selected to make predictions.
  • Figure 2: An illustration of spontaneous symmetry breaking in pretrained models. Glia inference is evaluated after n steps of symmetry-breaking pretraining, where n is shown on a logarithmic scale on the x-axis. The y-axis shows the average F1 score on the CUAD dataset for each n. A phase transition from a symmetric, disordered state to an asymmetric, ordered state is observed. Note that once symmetry has been broken, further pretraining slowly degrades performance.
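
The last step in the Figure 1 caption, selecting the most confident probe model from an ensemble, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each probe is assumed to emit one score per candidate answer, a two-way softmax turns the pair into probabilities, and "confidence" is taken to be the mean of the larger probability over the batch. The function names and the example scores are hypothetical, and the paper may define confidence differently.

```python
import math

def softmax2(a, b):
    """Numerically stable softmax over two scores."""
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb), eb / (ea + eb)

def mean_confidence(score_pairs):
    """Assumed confidence measure: mean of the larger class probability."""
    return sum(max(softmax2(a, b)) for a, b in score_pairs) / len(score_pairs)

def select_most_confident(ensemble_scores):
    """Return (index, confidence) of the probe with the highest mean confidence."""
    confidences = [mean_confidence(pairs) for pairs in ensemble_scores]
    best = max(range(len(confidences)), key=confidences.__getitem__)
    return best, confidences[best]

def predict(score_pairs):
    """Predict class 0 when the first answer scores at least as high, else class 1."""
    return [0 if a >= b else 1 for a, b in score_pairs]

# Hypothetical outputs of three probes on a batch of three examples.
# Each pair holds the scores for the two candidate answers of one example.
ensemble_scores = [
    [(0.1, -0.1), (0.0, 0.2), (-0.1, 0.1)],   # nearly undecided probe
    [(2.0, -2.0), (-1.5, 1.5), (3.0, -3.0)],  # decisive probe
    [(0.5, -0.5), (-0.4, 0.4), (0.6, -0.6)],  # moderately decisive probe
]

best, confidence = select_most_confident(ensemble_scores)
labels = predict(ensemble_scores[best])
```

Here the second probe is selected because its probabilities sit furthest from 0.5 on average. Note that this selection rule uses no ground-truth labels, which is in keeping with the unsupervised setting described in the abstract.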