The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models

Artem Kirsanov, Chi-Ning Chou, Kyunghyun Cho, SueYeon Chung

TL;DR

This paper addresses how prompts reconfigure the internal representations of decoder-only LMs for diverse tasks without updating model parameters. The study uses the manifold capacity framework to quantify category-manifold separability in embedding space and to disentangle representation quality from readout alignment, introducing quantities such as $\alpha = P / D^*$ and using the participation ratio $\text{PR}$ as a proxy for dimension. The results show that instructions, demonstrations, and soft-prompt tuning produce distinct representational geometries and cross-task interactions, with label semantics and input distributions playing crucial roles. The work highlights a readout bottleneck and suggests representation-aware prompting as a path toward more robust and scalable LM adaptation.
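To make the participation ratio mentioned above concrete, here is a minimal sketch that assumes the standard definition $\text{PR} = (\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$, where $\lambda_i$ are the eigenvalues of the covariance matrix of the embeddings. The function name `participation_ratio` and the synthetic data are illustrative assumptions, not taken from the paper, and the paper's exact estimator may differ.

```python
import numpy as np


def participation_ratio(X):
    """Participation ratio of a point cloud X with shape (n_samples, n_features).

    Assumes the standard definition PR = (sum_i l_i)^2 / sum_i l_i^2,
    where l_i are the eigenvalues of the sample covariance of X.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0, keepdims=True)  # center the cloud
    # Squared singular values of the centered data, divided by (n - 1),
    # are the eigenvalues of the sample covariance matrix.
    s = np.linalg.svd(Xc, compute_uv=False)
    lam = s**2 / (X.shape[0] - 1)
    return lam.sum() ** 2 / (lam**2).sum()


# Synthetic stand-ins for embeddings (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
isotropic = rng.normal(size=(2000, 10))  # variance spread over 10 directions
low_rank = rng.normal(size=(2000, 2)) @ rng.normal(size=(2, 10))  # rank 2

pr_iso = participation_ratio(isotropic)
pr_low = participation_ratio(low_rank)
print(f"isotropic PR: {pr_iso:.2f}, rank-2 PR: {pr_low:.2f}")
```

The isotropic cloud yields a PR close to its ambient dimension, while the rank-2 cloud yields a PR of at most 2, which is why PR can act as a soft count of the directions that carry variance.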

Abstract

Decoder-only language models have the ability to dynamically switch between various computational tasks based on input prompts. Despite many successful applications of prompting, there is very limited understanding of the internal mechanism behind such flexibility. In this work, we investigate how different prompting methods affect the geometry of representations in these models. Employing a framework grounded in statistical physics, we reveal that various prompting techniques, while achieving similar performance, operate through distinct representational mechanisms for task adaptation. Our analysis highlights the critical role of input distribution samples and label semantics in few-shot in-context learning. We also demonstrate evidence of synergistic and interfering interactions between different tasks on the representational level. Our work contributes to the theoretical understanding of large language models and lays the groundwork for developing more effective, representation-aware prompting strategies.

Paper Structure

This paper contains 53 sections, 4 equations, 26 figures, 3 tables.

Figures (26)

  • Figure 1: Two components of the model’s performance. Low accuracy can be caused either by a suboptimal, tangled representation in the embedding space (left) or by misalignment between the representation and the model’s readout layer (right). Manifold capacity, which relates the performance of an ideal decoder to the underlying geometry, can differentiate between the two cases.
  • Figure 2: Possible effect sites of prompting. A task-specific prefix might affect the extraction of relevant features at the sentence level, reorganizing intermediate representations (top). High performance would also imply more efficient repackaging of the extracted features into the embedding of the last token, as well as readout alignment (bottom).
  • Figure 3: Performance of demonstrations and instruction prompting on the sentiment analysis task.
  • Figure 4: Manifold capacity of sentence-level embeddings during demonstrations prompting compared to instruction and raw sentence control.
  • Figure 5: Manifold capacity of last token embeddings during demonstrations prompting compared to instruction and raw sentence control.
  • ...and 21 more figures