In-Context Clustering with Large Language Models

Ying Wang, Mengye Ren, Andrew Gordon Wilson

TL;DR

ICC reframes clustering as an unsupervised in-context learning task in which large language model attention captures context-dependent relationships without a predefined similarity function. Pretrained LLMs show strong zero-shot clustering on text-encoded numeric data, and spectral clustering on their attention matrices is also competitive. The approach extends to multimodal inputs such as images. LoRA-based fine-tuning with a Next Token Prediction loss further improves results, giving competitive or superior performance on numeric and image clustering, and prompting enables text-conditioned image clustering. Overall, ICC highlights the flexibility of LLMs for complex, semantically rich clustering across modalities, while noting open challenges such as long-context efficiency and the need for theoretical grounding.
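To make "text-encoded numeric data" concrete, the sketch below shows one way a set of points could be serialized into a clustering prompt and how the generated cluster labels could be parsed back. The prompt wording, the two-decimal formatting, the comma-separated answer format, and the helper names `build_icc_prompt` and `parse_labels` are hypothetical illustrations, not the paper's exact template.

```python
import re
from typing import List, Sequence


def build_icc_prompt(points: Sequence[Sequence[float]], k: int, decimals: int = 2) -> str:
    """Serialize numeric points as text and ask for one cluster label per point.

    Hypothetical prompt template; the paper's actual wording may differ.
    """
    lines = []
    for i, p in enumerate(points):
        # Fixed-precision formatting keeps every number short and uniform as text.
        coords = ", ".join(f"{x:.{decimals}f}" for x in p)
        lines.append(f"{i}: [{coords}]")
    data = "\n".join(lines)
    return (
        f"Cluster the following {len(points)} points into {k} clusters.\n"
        f"{data}\n"
        f"Answer with one label in 0..{k - 1} per point, comma-separated, in order."
    )


def parse_labels(generation: str, n: int, k: int) -> List[int]:
    """Extract the first n integer labels from the model's text output."""
    labels = [int(tok) for tok in re.findall(r"-?\d+", generation)][:n]
    if len(labels) != n or any(not 0 <= lab < k for lab in labels):
        raise ValueError("generation does not contain n valid labels")
    return labels
```

For example, `build_icc_prompt([[0.1, 0.2], [0.15, 0.22], [5.0, 5.1]], k=2)` lists the points as `0: [0.10, 0.20]` and so on, and `parse_labels("0, 0, 1", n=3, k=2)` returns `[0, 0, 1]`. Validating the parsed labels matters because free-form generation can omit points or emit out-of-range labels.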

Abstract

We propose In-Context Clustering (ICC), a flexible LLM-based procedure for clustering data from diverse distributions. Unlike traditional clustering algorithms constrained by predefined similarity measures, ICC flexibly captures complex relationships among inputs through an attention mechanism. We show that pretrained LLMs exhibit impressive zero-shot clustering capabilities on text-encoded numeric data, with attention matrices showing salient cluster patterns. Spectral clustering using attention matrices offers surprisingly competitive performance. We further enhance the clustering capabilities of LLMs on numeric and image data through fine-tuning using the Next Token Prediction (NTP) loss. Moreover, the flexibility of LLM prompting enables text-conditioned image clustering, a capability that classical clustering methods lack. Our work extends in-context learning to an unsupervised setting, showcasing the effectiveness and flexibility of LLMs for clustering. Our code is available at https://agenticlearning.ai/icc.
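The abstract notes that spectral clustering on attention matrices is competitive. The NumPy sketch below illustrates the general idea under stated assumptions: an n×n input-input attention score matrix `A` (one row and column per data point) has already been extracted, for example by aggregating token-level attention from one layer into per-point scores, which is an assumption here rather than the paper's documented procedure. The sketch symmetrizes `A` into an affinity, embeds points with the top eigenvectors of the normalized affinity (Ng-Jordan-Weiss style), and runs a small k-means; the symmetrization, normalization, and initialization are illustrative choices, and the helper names are hypothetical.

```python
import numpy as np


def _kmeans(X: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        # Squared distance from each row to its nearest already-chosen center.
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    C = np.array(centers)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Keep the old center if a cluster becomes empty.
        new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels


def spectral_cluster_attention(A: np.ndarray, k: int) -> np.ndarray:
    """Cluster n data points from an (n, n) input-input attention score matrix A."""
    # Decoder attention is causal (lower-triangular), so symmetrize it into an affinity.
    W = 0.5 * (A + A.T)
    np.fill_diagonal(W, 0.0)
    # Symmetric normalization D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the top-k eigenvectors.
    _, vecs = np.linalg.eigh(M)
    U = vecs[:, -k:]
    # Row-normalize the spectral embedding before running k-means.
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return _kmeans(U, k)
```

On a toy causal matrix built as `np.tril(B)`, where `B` has strong attention within two blocks of three points and weak attention across them, the function assigns each block its own label. With a real model, which layer and which head aggregation to use is an empirical choice; Figure 3 of the paper reports spectral-clustering accuracy across layers.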

Paper Structure

This paper contains 36 sections, 3 equations, 15 figures, and 7 tables.

Figures (15)

  • Figure 1: In-Context Clustering (ICC). LLMs can flexibly handle diverse modalities and perform text-conditioned clustering. We show the zero-shot clustering capability in pretrained LLMs and further strengthen it through finetuning.
  • Figure 2:
  • Figure 3: Visualization of Attention Allocation over Input Data and Generated Cluster Labels at an Intermediate Layer. The x-axis and y-axis are the ground-truth cluster labels. The left panel shows the pretrained Llama-3.1-8b-Instruct, and the right panel shows the model after fine-tuning (see the section on fine-tuning for numeric data). The curves at the top right show the average accuracy of spectral clustering using the input-input attention score matrices (top left) across layers, compared with the average accuracy of LLM generation.
  • Figure 4: Left: Multimodal LLM Architecture with Average Pooling for Image Features. Right: Qualitative Comparison of Models on Image Clustering. ICC outperforms k-means when the data has rich semantic information.
  • Figure 5: LLMs are able to produce different clusterings according to the condition in the prompt.
  • ...and 10 more figures
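Figure 4 mentions average pooling of image features, which reduces the number of visual tokens each image contributes so that many images can share one context window. The sketch below assumes each image arrives as a square grid of patch embeddings in row-major order and pools non-overlapping `window` × `window` cells; the grid layout, the window size, and the helper name `avg_pool_patch_tokens` are hypothetical, and the paper may instead pool differently (for example, globally to a single token).

```python
import numpy as np


def avg_pool_patch_tokens(feats: np.ndarray, grid: int, window: int) -> np.ndarray:
    """Average-pool a (grid*grid, dim) patch-token sequence over window x window cells.

    Returns ((grid // window) ** 2, dim) tokens. Hypothetical helper for illustration.
    """
    n, dim = feats.shape
    if n != grid * grid or grid % window != 0:
        raise ValueError("expected a square patch grid divisible by the window size")
    g = grid // window
    # (grid*grid, dim) -> (grid, grid, dim) -> (g, window, g, window, dim).
    x = feats.reshape(grid, grid, dim).reshape(g, window, g, window, dim)
    # Average within each window cell, then flatten back to a token sequence.
    return x.mean(axis=(1, 3)).reshape(g * g, dim)
```

For instance, a 24×24 grid (576 patch tokens) pooled with `window=4` yields 36 tokens per image, a 16× reduction in context length per image; `window=grid` reduces each image to a single averaged token.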