Matching Markets Meet LLMs: Algorithmic Reasoning with Ranked Preferences

Hadi Hosseini, Samarth Khanna, Ronak Singh

TL;DR

This work evaluates large language models on algorithmic reasoning in two-sided matching markets with ranked preferences, focusing on stability, blocking pairs, and the Deferred Acceptance algorithm. Benchmarking seven models on Easy, Medium, and Hard instances drawn from Impartial Culture (IC) and Master-list (ML) preference distributions, the study finds that advanced reasoning helps only on small markets, while large markets produce frequent failures and unstable matchings. LoRA fine-tuning on synthetic reasoning traces yields strong gains on easy and medium instances, with near-perfect stability for some models, but does not close the gap on hard instances. The findings point to fundamental limits of current LLMs in scalable, structured algorithmic reasoning over large contextual inputs, and to the need for strategies beyond prompting and lightweight fine-tuning. The work has practical implications for deploying LLMs in market-design tasks and informs future work on instruction tuning and compositional reasoning for complex combinatorial problems.
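As a rough illustration of the two preference distributions named above, the sketch below samples rankings under the usual meaning of the terms in the matching literature: under Impartial Culture each agent's ranking is an independent uniformly random permutation, while under a Master-list every agent on a side shares one common ranking. The function names, agent labels, and market size are hypothetical, and the paper's exact sampling procedure may differ.

```python
import random
from typing import Dict, List

Prefs = Dict[str, List[str]]


def impartial_culture(agents: List[str], others: List[str], rng: random.Random) -> Prefs:
    """Each agent independently draws a uniformly random ranking of the other side."""
    return {a: rng.sample(others, len(others)) for a in agents}


def master_list(agents: List[str], others: List[str], rng: random.Random) -> Prefs:
    """All agents share one random ranking of the other side (assumed ML meaning)."""
    shared = rng.sample(others, len(others))
    return {a: list(shared) for a in agents}


# Hypothetical 5x5 market; the sizes behind Easy/Medium/Hard are not given here.
rng = random.Random(0)
men = [f"m{i}" for i in range(1, 6)]
women = [f"w{i}" for i in range(1, 6)]
ic_prefs = impartial_culture(men, women, rng)
ml_prefs = master_list(men, women, rng)
```

Master-list instances are highly correlated, so every agent on one side competes for the same top choices, whereas Impartial Culture instances spread preferences evenly.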

Abstract

The rise of Large Language Models (LLMs) has driven progress in reasoning tasks -- from program synthesis to scientific hypothesis generation -- yet their ability to handle ranked preferences and structured algorithms in combinatorial domains remains underexplored. We study matching markets, a core framework behind applications like resource allocation and ride-sharing, which require reconciling individual ranked preferences to ensure stable outcomes. We evaluate several state-of-the-art models on a hierarchy of preference-based reasoning tasks -- ranging from stable-matching generation to instability detection, instability resolution, and fine-grained preference queries -- to systematically expose their logical and algorithmic limitations in handling ranked inputs. Surprisingly, even top-performing models with advanced reasoning struggle to resolve instability in large markets, often failing to identify blocking pairs or execute algorithms iteratively. We further show that parameter-efficient fine-tuning (LoRA) significantly improves performance in small markets, but fails to bring about a similar improvement on large instances, suggesting the need for more sophisticated strategies to improve LLMs' reasoning with larger-context inputs.
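To make the notions of stability, blocking pairs, and iterative algorithm execution concrete, here is a minimal sketch of the textbook proposer-proposing Deferred Acceptance (Gale-Shapley) procedure together with a blocking-pair check. It assumes a one-to-one market with equally sized sides and complete strict preference lists; the names `deferred_acceptance`, `blocking_pairs`, and the toy instance are illustrative and are not taken from the paper's code or prompts.

```python
from typing import Dict, List, Tuple

Prefs = Dict[str, List[str]]


def deferred_acceptance(proposer_prefs: Prefs, receiver_prefs: Prefs) -> Dict[str, str]:
    """Return a stable matching as a proposer -> receiver mapping."""
    # rank[r][p] = position of proposer p in receiver r's list (lower is better)
    rank = {r: {p: i for i, p in enumerate(lst)} for r, lst in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of the next receiver to try
    engaged_to: Dict[str, str] = {}               # receiver -> tentatively held proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p            # r was unmatched and tentatively accepts
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p            # r trades up; the old partner is free again
            free.append(current)
        else:
            free.append(p)               # r rejects p; p proposes further down later
    return {p: r for r, p in engaged_to.items()}


def blocking_pairs(matching: Dict[str, str], proposer_prefs: Prefs,
                   receiver_prefs: Prefs) -> List[Tuple[str, str]]:
    """List all (p, r) pairs that prefer each other to their assigned partners."""
    partner_of = {r: p for p, r in matching.items()}
    pairs = []
    for p, lst in proposer_prefs.items():
        for r in lst:
            if r == matching[p]:
                break  # every later receiver is worse than p's current partner
            if receiver_prefs[r].index(p) < receiver_prefs[r].index(partner_of[r]):
                pairs.append((p, r))
    return pairs


# Toy 2x2 instance (hypothetical).
men_prefs = {"m1": ["w1", "w2"], "m2": ["w1", "w2"]}
women_prefs = {"w1": ["m2", "m1"], "w2": ["m1", "m2"]}
stable = deferred_acceptance(men_prefs, women_prefs)
unstable = {"m1": "w1", "m2": "w2"}
print(stable, blocking_pairs(unstable, men_prefs, women_prefs))
```

A matching is stable exactly when `blocking_pairs` returns an empty list. In the toy instance, pairing m1 with w1 is unstable because m2 and w1 each rank the other above their assigned partner.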

Paper Structure

This paper contains 74 sections, 8 figures, 8 tables, 5 algorithms.

Figures (8)

  • Figure 1: The framework for reasoning with ranked preferences through matching markets.
  • Figure 2: Responses generated by LLMs with Master-list (ML) and Impartial Culture (IC) preferences at different difficulty levels. Stable indicates a one-to-one matching with no blocking pairs; otherwise the matching is Unstable. Invalid responses do not adhere to the one-to-one constraint, Partial responses are one-to-one but leave some agents unmatched, and Fail indicates a model's failure to return any matching.
  • Figure 3: Instability Rate (lower is better) within unstable outcomes returned by each model as compared to randomly selected valid (but not necessarily stable) solutions (Random).
  • Figure 4: Optimality Rate within unstable outcomes returned by each model as compared to randomly selected valid (but not necessarily stable) solutions (Random).
  • Figure 5: The fraction of responses where each model correctly detects stability or instability of a given matching.
  • ...and 3 more figures
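The response categories described in the Figure 2 caption can be sketched as a small classifier. This is an assumption-laden illustration, not the paper's evaluation code: it assumes equally sized sides with complete preference lists, represents a response as a list of (proposer, receiver) pairs or `None`, and treats pairs that name unknown agents as Invalid, a choice the caption does not specify.

```python
from typing import Dict, List, Optional, Tuple

Prefs = Dict[str, List[str]]


def classify_response(pairs: Optional[List[Tuple[str, str]]],
                      proposer_prefs: Prefs, receiver_prefs: Prefs) -> str:
    """Label a model response as fail, invalid, partial, unstable, or stable."""
    if not pairs:
        return "fail"  # no matching returned at all
    proposers = [p for p, _ in pairs]
    receivers = [r for _, r in pairs]
    unknown = (any(p not in proposer_prefs for p in proposers)
               or any(r not in receiver_prefs for r in receivers))
    duplicated = (len(set(proposers)) < len(proposers)
                  or len(set(receivers)) < len(receivers))
    if unknown or duplicated:
        return "invalid"  # violates the one-to-one constraint (or names unknown agents)
    if len(pairs) < len(proposer_prefs):
        return "partial"  # one-to-one, but some agents are left unmatched
    matching = dict(pairs)
    partner_of = {r: p for p, r in pairs}
    for p, lst in proposer_prefs.items():
        for r in lst:
            if r == matching[p]:
                break  # remaining receivers are worse than p's partner
            if receiver_prefs[r].index(p) < receiver_prefs[r].index(partner_of[r]):
                return "unstable"  # (p, r) is a blocking pair
    return "stable"


# Toy 2x2 instance (hypothetical).
men_prefs = {"m1": ["w1", "w2"], "m2": ["w1", "w2"]}
women_prefs = {"w1": ["m2", "m1"], "w2": ["m1", "m2"]}
labels = [
    classify_response(None, men_prefs, women_prefs),
    classify_response([("m1", "w1"), ("m2", "w1")], men_prefs, women_prefs),
    classify_response([("m1", "w2")], men_prefs, women_prefs),
    classify_response([("m1", "w1"), ("m2", "w2")], men_prefs, women_prefs),
    classify_response([("m1", "w2"), ("m2", "w1")], men_prefs, women_prefs),
]
print(labels)
```

The checks run from the most basic violation to the most specific one, so each response receives exactly one label.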

Theorems & Definitions (1)

  • Example 1: An instance with multiple stable solutions.