Matching Markets Meet LLMs: Algorithmic Reasoning with Ranked Preferences
Hadi Hosseini, Samarth Khanna, Ronak Singh
TL;DR
This work evaluates large language models on algorithmic reasoning in two-sided matching markets with ranked preferences, focusing on stability, blocking pairs, and the Deferred Acceptance algorithm. Benchmarking seven models on Easy, Medium, and Hard instances drawn from the IC and ML preference distributions, the study finds that advanced reasoning helps only on small problems, while on large markets models frequently fail and return unstable matchings. LoRA fine-tuning on synthetic reasoning traces yields strong gains on Easy and Medium instances, reaching near-perfect stability for some models, but does not close the gap on Hard instances. The findings point to fundamental limits in how current LLMs perform scalable, structured algorithmic reasoning over large contextual inputs, and to the need for strategies more sophisticated than prompting and lightweight fine-tuning. The results bear on deploying LLMs in market-design tasks and inform future work on instruction tuning and compositional reasoning for complex combinatorial problems.
Abstract
The rise of Large Language Models (LLMs) has driven progress in reasoning tasks -- from program synthesis to scientific hypothesis generation -- yet their ability to handle ranked preferences and structured algorithms in combinatorial domains remains underexplored. We study matching markets, a core framework behind applications like resource allocation and ride-sharing, which require reconciling individual ranked preferences to ensure stable outcomes. We evaluate several state-of-the-art models on a hierarchy of preference-based reasoning tasks -- ranging from stable-matching generation to instability detection, instability resolution, and fine-grained preference queries -- to systematically expose their logical and algorithmic limitations in handling ranked inputs. Surprisingly, even top-performing models with advanced reasoning struggle to resolve instability in large markets, often failing to identify blocking pairs or execute algorithms iteratively. We further show that parameter-efficient fine-tuning (LoRA) significantly improves performance in small markets, but fails to bring about a similar improvement on large instances, suggesting the need for more sophisticated strategies to improve LLMs' reasoning with larger-context inputs.
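To make the notions of stability and blocking pairs concrete, the following is a minimal Python sketch of proposer-proposing Deferred Acceptance together with a blocking-pair check. It is an illustration only, not the paper's evaluation code. It assumes a one-to-one market with equal numbers of agents on each side and complete, strict preference lists; the function names and the three-agent example market are hypothetical.

```python
from collections import deque


def deferred_acceptance(proposer_prefs, receiver_prefs):
    """Proposer-proposing Deferred Acceptance (Gale-Shapley).

    Assumes equal-sized sides and complete, strict rankings (best first).
    Returns a dict mapping each proposer to its matched receiver.
    """
    # rank[r][p] = position of proposer p in receiver r's list (lower is better)
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of next receiver to try
    holds = {}                                    # receiver -> tentatively held proposer
    free = deque(proposer_prefs)                  # proposers with no tentative match

    while free:
        p = free.popleft()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = holds.get(r)
        if current is None:
            holds[r] = p                 # r was unmatched and accepts tentatively
        elif rank[r][p] < rank[r][current]:
            holds[r] = p                 # r trades up and releases its old partner
            free.append(current)
        else:
            free.append(p)               # r rejects p, who proposes again later
    return {p: r for r, p in holds.items()}


def blocking_pairs(matching, proposer_prefs, receiver_prefs):
    """List all (proposer, receiver) pairs who prefer each other to their partners."""
    partner_of = {r: p for p, r in matching.items()}
    pairs = []
    for p, prefs in proposer_prefs.items():
        for r in prefs:
            if r == matching[p]:
                break  # every later receiver is ranked below p's current partner
            r_list = receiver_prefs[r]
            if r_list.index(p) < r_list.index(partner_of[r]):
                pairs.append((p, r))
    return pairs


# Hypothetical 3x3 market used only for illustration.
men = {"m1": ["w1", "w2", "w3"],
       "m2": ["w1", "w3", "w2"],
       "m3": ["w2", "w1", "w3"]}
women = {"w1": ["m2", "m1", "m3"],
         "w2": ["m1", "m3", "m2"],
         "w3": ["m3", "m2", "m1"]}

stable = deferred_acceptance(men, women)
unstable = {"m1": "w1", "m2": "w2", "m3": "w3"}
print(stable)
print(blocking_pairs(stable, men, women))
print(blocking_pairs(unstable, men, women))
```

A matching is stable exactly when `blocking_pairs` returns an empty list. The check scans every proposer's list only down to that proposer's current partner, so it inspects at most n^2 pairs in a market with n agents per side. This quadratic growth of the preference input is one plausible reason the large instances described above are harder to handle in context.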
