Graph Property Inference in Small Language Models: Effects of Representation and Inference Strategy

Michal Podstawski

TL;DR

A systematic study of graph-theoretic property inference in small instruction-tuned language models, isolating the roles of input representation and reasoning strategy and identifying practical levers for improving structured inference under constrained model capacity.

Abstract

Recent progress in language modeling has expanded the range of tasks that can be approached through natural language interfaces, including problems that require structured reasoning. However, it remains unclear how effectively limited-capacity language models can infer formal properties of relational structures when those structures are presented in textual form. Understanding the conditions under which structured reasoning succeeds or fails is essential for applying small models in graph-based domains. We conduct a systematic study of graph-theoretic property inference in small instruction-tuned language models, isolating the roles of input representation and reasoning strategy. Across a diverse set of local and global graph metrics, we find that structural performance is highly sensitive to how relational information is organized. Representations that preserve neighborhood structure consistently improve estimation stability and ordinal consistency, while multi-branch reasoning yields the most reliable aggregate gains across configurations. These results show that graph property inference in small language models depends critically on representational organization and inference design. Structural competence is therefore shaped not only by model scale, but by how relational information is encoded and how predictions are elicited. The findings identify practical levers for improving structured inference under constrained model capacity.
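To make the representation contrast concrete, here is a minimal sketch of two ways the same undirected graph could be serialized as text: an edge list, where each line is one edge, and an adjacency list, where each line groups a node with all of its neighbors. The exact textual formats used in the paper are not given in this section, so the function names (`to_edge_list`, `to_adjacency_list`) and the line layouts below are illustrative assumptions only.

```python
from collections import defaultdict


def to_edge_list(edges):
    """Serialize an undirected graph as one 'u v' pair per line (assumed format)."""
    return "\n".join(f"{u} {v}" for u, v in edges)


def to_adjacency_list(edges):
    """Serialize the same graph as 'node: neighbor neighbor ...' per line (assumed format)."""
    adjacency = defaultdict(set)
    for u, v in edges:
        # Undirected graph: record the edge in both directions.
        adjacency[u].add(v)
        adjacency[v].add(u)
    lines = []
    for node in sorted(adjacency):
        neighbors = " ".join(str(n) for n in sorted(adjacency[node]))
        lines.append(f"{node}: {neighbors}")
    return "\n".join(lines)


# Hypothetical toy graph: a triangle (0, 1, 2) with a pendant node 3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
edge_text = to_edge_list(edges)
adj_text = to_adjacency_list(edges)
print(edge_text)
print(adj_text)
```

In the adjacency-list form, each node's neighborhood sits on a single line, whereas in the edge-list form the neighbors of a node are scattered across several lines. This is one plausible reading of what "preserving neighborhood structure" means for a textual encoding.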

Paper Structure (24 sections, 1 equation, 2 figures, 4 tables)

Figures (2)

  • Figure 1: Macro-averaged Spearman rank correlation ($\rho$) across graph properties. Higher values indicate stronger preservation of the relative ordering of graphs. Adjacency-list encoding generally improves rank consistency, and GoT aggregation achieves the highest $\rho$ for Qwen2.5-3B-Instruct. Negative values indicate loss of ordinal alignment under certain configurations.
  • Figure 2: Macro-level improvement in standard-deviation-normalized error (NRMSE$_{\text{std}}$) relative to baseline prompting. Bars show $\Delta$NRMSE$_{\text{std}} = \text{NRMSE}_{\text{Baseline}} - \text{NRMSE}_{\text{Strategy}}$; positive values indicate reduced error. Graph-of-Thoughts (GoT) consistently yields the largest error reductions, particularly for Qwen2.5-3B-Instruct under edge-list serialization. Chain-of-Thought (CoT) exhibits smaller and more variable effects.
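
The two quantities reported in the figure captions can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the paper's evaluation code: the captions only say that the error is "standard-deviation-normalized", so dividing the RMSE by the population standard deviation of the ground-truth values is an assumption, and the helper names and toy numbers are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(y_true, y_pred):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(y_true), average_ranks(y_pred))


def nrmse_std(y_true, y_pred):
    """RMSE divided by the std of the true values (assumed normalization)."""
    n = len(y_true)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    mean_t = sum(y_true) / n
    std_t = math.sqrt(sum((t - mean_t) ** 2 for t in y_true) / n)
    return rmse / std_t


# Hypothetical toy values for one graph property over four graphs.
y_true = [1.0, 2.0, 3.0, 4.0]
baseline_pred = [4.0, 3.0, 2.0, 1.0]   # ordering fully reversed
strategy_pred = [1.5, 2.5, 3.5, 4.5]   # ordering preserved, small offset

rho_baseline = spearman_rho(y_true, baseline_pred)
rho_strategy = spearman_rho(y_true, strategy_pred)
# Delta as defined in the Figure 2 caption: baseline minus strategy.
delta_nrmse = nrmse_std(y_true, baseline_pred) - nrmse_std(y_true, strategy_pred)
print(rho_baseline, rho_strategy, delta_nrmse)
```

With these toy numbers, the reversed predictions produce a negative $\rho$ (the "loss of ordinal alignment" case), and the positive $\Delta$NRMSE$_{\text{std}}$ corresponds to reduced error relative to the baseline.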