Hierarchical Deconstruction of LLM Reasoning: A Graph-Based Framework for Analyzing Knowledge Utilization

Miyoung Ko, Sue Hyun Park, Joonsuk Park, Minjoon Seo

TL;DR

We deconstruct complex real-world questions into a graph in which each question is a node whose predecessors are the background knowledge needed to solve it. This approach improves the understanding of LLM reasoning and suggests ways to improve LLM problem-solving abilities.

Abstract

Despite the advances in large language models (LLMs), how they use their knowledge for reasoning is not yet well understood. In this study, we propose a method that deconstructs complex real-world questions into a graph, representing each question as a node with predecessors of background knowledge needed to solve the question. We develop the DepthQA dataset, deconstructing questions into three depths: (i) recalling conceptual knowledge, (ii) applying procedural knowledge, and (iii) analyzing strategic knowledge. Based on a hierarchical graph, we quantify forward discrepancy, a discrepancy in LLM performance on simpler sub-problems versus complex questions. We also measure backward discrepancy where LLMs answer complex questions but struggle with simpler ones. Our analysis shows that smaller models exhibit more discrepancies than larger models. Distinct patterns of discrepancies are observed across model capacity and possibility of training data memorization. Additionally, guiding models from simpler to complex questions through multi-turn interactions improves performance across model sizes, highlighting the importance of structured intermediate steps in knowledge reasoning. This work enhances our understanding of LLM reasoning and suggests ways to improve their problem-solving abilities.
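
To make the graph and the two discrepancy types concrete, the following is a minimal sketch and not the authors' implementation. It assumes each question carries a correctness score in [0, 1], and it assumes a gap defined as the deeper question's score minus a predecessor's score, so that a negative gap is read as forward discrepancy and a positive gap as backward discrepancy. The names `QuestionNode`, `edge_gaps`, `classify`, and the tolerance `tol` are hypothetical, and the paper's own equations may normalize or threshold differently.

```python
from dataclasses import dataclass, field


@dataclass
class QuestionNode:
    """One question in the hierarchy (hypothetical structure for illustration)."""
    qid: str
    depth: int    # 1 = conceptual, 2 = procedural, 3 = strategic
    score: float  # assumed correctness score in [0, 1]
    predecessors: list[str] = field(default_factory=list)  # ids of shallower questions


def edge_gaps(nodes: dict[str, QuestionNode]):
    """Yield (shallower_id, deeper_id, gap) for every predecessor edge.

    Assumption: gap = score(deeper) - score(shallower).
    """
    for node in nodes.values():
        for pid in node.predecessors:
            yield pid, node.qid, node.score - nodes[pid].score


def classify(gap: float, tol: float = 0.0) -> str:
    """Label a gap under the assumed sign convention."""
    if gap < -tol:
        return "forward"   # shallower answered better than deeper
    if gap > tol:
        return "backward"  # deeper answered better than shallower
    return "none"


# Toy graph with made-up scores, not data from the paper.
example = {
    "d1_a": QuestionNode("d1_a", depth=1, score=1.0),
    "d1_b": QuestionNode("d1_b", depth=1, score=0.5),
    "d2_a": QuestionNode("d2_a", depth=2, score=0.75, predecessors=["d1_a", "d1_b"]),
    "d3_a": QuestionNode("d3_a", depth=3, score=0.25, predecessors=["d2_a"]),
}

if __name__ == "__main__":
    for shallow, deep, gap in edge_gaps(example):
        print(f"{shallow} -> {deep}: gap={gap:+.2f} ({classify(gap)})")
```

In this toy graph, the edge from `d1_b` to `d2_a` is labeled backward because the deeper question scores higher than its predecessor, while the other two edges are labeled forward.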

Paper Structure (54 sections, 2 equations, 8 figures, 30 tables)

Figures (8)

  • Figure 1: Example of reasoning across depths, showing a sequence of questions from $D_1$ (conceptual knowledge) to $D_3$ (strategic knowledge). Questions that ask deeper levels of knowledge require reasoning from multiple areas of shallower knowledge, which are represented as sub-questions.
  • Figure 2: Hierarchical structure of a deconstructed $D_3$, illustrating forward and backward discrepancies. Transition to deeper nodes requires acquiring and reasoning with knowledge from the connected shallower nodes.
  • Figure 3: Memorization analysis with Min-K% probability. (a)-(d) show the distribution of average Min-K% probabilities at each depth. (e)-(g) present the distribution of score differences between neighboring questions, whose Min-K% probability is in the bottom 25% or top 75%. A positive gap indicates backward discrepancy, while a negative gap represents forward discrepancy.
  • Figure 4: Performance change after providing shallower questions. Note that $D_1$ is not reported for prompt inputs, as $D_1$ does not have shallower questions.
  • Figure 5: Average Min-K% probability at each depth. Lower values indicate more memorization while higher values indicate less memorization.
  • ...and 3 more figures