Mirage of Mastery: Memorization Tricks LLMs into Artificially Inflated Self-Knowledge

Sahil Kale

TL;DR

This work reveals that LLMs often confuse memorized solutions with genuine reasoning, inflating their perceived self-knowledge. It introduces a universal task-perturbation framework and two metrics, MIRAGE and SKEW, to quantify memorization-driven overconfidence and self-knowledge wavering across STEM domains. Experimental results show significant inconsistencies in feasibility judgments (over 45% in many cases) and pronounced effects in science and medicine, underscoring trust and safety concerns. The authors provide a public evaluation pipeline and advocate for safeguards to improve AI explainability and reliability in high-stakes domains.

Abstract

When artificial intelligence mistakes memorization for intelligence, it creates a dangerous mirage of reasoning. Existing studies treat memorization and self-knowledge deficits in LLMs as separate issues and overlook an intertwining link between them that degrades the trustworthiness of LLM responses. In our study, we use a novel framework to determine whether LLMs genuinely learn reasoning patterns from training data or merely memorize them and then assume competence across problems of similar complexity in STEM domains. Our analysis reveals a notable generalization problem: LLMs draw confidence from memorized solutions to infer higher self-knowledge of their reasoning ability, which manifests as an inconsistency of over 45% in feasibility assessments when faced with self-validated, logically coherent task perturbations. This effect is most pronounced in the science and medicine domains, which tend to have the most standardized jargon and problems, further confirming our approach. Significant wavering in the self-knowledge of LLMs also points to flaws in current architectures and training patterns, highlighting the need for techniques that ensure a balanced, consistent stance in models' perceptions of their own knowledge for maximum AI explainability and trustworthiness. Our code and results are publicly available at https://github.com/Sahil-R-Kale/mirage_of_mastery

Paper Structure

This paper contains 14 sections, 2 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Broad idea of memorization-driven skew in self-knowledge: inconsistent feasibility assessments with minor task perturbations (underlined). The task on the left is generated by the LLM itself with a confident claim of answerability, while slight perturbations on the right are then deemed infeasible.
  • Figure 2: Methodology to analyse how LLM memorization inflates self-perception using minor task perturbations
  • Figure 3: MIRAGE scores for LLMs across STEM domains, with the overall average baseline represented in red
  • Figure 4: SKEW scores for LLMs across STEM domains, with the overall average baseline represented in red
  • Figure 5: Results showing LLM performance metrics measuring memorization-driven self-knowledge inflation
  • ...and 4 more figures