Conceptual Metaphor Theory as a Prompting Paradigm for Large Language Models

Oliver Kramer

TL;DR

This work proposes Conceptual Metaphor Theory (CMT) as a prompting paradigm to enhance large language models' reasoning by enforcing source–target domain mappings in a chain-of-thought style. It introduces a CMT-prompting framework with predefined source/target mappings, a CoT-like inference process, and preconfigured prompts for ollama-based LLMs, along with a 100-task benchmark spanning MIM, DSR, ETT, and RCM. Using four native LLMs (Llama3.2, Phi3, Gemma2, Mistral) and an expert evaluator (Llama3.3 70B), the study reports significant gains in accuracy, coherence, and metaphorical depth for CMT prompts versus baselines. The findings support the potential of metaphor-guided prompting to improve structured reasoning and teaching-like explanations in AI, with practical implications for explainability and domain-specific reasoning.

Abstract

We introduce Conceptual Metaphor Theory (CMT) as a framework for enhancing large language models (LLMs) through cognitive prompting in complex reasoning tasks. CMT leverages metaphorical mappings to structure abstract reasoning, improving models' ability to process and explain intricate concepts. By incorporating CMT-based prompts, we guide LLMs toward more structured and human-like reasoning patterns. To evaluate this approach, we compare four native models (Llama3.2, Phi3, Gemma2, and Mistral) against their CMT-augmented counterparts on benchmark tasks spanning domain-specific reasoning, creative insight, and metaphor interpretation. Responses were automatically evaluated using the Llama3.3 70B model. Experimental results indicate that CMT prompting significantly enhances reasoning accuracy, clarity, and metaphorical coherence, outperforming baseline models across all evaluated tasks.
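To make the idea of a source–target mapping concrete, the following is a minimal sketch of how a CMT-style prompt could be assembled. It is not the paper's actual prompt: the names `MetaphorMapping` and `build_cmt_prompt`, the wording of the four reasoning steps, and the example task are all assumptions made for illustration. The sketch only builds a string; sending it to a model is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaphorMapping:
    """Hypothetical container for one conceptual metaphor."""
    source: str  # concrete, familiar domain
    target: str  # abstract domain to be explained


def build_cmt_prompt(task: str, mapping: MetaphorMapping) -> str:
    """Assemble a chain-of-thought style prompt around a source-target mapping.

    The step wording below is an assumption, not the paper's prompt text.
    """
    steps = [
        f"Describe the key elements of the source domain ({mapping.source}).",
        f"Map each element onto the target domain ({mapping.target}).",
        "Use the mapping to reason step by step about the task.",
        "State the final answer and note where the metaphor breaks down.",
    ]
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    # CMT conventionally names a metaphor as TARGET IS SOURCE.
    header = f"Reason with the conceptual metaphor {mapping.target.upper()} IS {mapping.source.upper()}."
    return f"{header}\n{numbered}\n\nTask: {task}"


if __name__ == "__main__":
    mapping = MetaphorMapping(source="a leaking bucket", target="inflation")
    print(build_cmt_prompt("Explain why savings lose value over time.", mapping))
```

The resulting string could be supplied as a system or user prompt to any of the compared models. Keeping the mapping in a separate structure makes it easy to swap metaphors per task while leaving the reasoning steps fixed.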

Paper Structure

This paper contains 20 sections and 4 figures.

Figures (4)

  • Figure 1: Instructions for configuration of CMT-prompted LLMs
  • Figure 2: CMT-inspired CoT
  • Figure 3: Prompt for evaluation with Llama3.3
  • Figure 4: Comparison of baseline and CMT-enhanced LLM performance across task categories. The four categories—MIM, DSR, ETT, and RCM—are shown from left to right, with average scores per task class.