Towards Generalist Prompting for Large Language Models by Mental Models

Haoxiang Guan, Jiyan He, Shuxin Zheng, En-Hong Chen, Weiming Zhang, Nenghai Yu

TL;DR

This work tackles the limitation of task-specific prompting by introducing generalist prompting through MeMo, a framework that enables LLMs to autonomously select and apply multiple mental models for problem solving in zero-shot settings. By defining mental models and providing examples, MeMo unifies diverse prompting strategies and demonstrates near-state-of-the-art performance across logical reasoning, STEM, and commonsense tasks without task-specific prompts. Empirical results show MeMo's strong cross-domain generalization across GPT-3.5, GPT-4, and Llama-2, with notable improvements over baseline prompting methods and meaningful ablations that emphasize the importance of definitions and examples. The approach reduces human prompting overhead and offers a scalable path toward more flexible, generalist AI systems with broad practical impact in reasoning and knowledge-intensive tasks.

Abstract

Large language models (LLMs) have demonstrated impressive performance on many tasks. However, achieving optimal performance still requires specially designed prompting methods. These methods either rely on task-specific few-shot examples that require a certain level of domain knowledge, or are simple by design but perform well on only a few types of tasks. In this work, we introduce the concept of generalist prompting, whose design principle is to achieve optimal or near-optimal performance on a wide range of tasks while eliminating the need to manually select and customize prompts for specific problems. Furthermore, we propose MeMo (Mental Models), a prompting method that is simple in design yet effectively fulfills the criteria of generalist prompting. MeMo distills the cores of various prompting methods into individual mental models and allows LLMs to autonomously select the most suitable mental models for the problem, achieving state-of-the-art or near-state-of-the-art results on diverse tasks such as STEM, logical reasoning, and commonsense reasoning in zero-shot settings. We hope that the insights presented herein will stimulate further exploration of generalist prompting methods for LLMs.
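
The abstract describes the mechanism only at a high level: the LLM is told what a mental model is, shown a few examples, and then left to pick suitable mental models for the problem at hand. The following is a minimal sketch of how such a task-agnostic prompt might be assembled. It is not the authors' actual prompt: the definition wording, the `MentalModelExample` type, the `build_memo_style_prompt` function, and the final instruction text are all hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class MentalModelExample:
    """One hypothetical demonstration: a problem and the mental models suited to it."""
    problem: str
    mental_models: list[str]


# Hypothetical wording; the paper's actual definition text is not reproduced here.
DEFINITION = (
    "A mental model is a way of thinking about a problem, such as a "
    "reasoning strategy or a body of domain knowledge."
)


def build_memo_style_prompt(question: str, examples: list[MentalModelExample]) -> str:
    """Assemble one task-agnostic prompt: definition, examples, then the question."""
    parts = [DEFINITION, ""]
    for i, ex in enumerate(examples, start=1):
        parts.append(f"Example {i}")
        parts.append(f"Problem: {ex.problem}")
        parts.append("Mental models: " + ", ".join(ex.mental_models))
        parts.append("")
    parts.append(f"Problem: {question}")
    # Assumed instruction: let the model choose its own mental models, then apply them.
    parts.append(
        "First list the mental models that best fit this problem, "
        "then apply them step by step and state the final answer."
    )
    return "\n".join(parts)


if __name__ == "__main__":
    # Illustrative examples only; the problems are invented for this sketch.
    demo_examples = [
        MentalModelExample("Is the company likely to be profitable?", ["financial analysis"]),
        MentalModelExample("Does the conclusion follow from the premises?", ["deductive reasoning"]),
    ]
    print(build_memo_style_prompt("Your question here", demo_examples))
```

The same prompt prefix is reused for every task in this sketch; only the final question changes, which is the property the abstract attributes to generalist prompting.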

Paper Structure

This paper contains 23 sections, 7 figures, and 11 tables.

Figures (7)

  • Figure 1: The development path of artificial intelligence (AI) models towards generalist capabilities.
  • Figure 2: As a generalist prompting method, MeMo achieves state-of-the-art or near-state-of-the-art performance on diverse tasks with GPT-3.5 in zero-shot settings, without manual selection or customization of a prompt for each specific problem, showing superior generalization capabilities.
  • Figure 3: Illustration of MeMo compared to Chain-of-Thought (CoT) (Wei et al., 2023) and Step-Back (SB) (Zheng et al., 2023) prompting. Left: an example from StrategyQA (Geva et al., 2021) where financial analysis is recognized by the LLM as a suitable mental model. MeMo and SB prompting successfully answer the question while CoT fails. Right: another example from StrategyQA where knowledge of explosives is recognized by the LLM as an applicable mental model for the question. MeMo answers the question successfully while both CoT and SB prompting fail.
  • Figure 4: Analysis of MeMo on logical reasoning tasks. Left: Ablation study on StrategyQA (Geva et al., 2021) and FOLIO (Han et al., 2022) using GPT-3.5. Right: Statistical analysis of the selection of mental models on FOLIO. Logical reasoning and deductive reasoning are the two mental models most commonly proposed by LLMs.
  • Figure 5: Ablation study of MeMo on MMLU College Computer Science, College Math, and Electrical Engineering with GPT-3.5.
  • ...and 2 more figures