Towards Generalist Prompting for Large Language Models by Mental Models
Haoxiang Guan, Jiyan He, Shuxin Zheng, En-Hong Chen, Weiming Zhang, Nenghai Yu
TL;DR
This work tackles the limitation of task-specific prompting by introducing generalist prompting through MeMo, a framework that enables LLMs to autonomously select and apply multiple mental models for problem solving in zero-shot settings. By defining mental models and providing examples, MeMo unifies diverse prompting strategies and demonstrates near-state-of-the-art performance across logical reasoning, STEM, and commonsense tasks without task-specific prompts. Empirical results show MeMo's strong cross-domain generalization across GPT-3.5, GPT-4, and Llama-2, with notable improvements over baseline prompting methods and meaningful ablations that emphasize the importance of definitions and examples. The approach reduces human prompting overhead and offers a scalable path toward more flexible, generalist AI systems with broad practical impact in reasoning and knowledge-intensive tasks.
Abstract
Large language models (LLMs) have demonstrated impressive performance on many tasks. However, to achieve optimal performance, specially designed prompting methods are still needed. These methods either rely on task-specific few-shot examples that require a certain level of domain knowledge, or are designed to be simple but perform well on only a few types of tasks. In this work, we introduce the concept of generalist prompting, whose design principle is to achieve optimal or near-optimal performance on a wide range of tasks while eliminating the need to manually select and customize prompts for specific problems. Furthermore, we propose MeMo (Mental Models), an innovative prompting method that is simple in design yet effectively fulfills the criteria of generalist prompting. MeMo distills the core ideas of various prompting methods into individual mental models and allows LLMs to autonomously select the most suitable mental models for the problem, achieving state-of-the-art or near-state-of-the-art results on diverse tasks such as STEM, logical reasoning, and commonsense reasoning in zero-shot settings. We hope that the insights presented herein will stimulate further exploration of generalist prompting methods for LLMs.
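To make the idea concrete, the following is a minimal sketch of how a single task-agnostic prompt in the spirit of MeMo might be assembled: each mental model is given a definition and an example, and the model is asked to choose which ones to apply. The class `MentalModel`, the function `build_generalist_prompt`, the two listed mental models, and all prompt wording are hypothetical placeholders chosen for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class MentalModel:
    """One reusable reasoning strategy (hypothetical structure)."""
    name: str
    definition: str
    example: str


# Illustrative entries only; the actual set of mental models is an assumption here.
MENTAL_MODELS = (
    MentalModel(
        name="Step-by-step thinking",
        definition="Work through the problem one inference at a time.",
        example="To total a bill, add the items one by one and keep a running sum.",
    ),
    MentalModel(
        name="Decomposition",
        definition="Split the problem into smaller sub-problems and solve each.",
        example="To plan a trip, handle transport, lodging, and budget separately.",
    ),
)


def build_generalist_prompt(
    question: str, models: Sequence[MentalModel] = MENTAL_MODELS
) -> str:
    """Build one zero-shot prompt that is identical for every task except the question."""
    lines = ["You may use one or more of the following mental models.", ""]
    for index, model in enumerate(models, start=1):
        lines.append(f"{index}. {model.name}: {model.definition}")
        lines.append(f"   Example: {model.example}")
    lines += [
        "",
        "First state which mental model(s) fit the problem, then apply them to solve it.",
        "",
        f"Problem: {question}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_generalist_prompt("If 3 pencils cost 45 cents, what do 7 pencils cost?"))
```

Only the final `Problem:` line changes between tasks, which is the property the abstract describes: no prompt is selected or customized by hand for a specific problem, and the choice of strategy is left to the LLM.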
