Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations
Kai Tzu-iunn Ong, Taeyoon Kwon, Jinyoung Yeo
TL;DR
SELF-TAUGHT is a fully zero-shot framework that automatically generates demonstrations tailored to each test instance to guide LLM reasoning. It identifies the information relevant to the target problem, constructs high-quality pseudo problems with high-certainty solutions, and uses these tailored demonstrations to solve the target problem, reducing reliance on costly human-authored demonstrations. Across 13 diverse QA tasks and two real-world Alzheimer's disease diagnosis datasets, SELF-TAUGHT outperforms strong baselines and remains robust across prompting strategies and open-source LLMs, although manual CoT remains competitive in highly homogeneous clinical cases. The work offers a practical, cost-efficient way to improve domain-specific LLM applications, supported by detailed ablations, human evaluations, and supplementary resources.
Abstract
Guiding large language models with a selected set of human-authored demonstrations is a common practice for improving LLM applications. However, human effort can be costly, especially in specialized domains (e.g., clinical diagnosis), and does not guarantee optimal performance, because the skills targeted by the selected demonstrations may not match those required by real test instances. Motivated by these issues, this paper explores the automatic creation of customized demonstrations whose target skills align with the given target instance. We present SELF-TAUGHT, a problem-solving framework that generates demonstrations that are "tailored" to the target problem and "filtered" for better quality (i.e., correctness) in a zero-shot manner. Across 15 tasks covering multiple-choice questions in diverse domains and the diagnosis of Alzheimer's disease (AD) with real-world patients, SELF-TAUGHT outperforms strong baselines (e.g., Few-shot CoT, Plan-and-Solve, Auto-CoT). We conduct comprehensive analyses of SELF-TAUGHT, including its generalizability to existing prompting methods and different LLMs, the quality of its intermediate generations, and more.
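
The "tailored" and "filtered" steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `llm` callable, the prompt wording, the function names, and all default parameters are hypothetical. In particular, certainty is approximated here by agreement among several sampled solutions, which is a stand-in chosen for illustration; the abstract does not specify how certainty is measured.

```python
from collections import Counter
from typing import Callable, List, Tuple

LLM = Callable[[str], str]  # hypothetical interface: prompt in, text out


def certainty_by_agreement(answers: List[str]) -> Tuple[str, float]:
    """Return the most common answer and the fraction of samples that agree.

    Assumption: agreement across samples is used as a proxy for certainty.
    """
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / len(answers)


def build_demonstrations(
    llm: LLM,
    target: str,
    n_demos: int = 2,
    n_samples: int = 3,
    threshold: float = 0.6,
    max_attempts: int = 10,
) -> List[Tuple[str, str]]:
    """Create pseudo problems tailored to `target`, keeping only confident ones."""
    # Step 1: identify the information relevant to the target problem.
    info = llm(f"Identify the knowledge needed to solve:\n{target}")
    demos: List[Tuple[str, str]] = []
    for attempt in range(max_attempts):
        if len(demos) >= n_demos:
            break
        # Step 2: generate a pseudo problem that exercises the same information.
        pseudo = llm(f"Write new problem #{attempt} that requires: {info}")
        # Step 3: sample solutions and keep the pair only if certainty is high.
        answers = [llm(f"Solve step by step:\n{pseudo}") for _ in range(n_samples)]
        answer, certainty = certainty_by_agreement(answers)
        if certainty >= threshold:
            demos.append((pseudo, answer))
    return demos


def self_taught_answer(llm: LLM, target: str, **kwargs) -> str:
    """Solve `target` with the filtered pseudo demonstrations as context."""
    demos = build_demonstrations(llm, target, **kwargs)
    context = "".join(f"Problem: {p}\nSolution: {s}\n\n" for p, s in demos)
    return llm(context + f"Problem: {target}\nSolution:")
```

Any text-in, text-out model wrapper can be passed as `llm`. Raising `threshold` trades fewer demonstrations for higher confidence in each one, and `max_attempts` bounds the number of model calls when few pseudo problems pass the filter.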
