Large Language Models are In-context Teachers for Knowledge Reasoning

Jiachen Zhao, Zonghai Yao, Zhichao Yang, Hong Yu

TL;DR

Self-Explain, which prompts an LLM with its own self-elicited explanations, significantly outperforms human-crafted exemplars and other baselines. Rationales that more closely resemble the student's self-explanations make better demonstrations, which supports the encoding specificity hypothesis, and Teach-Back aligns a teacher LLM with the student to improve ICT performance.

Abstract

In this work, we study in-context teaching (ICT), where a teacher provides in-context example rationales to teach a student to reason over unseen cases. Human teachers are usually required to craft in-context demonstrations, which are costly and have high variance. We ask whether a large language model (LLM) can serve as a more effective in-context teacher than humans, for itself or for other LLMs. Inspired by the Encoding Specificity Hypothesis from human episodic memory, we hypothesize that in-context exemplars crafted by the teacher should match the training data of the student. This hypothesis motivates Self-Explain, in which an LLM's self-elicited explanations are used as in-context demonstrations for prompting that same LLM, since they are generalized from the model's training examples. Self-Explain significantly outperforms human-crafted exemplars and other baselines. Furthermore, we show that for ICT, rationales from different teacher LLMs or human experts that more closely resemble the student LLM's self-explanations are better in-context demonstrations, which supports our encoding specificity hypothesis. We then propose Teach-Back, which aligns a teacher LLM with the student to enhance ICT performance. For example, Teach-Back enables a 7B model to teach the much larger GPT-3.5 in context, surpassing human teachers by around 5% in test accuracy on medical question answering.
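The Self-Explain idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt wording, the function names `self_explain_demos` and `build_student_prompt`, and the `llm` callable (any function mapping a prompt string to a completion string) are all assumptions made for illustration. The sketch also assumes the teacher sees the gold answer when writing each rationale.

```python
from typing import Callable, List, Tuple

# (question, gold answer) pairs sampled from the training data.
Exemplar = Tuple[str, str]


def self_explain_demos(llm: Callable[[str], str],
                       exemplars: List[Exemplar]) -> List[str]:
    """Ask the teacher model to explain each sampled training example.

    `llm` is a hypothetical text-in, text-out callable; the prompt
    template below is an assumption, not the paper's exact wording.
    """
    demos = []
    for question, answer in exemplars:
        teacher_prompt = (
            f"Question: {question}\n"
            f"Answer: {answer}\n"
            "Explain step by step why this answer is correct.\n"
            "Explanation:"
        )
        rationale = llm(teacher_prompt).strip()
        # Each demonstration pairs the question with the model-written
        # rationale and the gold answer.
        demos.append(
            f"Question: {question}\n"
            f"Explanation: {rationale}\n"
            f"Answer: {answer}"
        )
    return demos


def build_student_prompt(demos: List[str], test_question: str) -> str:
    """Prepend the demonstrations to an unseen test question."""
    query = f"Question: {test_question}\nExplanation:"
    return "\n\n".join(demos + [query])
```

Passing the same model as both teacher and student corresponds to the Self-Explain setting; passing a different model as the teacher corresponds to the more general in-context teaching setting.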

Paper Structure (43 sections, 5 equations, 8 figures, 6 tables)

Figures (8)

  • Figure 1: Overview of our approaches. Existing few-shot CoT prompting methods rely on human experts to craft rationales as in-context demonstrations. We propose the Encoding Specificity Hypothesis to make large language models better in-context teachers than humans. Accordingly, we design Self-Explain, in which an LLM is its own teacher, and Teach-Back, which improves an LLM's ability to teach another model in context.
  • Figure 2: The overall framework and prompting format of our approach. The teacher LLM is prompted to generate explanations on sampled training data. The teacher's explanations are then used as in-context demonstrations for the student model at test time. The student model and the teacher model can be the same.
  • Figure 3: The test performance with respect to the number of self-explanations to generate for each exemplar.
  • Figure 4: (a) ROUGE scores between the self-explanations of teacher and student. For "Human" teachers, human-crafted CoTs are used for computation. (b) A strong linear correlation is observed between the teacher-student ROUGE scores of self-explanations and the student's test accuracy.
  • Figure 5: Students' accuracy improvement after applying Teach-Back. Values in brackets stand for students' respective test accuracy w/ Teach-Back. The x-axis represents students who do reasoning. The y-axis is teacher models that provide in-context demonstrations for students.
  • ...and 3 more figures
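
The teacher-student similarity measured in Figure 4 can be illustrated with a small, dependency-free sketch. The ROUGE variant and tokenization used in the paper are not stated in this excerpt, so the unigram ROUGE-1 F1 with whitespace tokenization below is an assumption, and the helper names `rouge1_f1` and `mean_similarity` are hypothetical.

```python
from collections import Counter
from typing import List


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between two texts (a simple ROUGE-1 variant).

    Assumes lowercased whitespace tokenization; the paper's exact
    ROUGE configuration may differ.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_similarity(teacher_rationales: List[str],
                    student_rationales: List[str]) -> float:
    """Average ROUGE-1 F1 over rationales written for the same exemplars.

    The two lists are assumed to be aligned by exemplar index.
    """
    if len(teacher_rationales) != len(student_rationales):
        raise ValueError("rationale lists must be aligned and equal in length")
    if not teacher_rationales:
        return 0.0
    scores = [rouge1_f1(t, s)
              for t, s in zip(teacher_rationales, student_rationales)]
    return sum(scores) / len(scores)
```

Under the encoding specificity hypothesis, a teacher with a higher `mean_similarity` to the student's own self-explanations would be expected to provide more useful demonstrations.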