
Grimoire is All You Need for Enhancing Large Language Models

Ding Chen, Shichao Song, Qingchen Yu, Zhiyu Li, Wenjin Wang, Feiyu Xiong, Bo Tang

TL;DR

This work addresses the variability of in-context learning (ICL) across language models by proposing SleIcl, a framework in which a strong LLM learns from representative demonstrations and generates a grimoire that guides weaker LLMs. It introduces a formal problem setup, multiple representative-sample selection methods (KCS, HCS, HSS, RSS), two grimoire generation templates (Profound Grimoire and Simple Grimoire), and grimoire ranking via similarity or a dual-tower classifier. Empirically, SleIcl yields consistent gains for weak models across eight datasets and six LLMs, and in some cases small models even surpass zero-shot GPT-4, although the best single grimoire sometimes outperforms the ranking-based approach. The findings suggest that grimoire-based guidance is a promising way to widen the practical reach of ICL, especially for smaller models, and motivate further refinement of sample selection and ranking strategies.
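The selection methods above produce sets of representative demonstrations, and Figure 2's caption adds that each set is sampled in a stratified manner based on labels. The following is a minimal sketch, assuming KCS refers to a k-means-clustering-based selection over example embeddings; the embeddings, the `per_label` parameter, and all function names are illustrative assumptions, not the authors' code. Within each label it runs plain Lloyd's k-means and keeps the real example closest to each centroid.

```python
import random
from collections import defaultdict


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means over lists of floats; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster empties out
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def select_representatives(examples, per_label, seed=0):
    """Hypothetical KCS-style selection.

    examples: list of (text, label, embedding) tuples.
    Returns up to `per_label` examples per label (stratified by label),
    each being the real example nearest to one k-means centroid.
    """
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex[1]].append(ex)
    selected = []
    for label in sorted(by_label):
        group = by_label[label]
        k = min(per_label, len(group))
        centroids = kmeans([ex[2] for ex in group], k, seed=seed)
        chosen = set()
        for c in centroids:
            # Pick the nearest example not already chosen, so no duplicates.
            candidates = [i for i in range(len(group)) if i not in chosen]
            best = min(candidates, key=lambda i: squared_distance(group[i][2], c))
            chosen.add(best)
        selected.extend(group[i] for i in sorted(chosen))
    return selected
```

Picking the nearest real example rather than the centroid itself keeps every demonstration an actual labeled sample, which is what a grimoire-generating prompt would need.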

Abstract

In-context learning (ICL) is one of the key methods for improving the performance of large language models on specific tasks by providing a set of few-shot examples in the prompt. However, ICL capability varies significantly across models because of factors such as model architecture, the volume of training data, and parameter count. Generally, the more parameters a model has and the more extensive its training data, the stronger its ICL capability. In this paper, we propose SLEICL, a method in which a strong language model learns from the examples, summarizes the learned skills, and transfers them to weak language models for inference. This improves the stability and effectiveness of ICL. Compared with having weak language models learn directly from the prompt examples, SLEICL reduces the difficulty of ICL for these models. Our experiments on up to eight datasets with five language models show that, with SLEICL, weak language models consistently improve over their own zero-shot or few-shot performance. With the aid of SLEICL, some weak language models even surpass zero-shot GPT-4-1106-preview.
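The abstract contrasts regular ICL, where the weak model reads the demonstrations itself, with SleIcl, where a strong model first condenses them into a grimoire that the weak model then receives. The sketch below illustrates only that prompt flow, under stated assumptions: the template wording, the `profound` flag standing in for the Profound/Simple Grimoire templates, and the `strong_llm`/`weak_llm` callables are hypothetical placeholders, not the paper's prompts or any real model API.

```python
from typing import Callable, Sequence, Tuple

Demo = Tuple[str, str]  # (input text, gold label)


def format_demos(demos: Sequence[Demo]) -> str:
    # Render demonstrations as "Input/Label" pairs.
    return "\n".join(f"Input: {x}\nLabel: {y}" for x, y in demos)


def regular_icl_prompt(task: str, demos: Sequence[Demo], query: str) -> str:
    # Regular ICL: the weak model must infer the skill from raw demonstrations.
    return f"{task}\n\n{format_demos(demos)}\n\nInput: {query}\nLabel:"


def grimoire_request(task: str, demos: Sequence[Demo], profound: bool) -> str:
    # Hypothetical meta-prompt asking the strong LLM to summarize a skill.
    style = ("Explain in detail the techniques needed to solve this task"
             if profound else "Summarize in a few short rules how to solve this task")
    shots = format_demos(demos) if demos else "(no examples: rely on the task description)"
    return f"{task}\n\nExamples:\n{shots}\n\n{style}."


def sleicl_prompt(task: str, grimoire: str, query: str) -> str:
    # SleIcl-style prompt: the weak model sees the grimoire, not the demonstrations.
    return f"{task}\n\nGuidance:\n{grimoire}\n\nInput: {query}\nLabel:"


def sleicl_answer(task: str, demos: Sequence[Demo], query: str,
                  strong_llm: Callable[[str], str], weak_llm: Callable[[str], str],
                  profound: bool = True) -> str:
    # Step 1: strong model writes the grimoire. Step 2: weak model answers with it.
    grimoire = strong_llm(grimoire_request(task, demos, profound))
    return weak_llm(sleicl_prompt(task, grimoire, query))
```

Passing an empty `demos` list to `grimoire_request` roughly mirrors the zero-shot grimoires mentioned in Figure 2's caption, which are generated without samples.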

Paper Structure

This paper contains 26 sections, 5 equations, 5 figures, and 14 tables.

Figures (5)

  • Figure 1: In Regular In-Context Learning (Regular ICL), a language model learns directly from the demonstrations. In Strong LLM Enhanced In-Context Learning (SleIcl), a strong language model first learns and summarizes techniques from representative samples; the generated techniques (the grimoire) are then included in the prompt to guide the weak language models' responses.
  • Figure 2: Framework of the proposed SleIcl method. First, multiple sets of representative samples are obtained using different sample selection methods (KCS, HCS, HSS, RSS), with each set sampled in a stratified manner based on labels. Next, a profound grimoire (PG) and a simple grimoire (SG) are generated from each sample set. Zero-shot-PG and zero-shot-SG, generated without samples, are also included. Finally, all grimoires are ranked for each given test sample, and the optimal grimoire is handed to the weak LLM for its response.
  • Figure 3: Workflow for grimoire generation.
  • Figure 4: Radar chart comparing GPT-4's zero-shot prompting results with other models' results in the Max(Single Grimoire) setting.
  • Figure 5: Architecture of the classifier. Of the three similar forward propagation modules following self-attention, the first two use dropout, while the final one uses normalization.
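Figure 2's caption ends with ranking all grimoires against a given test sample, and the paper offers two rankers: similarity-based and a dual-tower classifier (Figure 5). The sketch below covers only the similarity variant, assuming each test sample and each grimoire already has an embedding vector from some encoder; the vectors and grimoire names used with it are made up for illustration. It ranks grimoires by cosine similarity so that the top entry can be handed to the weak LLM.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_grimoires(query_vec: Sequence[float],
                   grimoire_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Return grimoire names sorted from most to least similar to the test sample."""
    return sorted(grimoire_vecs,
                  key=lambda name: cosine(query_vec, grimoire_vecs[name]),
                  reverse=True)
```

For example, with hypothetical embeddings `{"KCS-PG": [1.0, 0.0], "HSS-SG": [0.0, 1.0], "zero-shot-PG": [0.7, 0.7]}` and a test-sample embedding `[0.9, 0.1]`, the ranking is `KCS-PG`, `zero-shot-PG`, `HSS-SG`. A learned ranker such as the dual-tower classifier would replace `cosine` with a trained scoring model.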