Logic Distillation: Learning from Code Function by Function for Decision-making Tasks

Dong Chen; Shilin Zhang; Fei Gao; Yueting Zhuang; Siliang Tang; Qidong Liu; Mingliang Xu

TL;DR

Small LLMs (S-LLMs) struggle with complex decision-making and logical reasoning. This work introduces Logic Distillation (LD), in which large LLMs (L-LLMs) decompose tasks into discrete functions that form a function base. A retriever selects relevant functions and a fine-tuned S-LLM invokes them stage by stage, so that decisions are made function by function; rules for emergencies can be incorporated as new functions, which improves generalization. The approach is supported by an entropy-based argument that favors selection over generation, and it is validated through pursuit-game, emergency, and 21 Points experiments, which show that LD can match or surpass L-LLMs with much smaller models and limited tuning. The results suggest a practical path to deploying capable, efficient agents on commodity hardware while keeping game rules and tasks easy to learn and extend.
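The intuition behind "selection over generation" can be illustrated with a standard information-theoretic bound. The sketch below is not the paper's derivation, which is not reproduced in this summary; it only uses the fact that a distribution over $n$ outcomes has Shannon entropy of at most $\log_2 n$ bits. The function names and the numbers (5 candidate functions, a 32,000-token vocabulary, a 20-token output) are hypothetical values chosen for illustration.

```python
import math


def selection_entropy_bound(num_candidates: int) -> float:
    """Maximum entropy (bits) of choosing one item among num_candidates.

    The bound log2(n) is attained by the uniform distribution.
    """
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    return math.log2(num_candidates)


def generation_entropy_bound(vocab_size: int, num_tokens: int) -> float:
    """Maximum entropy (bits) of freely generating num_tokens tokens.

    There are vocab_size ** num_tokens possible sequences, so the bound is
    num_tokens * log2(vocab_size).
    """
    if vocab_size < 1 or num_tokens < 0:
        raise ValueError("vocab_size must be >= 1 and num_tokens >= 0")
    return num_tokens * math.log2(vocab_size)


# Hypothetical sizes, for illustration only.
select_bits = selection_entropy_bound(5)
generate_bits = generation_entropy_bound(32_000, 20)
print(f"selection:  at most {select_bits:.2f} bits")
print(f"generation: at most {generate_bits:.2f} bits")
```

Under these assumed sizes, picking one of a few retrieved functions is a far smaller decision space than emitting a free-form answer, which is one way to read the claim that selection is the easier task for a small model.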

Abstract

Large language models (LLMs) have garnered increasing attention owing to their powerful logical reasoning capabilities. Generally, larger LLMs (L-LLMs) that require paid interfaces significantly outperform smaller LLMs (S-LLMs) that can be deployed on a variety of devices. Knowledge distillation (KD) aims to give S-LLMs the capabilities of L-LLMs, but S-LLMs trained this way merely mimic the outputs of L-LLMs and fail to acquire their logical reasoning capabilities. Consequently, S-LLMs struggle with planning and decision-making tasks that require logical reasoning. To address this challenge, we propose a novel framework called Logic Distillation (LD). First, LD employs L-LLMs to instantiate complex instructions as discrete functions and to illustrate their usage, which establishes a function base. Next, based on the function base, LD fine-tunes S-LLMs to learn the logic that L-LLMs employ in planning and decision-making. During testing, LD uses a retriever to identify the top-$K$ relevant functions based on the instructions and current states, from which S-LLMs select and invoke the appropriate ones. Finally, S-LLMs produce planning and decision-making outcomes function by function. Experiments demonstrate that, with the assistance of LD, S-LLMs achieve results in planning and decision-making tasks that are comparable to, or even surpass, those of L-LLMs.
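The test-time flow described in the abstract (function base, top-$K$ retrieval, selection, function-by-function execution) can be sketched as a toy example. Everything here is an assumption made for illustration: the grid state, the three functions, the `stage` field, the word-overlap retriever, and the rule-based `select_for_stage` stand in for the L-LLM-written function base, the real retriever, and the fine-tuned S-LLM, none of which are specified in this summary.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

State = Dict[str, Any]


@dataclass
class FunctionEntry:
    name: str
    manual: str  # usage note; in LD this would be written by the L-LLM
    stage: int   # hypothetical invocation stage taken from the manual
    fn: Callable[[State], State]


def manhattan_distance(state: State) -> State:
    (px, py), (tx, ty) = state["pursuer"], state["target"]
    return {**state, "distance": abs(px - tx) + abs(py - ty)}


def step_toward_target(state: State) -> State:
    (px, py), (tx, ty) = state["pursuer"], state["target"]
    if px != tx:
        px += 1 if tx > px else -1
    elif py != ty:
        py += 1 if ty > py else -1
    return {**state, "pursuer": (px, py)}


FUNCTION_BASE: List[FunctionEntry] = [
    FunctionEntry("manhattan_distance",
                  "compute distance between pursuer and target", 1,
                  manhattan_distance),
    FunctionEntry("step_toward_target",
                  "move pursuer one cell toward target", 2,
                  step_toward_target),
    FunctionEntry("noop", "keep every position unchanged", 2,
                  lambda s: dict(s)),
]


def retrieve_top_k(query: str, base: List[FunctionEntry], k: int) -> List[FunctionEntry]:
    """Toy retriever: rank entries by word overlap between query and manual."""
    words = set(query.lower().split())
    ranked = sorted(base,
                    key=lambda e: len(words & set(e.manual.lower().split())),
                    reverse=True)
    return ranked[:k]


def select_for_stage(candidates: List[FunctionEntry], stage: int) -> Optional[FunctionEntry]:
    """Stand-in for the S-LLM: take the best-ranked candidate for this stage."""
    return next((e for e in candidates if e.stage == stage), None)


def decide(instruction: str, state: State, k: int = 2) -> State:
    """Retrieve top-k functions, then apply one selected function per stage."""
    candidates = retrieve_top_k(instruction, FUNCTION_BASE, k)
    for stage in sorted({e.stage for e in candidates}):
        chosen = select_for_stage(candidates, stage)
        if chosen is not None:
            state = chosen.fn(state)
    return state


result = decide("move the pursuer toward the target and compute the distance",
                {"pursuer": (0, 0), "target": (2, 1)})
print(result)
```

In this sketch the small model never writes free-form output: it only chooses among retrieved entries, and each chosen function updates the state before the next stage runs. Adding a rule for a new situation would amount to appending another `FunctionEntry` to the base.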

Paper Structure (14 sections, 10 equations, 7 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Methodology
L-LLMs
Retriever
S-LLMs
Emergency handling of S-LLMs in LD
Why Selection Is Better
Experiments
Better Performance in Pursuit Game
Why LD is Better
Pursuit Game with Emergencies
Better Performance in 21 Points
Conclusion

Figures (7)

Figure 1: The outcome of one step in the pursuit game.
Figure 2: KD vs LD. KD aims to have smaller models mimic the output of larger models, while LD tries to enable smaller models to understand how larger models accomplish a task.
Figure 3: Illustration of the proposed Logic Distillation (LD). LD consists of three components: L-LLMs, a retriever, and S-LLMs. L-LLMs decompose human-provided rules and instantiate them as basic functions to construct a function base. In addition, L-LLMs provide a user manual that explains how to use these functions, including details such as rule descriptions and code comments. The retriever retrieves the top-$K$ functions based on the instructions and states. S-LLMs select the appropriate functions for the different stages of the task and then make decisions systematically, function by function.
Figure 4: Function base. The L-LLM decomposes the rules and instantiates the decision-making logic as multiple functions, each of which performs a specific task such as calculating distance. In addition, the L-LLM creates a user manual (including explanations, function comments, and corresponding invocation stages) that enables the S-LLM to invoke the relevant functions accurately and thereby complete the decision-making process.
Figure 5: The global perspective of the pursuit game based on GLM4 and GLM4-9B. The trajectories of the three blue dots are represented by green, red, and grey dashed lines, respectively, with the darker colors indicating a higher number of passages.
...and 2 more figures
