Logic Distillation: Learning from Code Function by Function for Decision-making Tasks
Dong Chen, Shilin Zhang, Fei Gao, Yueting Zhuang, Siliang Tang, Qidong Liu, Mingliang Xu
TL;DR
This work addresses the difficulty that small LLMs (S-LLMs) have with complex decision-making and logical reasoning by introducing Logic Distillation (LD). LD uses large LLMs (L-LLMs) to decompose tasks into discrete functions that form a function base. A retriever selects relevant functions, and a fine-tuned S-LLM executes them stage by stage, making decisions function by function; emergency rules can be added as new functions, which improves generalization. The approach is supported by an entropy-based argument that favors selection over generation and is validated on pursuit-game, emergency, and 21 Points experiments, which show that LD lets much smaller models with limited tuning match or surpass L-LLMs. The results suggest a practical path to deploying capable, efficient agents on commodity hardware while keeping game rules and tasks easy to learn and extend.
Abstract
Large language models (LLMs) have garnered increasing attention owing to their powerful logical reasoning capabilities. Generally, larger LLMs (L-LLMs), which require paid interfaces, perform significantly better than smaller LLMs (S-LLMs), which can be deployed on a variety of devices. Knowledge distillation (KD) aims to transfer the capabilities of L-LLMs to S-LLMs, but the resulting S-LLMs merely mimic the outputs of L-LLMs and fail to acquire their logical reasoning capabilities. Consequently, S-LLMs struggle with planning and decision-making tasks that require logical reasoning. To address this, we propose a novel framework called Logic Distillation (LD). First, LD employs L-LLMs to instantiate complex instructions into discrete functions and to illustrate their usage, establishing a function base. Then, based on the function base, LD fine-tunes S-LLMs to learn the logic employed by L-LLMs in planning and decision-making. During testing, LD uses a retriever to identify the top-$K$ relevant functions based on the instructions and current states; S-LLMs then select and invoke functions from these candidates. Ultimately, S-LLMs produce planning and decision-making outcomes function by function. Experiments demonstrate that, with the assistance of LD, S-LLMs can achieve results in planning and decision-making tasks that are comparable to, or even surpass, those of L-LLMs.
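The test-time loop described above (retrieve the top-$K$ functions, let the S-LLM pick one, invoke it) can be illustrated with a minimal sketch. Everything here is hypothetical and not taken from the paper: the toy pursuit state, the function names `step_toward` and `hold_position`, the word-overlap retriever, and the rule-based `select`, which merely stands in for a fine-tuned S-LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Position = Tuple[int, int]
State = Dict[str, Position]  # hypothetical state: {"pursuer": (x, y), "target": (x, y)}


@dataclass
class Function:
    name: str
    usage: str                        # natural-language usage note, used for retrieval
    run: Callable[[State], Position]  # returns the pursuer's next position


def step_toward(state: State) -> Position:
    """Move the pursuer one grid step (diagonals allowed) toward the target."""
    (px, py), (tx, ty) = state["pursuer"], state["target"]
    dx = (tx > px) - (tx < px)  # sign of the x offset: -1, 0, or 1
    dy = (ty > py) - (ty < py)  # sign of the y offset: -1, 0, or 1
    return (px + dx, py + dy)


def hold_position(state: State) -> Position:
    """Keep the pursuer where it is."""
    return state["pursuer"]


# Hypothetical function base; in LD this would be produced by an L-LLM.
FUNCTION_BASE: List[Function] = [
    Function("step_toward", "move the pursuer one step closer to the target", step_toward),
    Function("hold_position", "stay still and wait when the target is already caught", hold_position),
]


def retrieve(query: str, base: List[Function], k: int) -> List[Function]:
    """Toy retriever (an assumption): rank functions by word overlap with the query."""
    words = set(query.lower().split())
    ranked = sorted(base, key=lambda f: len(words & set(f.usage.lower().split())), reverse=True)
    return ranked[:k]


def select(candidates: List[Function], state: State) -> Function:
    """Stand-in for the fine-tuned S-LLM: choose one function among the candidates."""
    wanted = "hold_position" if state["pursuer"] == state["target"] else "step_toward"
    for f in candidates:
        if f.name == wanted:
            return f
    return candidates[0]  # fall back to the top-ranked candidate


def decide(instruction: str, state: State, k: int = 2) -> Tuple[str, Position]:
    """One decision step: retrieve top-k functions, select one, invoke it."""
    chosen = select(retrieve(instruction, FUNCTION_BASE, k), state)
    return chosen.name, chosen.run(state)
```

In this sketch the selector only ever chooses among retrieved candidates rather than generating an action from scratch, and a new rule would be added by appending another `Function` entry to `FUNCTION_BASE`.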
