Does Knowledge Distillation Matter for Large Language Model based Bundle Generation?

Kaidong Feng, Zhu Sun, Jie Yang, Hui Fang, Xinghua Qu, Wenyuan Liu

TL;DR

The paper addresses the computational burden of large language models (LLMs) in bundle generation and proposes a knowledge distillation (KD) framework for training efficient student models. It systematically studies how the format, quantity, and utilization of distilled knowledge affect performance on three real-world bundle datasets. The experiments show that KD, especially when supervised fine-tuning (SFT) and in-context learning (ICL) are combined with diverse knowledge formats, can achieve precision and coverage comparable to or better than larger models while significantly reducing resource requirements. The work discusses practical implications for deploying efficient LLM-based bundle generation and points to future work on implicit KD and multi-modal data integration.

Abstract

LLMs are increasingly explored for bundle generation, thanks to their reasoning capabilities and knowledge. However, deploying large-scale LLMs introduces significant efficiency challenges, primarily high computational costs during fine-tuning and inference due to their massive parameterization. Knowledge distillation (KD) offers a promising solution, transferring expertise from large teacher models to compact student models. This study systematically investigates knowledge distillation approaches for bundle generation, aiming to minimize computational demands while preserving performance. We explore three critical research questions: (1) how does the format of KD impact bundle generation performance? (2) to what extent does the quantity of distilled knowledge influence performance? and (3) how do different ways of utilizing the distilled knowledge affect performance? We propose a comprehensive KD framework that (i) progressively extracts knowledge (patterns, rules, deep thoughts); (ii) captures varying quantities of distilled knowledge through different strategies; and (iii) exploits complementary LLM adaptation techniques (in-context learning, supervised fine-tuning, combination) to leverage distilled knowledge in small student models for domain-specific adaptation and enhanced efficiency. Extensive experiments provide valuable insights into how knowledge format, quantity, and utilization methodologies collectively shape LLM-based bundle generation performance, exhibiting KD's significant potential for more efficient yet effective LLM-based bundle generation.
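
To make the quantity and utilization questions concrete, the following is a minimal sketch of how sampled distilled knowledge might be placed into an in-context learning prompt for a student model. It is not the authors' implementation: the function names `sample_knowledge` and `build_icl_prompt`, the prompt wording, the uniform random sampling, and the example knowledge entries are all assumptions made for illustration.

```python
import random
from typing import Dict, List


def sample_knowledge(items: List[str], ratio: float, seed: int = 0) -> List[str]:
    """Randomly keep about `ratio` of the distilled knowledge items (assumed strategy)."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    k = round(len(items) * ratio)
    return random.Random(seed).sample(items, k)


def build_icl_prompt(knowledge: Dict[str, List[str]], session_items: List[str]) -> str:
    """Place distilled knowledge ahead of the task so the student sees it in context."""
    lines: List[str] = []
    for fmt, entries in knowledge.items():
        if not entries:
            continue  # skip knowledge formats with nothing sampled
        lines.append(f"{fmt.capitalize()}s distilled from the teacher model:")
        lines.extend(f"- {entry}" for entry in entries)
        lines.append("")
    lines.append("Items in the user session:")
    lines.extend(f"- {item}" for item in session_items)
    lines.append("")
    lines.append("Group the items above into bundles.")
    return "\n".join(lines)


# Hypothetical distilled knowledge, invented for illustration only.
distilled = {
    "pattern": [
        "A camera is often bundled with its accessories.",
        "Books of related genres are often bundled together.",
    ],
    "rule": [
        "Bundle items that serve the same usage scenario.",
        "Avoid bundling two items that fulfil the same function.",
    ],
    "thought": [],
}

# Keep half of each knowledge format, then build the prompt.
subset = {fmt: sample_knowledge(entries, ratio=0.5) for fmt, entries in distilled.items()}
prompt = build_icl_prompt(subset, ["camera", "tripod", "mystery novel"])
print(prompt)
```

Varying `ratio` changes how much distilled knowledge reaches the student, which is the kind of quantity effect the second research question examines; the same sampled text could instead be used to build supervised fine-tuning examples.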

Paper Structure

This paper contains 30 sections, 13 figures, and 7 tables.

Figures (13)

  • Figure 1: Example bundles for (1) a camera and its accessories; and (2) mystery, thriller, and historical fiction [sun2024adaptive].
  • Figure 2: The overview of our proposed knowledge distillation framework.
  • Figure 3: Variation in knowledge quantity as sampling ratio increases (Pattern: 1st Column; Rule: 2nd Column; Thought: 3rd Column).
  • Figure 4: Performance comparison of Llama3.1-ICL under different sampling strategies and ratios in the Electronic domain.
  • Figure 5: Performance comparison of Llama3.1-ICL under different sampling strategies and ratios in the Clothing domain.
  • ...and 8 more figures