Efficient Intent-Based Filtering for Multi-Party Conversations Using Knowledge Distillation from LLMs

Reem Gody, Mohamed Abdelghaffar, Mohammed Jabreel, Ahmed Tawfik

TL;DR

The paper tackles the high cost of processing long, multi-party conversations with large language models by introducing an intent-based filtering layer. It trains a lightweight classifier (MobileBERT) via knowledge distillation from LLM-derived annotations to identify action-triggering and information-seeking snippets, enabling the system to forward only relevant segments to the LLM. The authors develop a data-centric pipeline that combines diverse real and synthetic data, LLM-based labeling with explanations, online augmentation, and careful fine-tuning, achieving high $F1$ scores while substantially reducing token usage. The results demonstrate meaningful cost savings and practical applicability for edge deployments and scalable conversational AI workflows, with potential for extending to additional intents and domains.

Abstract

Large language models (LLMs) have shown remarkable capabilities in conversational AI, enabling open-domain responses in chatbots as well as advanced processing of conversations such as summarization, intent classification, and insights generation. However, these models are resource-intensive, demanding substantial memory and computational power. To address this, we propose a cost-effective solution that filters conversational snippets of interest for LLM processing, tailored to the target downstream application, rather than processing every snippet. In this work, we introduce an approach that leverages knowledge distillation from LLMs to develop an intent-based filter for multi-party conversations, optimized for compute-constrained environments. Our method combines different strategies to create a diverse multi-party conversational dataset that is annotated with the target intents and is then used to fine-tune the MobileBERT model for multi-label intent classification. This model achieves a balance between efficiency and performance, effectively filtering conversation snippets based on their intents. By passing only the relevant snippets to the LLM for further processing, our approach significantly reduces overall operational costs; the size of the reduction depends on the intents and the data distribution, as demonstrated in our experiments.
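The filtering step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: `toy_scorer` is a hypothetical keyword heuristic standing in for the fine-tuned MobileBERT classifier, the 0.5 threshold is an assumed value, and whitespace-separated words are used as a rough proxy for LLM tokens. The two intent names are taken from the TL;DR. The sketch only shows the gating logic: a snippet is forwarded to the LLM if any intent score clears the threshold, and the saving is the fraction of tokens that are never sent.

```python
from typing import Callable, Dict, List

# Intent names taken from the TL;DR; the label strings themselves are assumptions.
INTENTS = ("action_triggering", "information_seeking")


def toy_scorer(snippet: str) -> Dict[str, float]:
    """Hypothetical stand-in for the distilled classifier.

    Returns one independent score per intent (multi-label), so a snippet
    may match zero, one, or both intents.
    """
    text = snippet.lower()
    is_action = any(word in text for word in ("please", "schedule", "send"))
    is_question = "?" in text
    return {
        "action_triggering": 0.9 if is_action else 0.1,
        "information_seeking": 0.9 if is_question else 0.1,
    }


def filter_snippets(
    snippets: List[str],
    scorer: Callable[[str], Dict[str, float]],
    threshold: float = 0.5,  # assumed decision threshold
) -> List[str]:
    """Keep only snippets whose score for at least one intent reaches the threshold."""
    kept = []
    for snippet in snippets:
        scores = scorer(snippet)
        if any(scores[intent] >= threshold for intent in INTENTS):
            kept.append(snippet)
    return kept


def token_savings(all_snippets: List[str], kept: List[str]) -> float:
    """Fraction of (whitespace-approximated) tokens that are not forwarded to the LLM."""
    def count(items: List[str]) -> int:
        return sum(len(item.split()) for item in items)

    total = count(all_snippets)
    return 1 - count(kept) / total if total else 0.0


conversation = [
    "Nice weather today.",
    "Can you send the report?",
    "Please schedule a meeting for Monday.",
    "I had pasta for lunch.",
]
relevant = filter_snippets(conversation, toy_scorer)
print(relevant)
print(f"token savings: {token_savings(conversation, relevant):.1%}")
```

In a real deployment the scorer would be replaced by the fine-tuned model's per-intent sigmoid outputs, and the saving would depend on how often the target intents occur in the data, as the abstract notes.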

Paper Structure

This paper contains 14 sections, 2 equations, 1 figure, 5 tables.

Figures (1)

  • Figure 1: An intent-based filtering model is used to filter irrelevant conversation snippets.