
Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5

Marcel Lamott, Muhammad Armaghan Shakir

TL;DR

This work presents a novel approach in which document understanding knowledge is distilled from the proprietary LLM ChatGPT into FLAN-T5, offering a scalable solution that bridges the gap between resource-intensive LLMs and practical applications.

Abstract

The surge of digital documents in various formats, including less standardized documents such as business reports and environmental assessments, underscores the growing importance of Document Understanding. While Large Language Models (LLMs) have performed strongly across diverse natural language processing tasks, applying them directly to Document Understanding remains a challenge. Previous research has demonstrated the utility of LLMs in this domain, yet their significant computational demands make them difficult to deploy. Additionally, proprietary black-box LLMs often outperform their open-source counterparts, which limits accessibility. In this paper, we address document understanding with distillation methods that harness the capabilities of large LLMs while accommodating computational limitations. Specifically, we present a novel approach in which we distill document understanding knowledge from the proprietary LLM ChatGPT into FLAN-T5. Our methodology integrates labeling and curriculum-learning mechanisms to facilitate efficient knowledge transfer. This work contributes to document understanding methodology by offering a scalable solution that bridges the gap between resource-intensive LLMs and practical applications. Our findings underscore the potential of distillation techniques for deploying sophisticated language models in real-world scenarios, thereby fostering advancements in natural language processing and document comprehension.
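The labeling step can be pictured as a small data pipeline: each document is verbalized into layout-preserving text, placed into a task prompt, and the teacher's answer becomes the student's training target. The sketch below is a minimal illustration under loud assumptions: `verbalize_spatial` is a hypothetical, much-simplified stand-in for the LAPDoc SpatialFormat verbalizer (it only pads each word to a column proportional to its x position), `TEMPLATE` is an invented prompt, and `teacher` is any callable that, in the real setup, would wrap a ChatGPT request.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Word:
    text: str
    x: float  # left edge, normalized to [0, 1] of the page width
    y: float  # top edge, normalized to [0, 1] of the page height


def verbalize_spatial(words: list[Word], width: int = 80, line_tol: float = 0.01) -> str:
    """Hypothetical, simplified spatial verbalizer (NOT the LAPDoc implementation).

    Words whose y coordinate lies within `line_tol` of a line's first word are
    grouped into one text line; each word is padded to a column proportional to x.
    """
    lines: list[list[Word]] = []
    for w in sorted(words, key=lambda w: (w.y, w.x)):
        if lines and abs(lines[-1][0].y - w.y) <= line_tol:
            lines[-1].append(w)
        else:
            lines.append([w])

    rendered = []
    for line in lines:
        row = ""
        for w in sorted(line, key=lambda w: w.x):  # left to right within the line
            col = int(w.x * width)
            # Jump to the target column, keeping at least one space between words.
            row += " " * max(col - len(row), 1 if row else 0) + w.text
        rendered.append(row)
    return "\n".join(rendered)


# Hypothetical task prompt; the paper's actual templates are task specific.
TEMPLATE = "Document:\n{document}\n\nQuestion: {question}\nAnswer:"


def build_distillation_set(
    samples: list[tuple[list[Word], str]],
    teacher: Callable[[str], str],
) -> list[dict[str, str]]:
    """Turn (document words, question) pairs into (prompt, teacher label) records."""
    records = []
    for words, question in samples:
        prompt = TEMPLATE.format(document=verbalize_spatial(words), question=question)
        # In the described setup the label comes from ChatGPT 3.5; here `teacher`
        # is any callable so the sketch runs offline.
        records.append({"input": prompt, "target": teacher(prompt)})
    return records


if __name__ == "__main__":
    page = [Word("Invoice", 0.0, 0.0), Word("Total:", 0.0, 0.1), Word("42.00", 0.5, 0.1)]
    fake_teacher = lambda prompt: "42.00"  # placeholder for a real LLM call
    data = build_distillation_set([(page, "What is the total?")], fake_teacher)
    print(data[0]["input"])
    print("->", data[0]["target"])
```

The resulting records are ordinary input/target pairs, so the student can be trained on them with any standard sequence-to-sequence fine-tuning loop.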

Paper Structure

This paper contains 9 sections, 1 equation, 2 figures, and 3 tables.

Figures (2)

  • Figure 1: Overview of the knowledge elicitation: the documents from the base datasets are converted to a textual representation with the LAPDoc SpatialFormat verbalizer and inserted into task-specific prompt templates. The prompts serve as inputs of the distillation dataset and are fed into the teacher LLM, ChatGPT 3.5, to generate the training labels.
  • Figure 2: Overview of the curriculum learning: the student is trained for one epoch, after which its predictions are exported and used to generate the datasets for subsequent epochs. These sampled datasets present the data in order of increasing difficulty, based on the student's previous predictions.
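The curriculum step in Figure 2 amounts to re-ordering the training data by how well the student handled each example in the previous epoch. The caption does not say how difficulty is measured, so the sketch below assumes one plausible choice: difficulty is 1 minus the SQuAD-style token F1 between the student's previous prediction and the teacher label. The function names `token_f1` and `curriculum_order` are hypothetical, and any sampling beyond simple sorting is left out.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """SQuAD-style token-overlap F1 between two strings (lowercased, whitespace tokens)."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    if not pred or not ref:
        return float(pred == ref)  # 1.0 only if both are empty
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(pred), common / len(ref)
    return 2 * precision * recall / (precision + recall)


def curriculum_order(dataset: list[dict[str, str]], predictions: list[str]) -> list[dict[str, str]]:
    """Return dataset items sorted from easiest to hardest.

    Assumption: difficulty = 1 - F1(student's previous prediction, teacher label).
    """
    if len(dataset) != len(predictions):
        raise ValueError("need exactly one student prediction per example")
    scored = [
        (1.0 - token_f1(pred, item["target"]), i)
        for i, (item, pred) in enumerate(zip(dataset, predictions))
    ]
    scored.sort()  # ascending difficulty; ties keep the original order via the index
    return [dataset[i] for _, i in scored]


if __name__ == "__main__":
    data = [
        {"input": "q1", "target": "42.00"},
        {"input": "q2", "target": "ACME Corp"},
        {"input": "q3", "target": "2021"},
    ]
    student_preds = ["41.00", "ACME Corp", "2021 report"]  # exported after epoch 1
    print([d["input"] for d in curriculum_order(data, student_preds)])
```

A per-example loss from the student would be an equally reasonable difficulty signal; token F1 is used here only because it needs no model to run.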