TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale

Pengcheng Jiang, Cao Xiao, Zifeng Wang, Parminder Bhatia, Jimeng Sun, Jiawei Han

TL;DR

TriSum is a framework for distilling the text summarization abilities of large language models (LLMs) into a compact, local model. It improves the local model's performance on several benchmarks and makes its output more interpretable by exposing the rationale behind each summary.

Abstract

The advent of large language models (LLMs) has significantly advanced natural language processing tasks such as text summarization. However, their large size and computational demands, coupled with privacy concerns in data transmission, limit their use in resource-constrained and privacy-centric settings. To overcome this, we introduce TriSum, a framework for distilling LLMs' text summarization abilities into a compact, local model. First, LLMs extract a set of aspect-triple rationales and summaries, which are refined using a dual-scoring method for quality. Next, a smaller local model is trained on these rationale and summary tasks, employing a curriculum learning strategy that progresses from simple to complex tasks. Our method enhances local model performance on various benchmarks (CNN/DailyMail, XSum, and ClinicalTrial), outperforming baselines by 4.5%, 8.5%, and 7.4%, respectively. It also improves interpretability by providing insights into the summarization rationale.
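The dual-scoring selection step can be illustrated with a small Python sketch. It assumes each LLM-probed candidate carries its generated summary and a precomputed coherence score in [0, 1] (the Figure 5 caption indicates that LDA latent topics play a role in golden rationale selection, but that scoring is not reproduced here). Unigram-overlap F1 stands in for the summary score, and the weight alpha is a hypothetical parameter. The names Candidate, rouge1_f1, and select_golden are illustrative and do not come from TriSum.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    """One LLM-probed candidate: a rationale plus the summary it produced."""
    rationale: str
    summary: str
    coherence: float  # assumed precomputed in [0, 1]; how it is computed is not shown


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1, a simple stand-in for a ROUGE-based summary score."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # multiset intersection of tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def select_golden(candidates: list[Candidate], reference: str, alpha: float = 0.5) -> Candidate:
    """Return the candidate with the highest weighted sum of the two scores."""
    def combined(c: Candidate) -> float:
        return alpha * rouge1_f1(c.summary, reference) + (1 - alpha) * c.coherence
    return max(candidates, key=combined)


if __name__ == "__main__":
    reference = "the trial tests a new drug for asthma"
    candidates = [
        Candidate("r1", "a new asthma drug is tested in the trial", 0.4),
        Candidate("r2", "weather was sunny", 0.9),
    ]
    print(select_golden(candidates, reference).rationale)  # → r1
```

With the default alpha of 0.5, the first candidate wins because its summary overlaps strongly with the reference, even though the second has a higher coherence score; setting alpha to 0 would select purely on coherence.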

Paper Structure (41 sections, 9 equations, 10 figures, 8 tables)

Figures (10)

  • Figure 1: A conceptual demonstration of our three-step framework TriSum, which endows small local models with an LLM's text summarization capability.
  • Figure 2: Distilling text summarization ability from an LLM to a local model using TriSum. Step 1. LLM Rationale Probing: Using a template-based prompt that incorporates the given document and its ground-truth summary, we query an LLM over $n$ iterations to generate a set of $n$ step-by-step rationales. Step 2. Golden Rationale Selection: We use summary and coherency scores to select high-quality training rationales, improving the training dataset. Step 3. Curriculum Learning: We train our compact local model with a curriculum learning strategy that moves from easy to challenging tasks, giving it rationalized summarization ability.
  • Figure 3: An example of abstractive summarization on CNN/DailyMail dataset. We compare the summary generated by our TriSum approach to the ground-truth summary and the one generated by BART. We use different colors to show the distinct topics in the article and summary.
  • Figure 4: Validation loss over training steps and ablation study for curriculum learning on CNN/DailyMail. AspExt, TriExt, and SumGen denote the aspect extraction, triple extraction, and summary generation tasks, respectively. -early/-late denote the early/late stage of concurrent learning. -raw denotes training the model from scratch.
  • Figure 5: Performance with different numbers of LDA latent topics specified in golden rationale selection. We compare the ROUGE scores of the summaries generated by TriSum-R on the CNN/DailyMail dataset.
  • ...and 5 more figures
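The easy-to-hard curriculum from Step 3 of Figure 2, together with the AspExt, TriExt, and SumGen tasks named in Figure 4, can be sketched as a step-based task scheduler in Python. The stage boundaries (20%, 40%, and 60% of training), the final phase that mixes all three tasks as a stand-in for concurrent learning, and the uniform sampling among active tasks are all assumptions for illustration; TriSum's actual schedule may differ. The functions active_tasks and sample_task are hypothetical.

```python
import random

# Tasks ordered from easy to hard, using the abbreviations from Figure 4.
TASKS = ("AspExt", "TriExt", "SumGen")


def active_tasks(step: int, total_steps: int,
                 boundaries: tuple[float, float, float] = (0.2, 0.4, 0.6)) -> tuple[str, ...]:
    """Tasks eligible for training at `step` under a hypothetical curriculum.

    Training progress below boundaries[0] trains aspect extraction only,
    then triple extraction, then summary generation; from boundaries[2]
    onward all three tasks are mixed (a stand-in for concurrent learning).
    """
    if not 0 <= step < total_steps:
        raise ValueError("step must lie in [0, total_steps)")
    progress = step / total_steps
    for task, end in zip(TASKS, boundaries):
        if progress < end:
            return (task,)
    return TASKS


def sample_task(step: int, total_steps: int, rng: random.Random) -> str:
    """Pick the task for one training batch uniformly among the active ones."""
    return rng.choice(active_tasks(step, total_steps))


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 25, 50, 80):
        print(step, active_tasks(step, 100), sample_task(step, 100, rng))
```

Passing an explicit random.Random instance keeps task sampling reproducible across runs, which makes ablations like the -early/-late variants easier to compare.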

Theorems & Definitions (2)

  • Definition 1: Aspect
  • Definition 2: Triple
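Only the names of the two definitions appear here, so the following Python sketch assumes an aspect is a short topical phrase and a triple is a (subject, relation, object) fact. It shows one hypothetical way to serialize both into a single text target for a sequence-to-sequence model. The class names Triple and Rationale and the <aspects>/<triples> tags are made up for illustration and are not TriSum's format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A (subject, relation, object) fact; assumed shape for Definition 2."""
    subject: str
    relation: str
    obj: str

    def to_text(self) -> str:
        return f"({self.subject} | {self.relation} | {self.obj})"


@dataclass
class Rationale:
    """Hypothetical container pairing aspects (Definition 1) with triples (Definition 2)."""
    aspects: list[str]
    triples: list[Triple]

    def to_target(self) -> str:
        """Serialize into one training-target string (the tags are illustrative)."""
        aspect_part = "; ".join(self.aspects)
        triple_part = " ".join(t.to_text() for t in self.triples)
        return f"<aspects> {aspect_part} <triples> {triple_part}"


if __name__ == "__main__":
    rationale = Rationale(
        aspects=["drug efficacy"],
        triples=[Triple("drug X", "reduces", "asthma symptoms")],
    )
    print(rationale.to_target())
```

Making Triple a frozen dataclass renders it hashable, so duplicate triples collected across the $n$ probing iterations can be removed with a set.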