PLANTS: A Novel Problem and Dataset for Summarization of Planning-Like (PL) Tasks

Vishal Pallagani, Biplav Srivastava, Nitin Gupta

TL;DR

This work introduces planning-like task summarization as a novel problem and provides the PLANTS dataset across automated plans, recipes, and travel routes to foster study of concise, executable summaries. It compares abstractive (GPT-4o) and extractive baselines (TextRank and a frequency-based method) through a human-centered evaluation, finding that GPT-4o delivers the most information-dense summaries and is preferred by users, while highlighting concerns about executional semantics and hallucinations. The contributions include a formal definition of plan summarization, a tailored dataset, and a baseline approach, along with initial user studies and insights on evaluation metrics. The results suggest significant potential for PL-task summarization in domains like robotics and dialog systems, while underscoring the need for robust, task-specific evaluation tools and broader testing.

Abstract

Text summarization is a well-studied problem that deals with deriving insights from unstructured text consumed by humans, and it has found extensive business applications. However, many real-life tasks involve generating a series of actions to achieve specific goals, such as workflows, recipes, dialogs, and travel plans. We refer to them as planning-like (PL) tasks, noting that the main commonality they share is control flow information, which may be partially specified. Their structure presents an opportunity to create more practical summaries to help users make quick decisions. We investigate this observation by introducing a novel plan summarization problem, presenting a dataset, and providing a baseline method for generating PL summaries. Using quantitative metrics and qualitative user studies to establish baselines, we evaluate the plan summaries from our method and large language models. We believe the novel problem and dataset can reinvigorate research in summarization, which some consider a solved problem.

Paper Structure (11 sections, 1 equation, 4 figures, 3 tables, 1 algorithm)

Figures (4)

  • Figure 1: Google Maps summarizes three possible driving routes from Manhattan to Pleasantville, New York. The initial view (Box 1) includes key information like critical roads, estimated travel time, and distance, aiding quick decision-making. Detailed step-by-step directions can be accessed by expanding each summary present in Box 2.
  • Figure 2: Distribution of problems and plans across domains. Left: the number of problems per domain, with each domain having 10 problems. Right: the average number of plans per problem for each domain.
  • Figure 3: Comparison of token counts across different summarization approaches.
  • Figure 4: Comparison of lexical diversity across different summarization approaches to understand their information-richness.