Look Ahead Text Understanding and LLM Stitching

Junlin Julian Jiang, Xin Li

TL;DR

This work defines look ahead text understanding through LASI, a task that requires predicting the section label of the upcoming sentence from the preceding content, and shows that LASI is more challenging than classic section identification. It argues that combining bidirectional context models (BERT) with autoregressive models (GPT) through lightweight stitching improves prediction of upcoming content, particularly for the noisy, still-developing text common in social media and AI-generated dialogue. The authors introduce Loss Stitching and Attention Stitching to align the representation spaces and signals of GPT and BERT; experiments on the PUBMED-RCT corpus show improvements over standard baselines, especially under noise. The framework offers a practical, scalable way to leverage multiple pre-trained LLMs for look ahead tasks, with implications for social media analysis, long-form generation, and AI-assisted writing; the authors also discuss ethical considerations and limitations for future work.
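To make the contrast between the two tasks concrete, here is a minimal sketch that builds examples for both settings from one labeled abstract. It is an illustration under stated assumptions, not the authors' data pipeline: the helper names `si_examples` and `lasi_examples` are hypothetical, SI is simplified so the model sees only the target sentence (real SI models often also see neighboring sentences), and LASI examples start at the second sentence so every example has at least one preceding sentence. The label set is the five PubMed RCT section labels.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Section labels used in the PubMed RCT abstracts.
LABELS = {"BACKGROUND", "OBJECTIVE", "METHODS", "RESULTS", "CONCLUSIONS"}

@dataclass
class Example:
    context: List[str]  # sentences the model is allowed to see
    label: str          # section label the model must predict

def _check_labels(abstract: List[Tuple[str, str]]) -> None:
    """Reject labels outside the PubMed RCT label set."""
    for _, label in abstract:
        if label not in LABELS:
            raise ValueError(f"unknown section label: {label}")

def si_examples(abstract: List[Tuple[str, str]]) -> List[Example]:
    """Simplified classic SI: see sentence i, predict the label of sentence i."""
    _check_labels(abstract)
    return [Example(context=[sent], label=label) for sent, label in abstract]

def lasi_examples(abstract: List[Tuple[str, str]]) -> List[Example]:
    """LASI: see sentences 0..i-1 only, predict the label of sentence i."""
    _check_labels(abstract)
    sentences = [sent for sent, _ in abstract]
    return [
        Example(context=sentences[:i], label=abstract[i][1])
        for i in range(1, len(abstract))  # assumption: at least one prior sentence
    ]
```

On a three-sentence abstract labeled BACKGROUND, METHODS, RESULTS, `lasi_examples` yields two examples: the first sentence paired with METHODS, and the first two sentences paired with RESULTS. The label being predicted always belongs to a sentence the model never sees, which is what separates LASI from SI.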

Abstract

This paper proposes a look ahead text understanding problem, with look ahead section identification (LASI) as an example. This problem may arise in generative AI as well as in human interactions, where we want to understand the direction of a developing text or conversation. We tackle the problem using transformer-based LLMs. We show that LASI is more challenging than classic section identification (SI). We argue that both bidirectional contextual information (e.g., BERT) and unidirectional predictive ability (e.g., GPT) benefit the task. We propose two approaches to stitch together BERT and GPT. Experiments show that our approach outperforms established models, especially when the text contains noise (which is often the case for developing text in generative AI). Our paper sheds light on other look ahead text understanding tasks that are important to social media, such as look ahead sentiment classification, and points out opportunities to leverage pre-trained LLMs through stitching.
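The abstract does not spell out how the two stitching approaches work, so the following is only a hypothetical sketch of the general idea of joining two pre-trained encoders through a shared objective; it is not the paper's Loss Stitching or Attention Stitching. It assumes a GPT hidden state `h_gpt` for the prefix, a BERT representation `h_bert` of the same prefix, a learned projection `W_proj` from GPT space into BERT space, and a label classifier `W_cls`. The combined loss adds a cross-entropy term on the upcoming sentence's label to a weighted mean-squared alignment term; `lam` is an assumed weighting hyperparameter. Plain Python lists keep the example dependency-free.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply matrix W (rows x cols) by vector x (length cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mse(a: Vector, b: Vector) -> float:
    """Mean squared difference between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

def cross_entropy(logits: Vector, target: int) -> float:
    """Negative log-softmax probability of the target class (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]

def stitched_loss(h_gpt: Vector, h_bert: Vector, W_proj: Matrix,
                  W_cls: Matrix, target: int, lam: float = 0.5) -> float:
    """Hypothetical stitching objective (not the paper's formulation).

    - W_proj maps the GPT prefix state into BERT's representation space.
    - W_cls scores the upcoming sentence's section label from that projection.
    - The MSE term pulls the projected GPT state toward BERT's representation.
    """
    z = matvec(W_proj, h_gpt)   # GPT space -> BERT space
    logits = matvec(W_cls, z)   # BERT space -> label scores
    return cross_entropy(logits, target) + lam * mse(z, h_bert)
```

With an identity projection and identical `h_gpt` and `h_bert`, the alignment term vanishes and the loss reduces to plain cross-entropy. In a real model, `W_proj` and `W_cls` would be trained by gradient descent in a deep learning framework, which this sketch omits.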

Paper Structure

This paper contains 31 sections, 5 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Look Ahead Section Identification
  • Figure 2: Overview of the Study
  • Figure 3: BERT and GPT Representation
  • Figure 4: Loss Stitching
  • Figure 5: Attention Stitching
  • ...and 1 more figure