Continuous Prompts: LLM-Augmented Pipeline Processing over Unstructured Streams

Shu Chen, Deepti Raghavan, Uğur Çetintemel

TL;DR

Continuous Prompts (CPs) introduce a stateful, adaptive framework for LLM-driven analytics over unstructured streams, extending Retrieval-Augmented Generation (RAG) to continuous processing. The approach defines streaming-native semantic operators (e.g., semantic windows, semantic group-by, continuous RAG) and two LLM-specific optimizations (tuple batching and operator fusion) whose accuracy-throughput trade-offs are managed by a dynamic planning framework. A cost-aware multi-objective Bayesian optimization (MOBO) engine learns throughput-accuracy frontiers under probing budgets and guides runtime plan selection. Implemented in VectraFlow, CPs support persistent, workload-adaptive semantic queries on real streaming pipelines and navigate accuracy-efficiency trade-offs as data streams evolve.

Abstract

Monitoring unstructured streams increasingly requires persistent, semantics-aware computation, yet today's LLM frameworks remain stateless and one-shot, limiting their usefulness for long-running analytics. We introduce Continuous Prompts (CPs), the first framework that brings LLM reasoning into continuous stream processing. CPs extend RAG to streaming settings, define continuous semantic operators, and provide multiple implementations, primarily focusing on LLM-based approaches but also reporting embedding-based variants. Furthermore, we study two LLM-centric optimizations, tuple batching and operator fusion, to significantly improve efficiency while managing accuracy loss. Because these optimizations inherently trade accuracy for speed, we present a dynamic optimization framework that uses lightweight shadow executions and cost-aware multi-objective Bayesian optimization (MOBO) to learn throughput-accuracy frontiers and adapt plans under probing budgets. We implement CPs in the VectraFlow stream processing system. Using operator-level microbenchmarks and streaming pipelines on real datasets, we show that VectraFlow can adapt to workload dynamics, navigate accuracy-efficiency trade-offs, and sustain persistent semantic queries over evolving unstructured streams.
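
To make the notion of a throughput-accuracy frontier concrete, the following is a minimal sketch of how a set of probed plans could be reduced to its Pareto frontier and how one plan could then be chosen under an accuracy floor. It is an illustration only and not the MOBO procedure the paper describes: the names `PlanPoint`, `pareto_frontier`, and `pick_plan`, the plan labels, and every number are hypothetical placeholders rather than values or APIs from the paper or from VectraFlow.

```python
from typing import NamedTuple, Optional


class PlanPoint(NamedTuple):
    """One probed plan configuration (hypothetical structure)."""
    name: str          # illustrative plan label
    throughput: float  # tuples/s, higher is better
    accuracy: float    # e.g. F1 in [0, 1], higher is better


def dominates(a: PlanPoint, b: PlanPoint) -> bool:
    """True if a is at least as good as b on both objectives and better on one."""
    no_worse = a.throughput >= b.throughput and a.accuracy >= b.accuracy
    better = a.throughput > b.throughput or a.accuracy > b.accuracy
    return no_worse and better


def pareto_frontier(points: list[PlanPoint]) -> list[PlanPoint]:
    """Keep only the plans that no other plan dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def pick_plan(points: list[PlanPoint], min_accuracy: float) -> Optional[PlanPoint]:
    """Fastest frontier plan whose accuracy meets the floor, or None."""
    feasible = [p for p in pareto_frontier(points) if p.accuracy >= min_accuracy]
    return max(feasible, key=lambda p: p.throughput) if feasible else None


# Made-up measurements, for illustration only.
observed = [
    PlanPoint("no_batching", 10.0, 0.90),
    PlanPoint("batch_4", 30.0, 0.86),
    PlanPoint("batch_16", 70.0, 0.74),
    PlanPoint("fused", 25.0, 0.80),  # dominated by batch_4 on both objectives
]

frontier = pareto_frontier(observed)
chosen = pick_plan(observed, min_accuracy=0.85)
```

In this sketch the frontier is computed by exhaustive pairwise comparison, which is adequate for a handful of candidate plans. A budgeted optimizer such as the one described in the abstract would instead decide which configurations are worth probing before a frontier like this is assembled.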

Paper Structure

This paper contains 27 sections, 11 equations, 15 figures, 7 tables.

Figures (15)

  • Figure 1: Semantic window implementations on the MiDe22 dataset. Left: metric comparison (F1, ARI, Boundary F1, and Purity), with $\star$ indicating the best score. Right: throughput in tuples/s.
  • Figure 2: Semantic group-by implementations on the MiDe22 dataset. Left: metric comparison (F1, ARI, Boundary F1, and Purity), with $\star$ indicating the best score. Right: throughput in tuples/s.
  • Figure 3: Traditional RAG vs. Continuous RAG.
  • Figure 4: Continuous RAG implementations on the MiDe22 dataset.
  • Figure 5: Continuous RAG under varying predicate counts (2–10). Left: F1 versus # predicates. Right: throughput (rec/s) versus # predicates.
  • ...and 10 more figures