OkraLong: A Flexible Retrieval-Augmented Framework for Long-Text Query Processing

Yulong Hui, Yihao Liu, Yao Lu, Huanchen Zhang

TL;DR

OkraLong tackles efficient long-text query processing for LLMs by moving beyond static long-context and traditional RAG pipelines. It introduces a fine-grained orchestration framework with an Analyzer, Organizer, and Executor that characterize tasks, schedule adaptive workflows, and execute diverse operator pipelines. Across six long-text datasets, OkraLong improves answer accuracy while reducing cost and latency relative to baselines, with a Precise-Mode capable of matching long-context accuracy at substantial cost savings. The work provides a practical, scalable solution for enterprise document analysis and complex long-form reasoning.

Abstract

Large Language Models (LLMs) encounter challenges in efficiently processing long-text queries, as seen in applications like enterprise document analysis and financial report comprehension. While conventional solutions employ long-context processing or Retrieval-Augmented Generation (RAG), they suffer from prohibitive input expenses or incomplete information. Recent advancements adopt context compression and dynamic retrieval loops, but still sacrifice critical details or incur iterative costs. To address these limitations, we propose OkraLong, a novel framework that flexibly optimizes the entire processing workflow. Unlike prior static or coarse-grained adaptive strategies, OkraLong adopts fine-grained orchestration through three synergistic components: analyzer, organizer and executor. The analyzer characterizes the task states, which guide the organizer in dynamically scheduling the workflow. The executor carries out the execution and generates the final answer. Experimental results demonstrate that OkraLong not only enhances answer accuracy but also achieves cost-effectiveness across a variety of datasets.
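
The analyzer, organizer, and executor flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `TaskState`, `Plan`, `retrieve`, `analyze`, `organize`, `execute`, and `answer`, the operator names, the word-overlap retriever, and the thresholds are hypothetical and are not taken from the paper, whose analyzer and operators are likely far richer. The sketch shows only the control flow: a primary retrieval, an analysis of the task state, a plan derived from that state, and an execution step that ends in a single generation call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TaskState:
    """Hypothetical analyzer output; the fields are illustrative only."""
    is_multi_hop: bool
    low_coverage: bool


@dataclass
class Plan:
    """Hypothetical organizer output: operator names plus one config value."""
    operators: List[str]
    top_k: int


def retrieve(query: str, chunks: List[str], top_k: int) -> List[str]:
    """Toy retriever: rank chunks by word overlap with the query."""
    terms = set(query.lower().split())
    ranked = sorted(chunks, key=lambda c: -len(terms & set(c.lower().split())))
    return ranked[:top_k]


def analyze(query: str, retrieved: List[str]) -> TaskState:
    """Toy analyzer: simple heuristics stand in for task characterization."""
    terms = set(query.lower().split())
    seen = set(" ".join(retrieved).lower().split())
    coverage = len(terms & seen) / max(len(terms), 1)
    return TaskState(
        is_multi_hop=" and " in query.lower(),
        low_coverage=coverage < 0.5,
    )


def organize(state: TaskState) -> Plan:
    """Toy organizer: map the task state to an operator list and a config."""
    operators = []
    if state.is_multi_hop:
        operators.append("decompose_query")
    if state.low_coverage:
        # Realized below through the larger top_k rather than a separate step.
        operators.append("widen_retrieval")
    operators.append("generate_answer")
    return Plan(operators=operators, top_k=8 if state.low_coverage else 4)


def execute(plan: Plan, query: str, chunks: List[str],
            llm: Callable[[str], str]) -> str:
    """Toy executor: run the planned operators, then call the LLM once."""
    queries = [query]
    if "decompose_query" in plan.operators:
        queries = [part.strip() for part in query.lower().split(" and ")]
    context: List[str] = []
    for q in queries:
        for chunk in retrieve(q, chunks, plan.top_k):
            if chunk not in context:  # keep each chunk once, in first-seen order
                context.append(chunk)
    prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + query
    return llm(prompt)


def answer(query: str, chunks: List[str], llm: Callable[[str], str]) -> str:
    """Primary retrieval, then analyze, organize, and execute."""
    retrieved = retrieve(query, chunks, top_k=2)
    state = analyze(query, retrieved)
    plan = organize(state)
    return execute(plan, query, chunks, llm)
```

Here `llm` is any callable that maps a prompt string to an answer string, so a stub such as `lambda prompt: prompt` is enough to inspect the assembled context. Keeping the plan as plain data separates the scheduling decision from its execution, which is the division of labor the abstract attributes to the organizer and executor.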

Paper Structure

This paper contains 29 sections, 2 equations, 4 figures, 9 tables.

Figures (4)

  • Figure 1: Comparison of OkraLong with two prevalent advanced paradigms for processing long-text queries.
  • Figure 2: Architecture of OkraLong illustrated with an example. After the query-based primary retrieval, the analyzer assesses the current task scenario. Using the analysis results, the organizer dynamically provides execution plans and configurations, covering various potential workflows. The executor then carries out the plan and generates the final answer.
  • Figure 3: Average performance of end-to-end query answering across six datasets. Better approaches appear toward the upper left, indicating lower cost and higher accuracy. Color encodes execution time, with darker colors denoting lower latency.
  • Figure 4: Average end-to-end latency results across various methods. The execution time (per query) comprises context processing time and LLM generation time.