Role-RL: Online Long-Context Processing with Role Reinforcement Learning for Distinct LLMs in Their Optimal Roles

Lewei He, Tianyu Shi, Pengran Huang, Bingzhi Chen, Qianglong Chen, Jiahui Pan

TL;DR

The paper tackles the challenge of processing unlimited-length streaming transcripts by introducing Online Long-context Processing (OLP) and Role Reinforcement Learning (Role-RL) to orchestrate diverse LLMs in specialized roles. OLP decomposes streaming content into topics and aspects via six roles, while Role-RL assigns LLMs to roles using Q-learning guided by board judgments and a cost-aware reward. On the OLP-MINI dataset and LongBench benchmarks, the approach achieves an average recall of 0.932 and reduces LLM costs by 79.4%, with recall gains of up to 53.6 percentage points over non-OLP baselines. The work demonstrates scalable, real-time organization of long streaming content and offers a practical framework for dynamic, cost-effective multi-LLM orchestration in enterprise and media applications.
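The TL;DR describes Role-RL as Q-learning over role assignments, rewarded by board judgments and cost. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The role names, model names, quality scores, and prices are hypothetical placeholders. A noisy lookup table stands in for the advisory board. The reward is assumed to be judged quality minus a weighted price. Each role is treated as a state whose successor is the next role in the pipeline.

```python
import random

# Hypothetical roles, models, quality scores, and prices: placeholders, not paper values.
ROLES = ["role_a", "role_b"]
MODELS = ["model_x", "model_y", "model_z"]
QUALITY = {
    ("role_a", "model_x"): 0.90, ("role_a", "model_y"): 0.85, ("role_a", "model_z"): 0.40,
    ("role_b", "model_x"): 0.90, ("role_b", "model_y"): 0.50, ("role_b", "model_z"): 0.88,
}
PRICE = {"model_x": 1.0, "model_y": 0.2, "model_z": 0.1}  # relative cost per call


def board_judgment(role, model, rng):
    """Stand-in for the advisory board: a noisy quality score clipped to [0, 1]."""
    return min(1.0, max(0.0, QUALITY[(role, model)] + rng.gauss(0.0, 0.05)))


def reward(score, model, cost_weight=0.3):
    """Assumed cost-aware reward: judged quality minus a weighted price penalty."""
    return score - cost_weight * PRICE[model]


def train_role_manager(episodes=3000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: state = current role, action = which LLM fills it."""
    rng = random.Random(seed)
    q = {role: {m: 0.0 for m in MODELS} for role in ROLES}
    for _ in range(episodes):
        for i, role in enumerate(ROLES):
            # Epsilon-greedy: usually exploit the best-known LLM, sometimes explore.
            if rng.random() < epsilon:
                model = rng.choice(MODELS)
            else:
                model = max(q[role], key=q[role].get)
            r = reward(board_judgment(role, model, rng), model)
            # Successor state is the next role in the pipeline; the last role is terminal.
            future = max(q[ROLES[i + 1]].values()) if i + 1 < len(ROLES) else 0.0
            q[role][model] += alpha * (r + gamma * future - q[role][model])
    return q


def best_assignment(q):
    """Greedy policy: the highest-valued LLM for each role."""
    return {role: max(values, key=values.get) for role, values in q.items()}


if __name__ == "__main__":
    print(best_assignment(train_role_manager()))
```

With these placeholder numbers, the cheaper `model_y` and `model_z` win their roles over the pricier `model_x` despite its slightly higher raw quality. A cost-aware reward is meant to expose exactly this kind of trade-off.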

Abstract

Long-context processing with large language models (LLMs) remains challenging because of implementation complexity, training efficiency, and data sparsity. To address this issue, we propose a new paradigm, Online Long-context Processing (OLP), for processing documents of unlimited length. This setting typically arises when receiving and organizing information from diverse streaming media such as automated news reporting, live e-commerce, and viral short videos. Moreover, amid the explosive growth in the number of available LLMs, selecting the one that best balances strong performance, affordable price, and short response latency is often a dilemma. We therefore also develop Role Reinforcement Learning (Role-RL) to automatically deploy different LLMs in their respective roles within the OLP pipeline according to their actual performance. Extensive experiments on our OLP-MINI dataset show that OLP with the Role-RL framework achieves an average recall rate of 93.2% on the OLP benchmark while reducing LLM cost by 79.4%. The code and dataset are publicly available at: https://anonymous.4open.science/r/Role-RL.
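The abstract frames OLP as organizing a stream of unbounded length into topics and their supporting aspects, evaluated by recall. The sketch below shows only the shape of that bookkeeping, under stated assumptions, and is not the authors' six-role pipeline. The transcript is assumed to arrive as sentences. A hypothetical `toy_extract` stub stands in for the LLM roles and emits (topic, aspect) pairs. Recall is assumed to be the fraction of reference (topic, aspect) pairs recovered.

```python
from collections import defaultdict
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical signature: a role reads one batch of sentences and returns
# (topic, aspect) pairs. In OLP this is done by LLM roles; here it is a stub.
Extractor = Callable[[List[str]], List[Tuple[str, str]]]


def chunks(stream: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield fixed-size batches from a possibly unbounded stream."""
    it = iter(stream)
    while batch := list(islice(it, size)):
        yield batch


def organize(stream: Iterable[str], extract: Extractor, chunk_size: int = 2) -> Dict[str, List[str]]:
    """Incrementally merge extracted aspects under their topics, skipping duplicates."""
    topics: Dict[str, List[str]] = defaultdict(list)
    for batch in chunks(stream, chunk_size):
        for topic, aspect in extract(batch):
            if aspect not in topics[topic]:
                topics[topic].append(aspect)
    return dict(topics)


def aspect_recall(predicted: Dict[str, List[str]], reference: Dict[str, List[str]]) -> float:
    """Assumed metric: share of reference (topic, aspect) pairs that were recovered."""
    ref = {(t, a) for t, aspects in reference.items() for a in aspects}
    hit = {(t, a) for t, aspects in predicted.items() for a in aspects}
    return len(ref & hit) / len(ref) if ref else 1.0


def toy_extract(batch: List[str]) -> List[Tuple[str, str]]:
    """Stub extractor: treats 'topic: aspect' sentences as already labeled."""
    pairs = []
    for sentence in batch:
        if ":" in sentence:
            topic, aspect = sentence.split(":", 1)
            pairs.append((topic.strip(), aspect.strip()))
    return pairs


if __name__ == "__main__":
    stream = ["phone: battery life", "filler chatter", "phone: price", "shoes: sizing", "phone: price"]
    result = organize(stream, toy_extract)
    print(result)
    print(aspect_recall(result, {"phone": ["battery life", "price", "camera"], "shoes": ["sizing"]}))
```

Because `chunks` pulls from an iterator one batch at a time, `stream` could equally be a generator that never ends, which is the OLP setting; only the accumulated topic table grows.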

Paper Structure (20 sections, 1 equation, 10 figures, 3 tables)

Figures (10)

  • Figure 1: Illustration of Online Long-context Processing (OLP) problem.
  • Figure 2: Architecture of the proposed Online Long-context Processing (OLP) pipeline in the Role Reinforcement Learning (Role-RL) framework. The OLP pipeline consists of six well-defined roles that collaborate to extract useful information from context of unlimited length and restructure it into topics with supporting aspects. The Role-RL framework is composed of an LLM pool, an LLM advisory board, and a Role Manager driven by reinforcement learning that places the LLMs in different roles according to their actual performance and cost.
  • Figure 3: Illustration of Role-RL functionality.
  • Figure 4: Election of the board members.
  • Figure 5: Schematics of greedy-update (left) and cross-update (right) strategies.
  • ...and 5 more figures