COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement

Yuxi Xie, Anirudh Goyal, Xiaobao Wu, Xunjian Yin, Xiao Xu, Min-Yen Kan, Liangming Pan, William Yang Wang

TL;DR

Context-Wise Order-Agnostic Language Modeling (COrAL) incorporates iterative refinement directly into the LLM architecture while maintaining computational efficiency. It introduces sliding blockwise order-agnostic decoding, which performs multi-token forward prediction and backward reconstruction within context windows.

Abstract

Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks. However, existing approaches typically implement iterative refinement at the application or prompting level, relying on autoregressive (AR) modeling. The sequential token generation in AR models can lead to high inference latency. To overcome these challenges, we propose Context-Wise Order-Agnostic Language Modeling (COrAL), which incorporates iterative refinement directly into the LLM architecture while maintaining computational efficiency. Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally during the generation process. Leveraging the order-agnostic nature of COrAL, we introduce sliding blockwise order-agnostic decoding, which performs multi-token forward prediction and backward reconstruction within context windows. This allows the model to iteratively refine its outputs in parallel within the sliding block, effectively capturing diverse dependencies without the high inference cost of sequential generation. Empirical evaluations on reasoning tasks demonstrate that COrAL improves both performance and inference speed, achieving absolute accuracy gains of $4.6\%$ on GSM8K and $4.0\%$ on LogiQA, along with inference speedups of up to $3.9\times$ over next-token baselines. Preliminary results on code generation indicate a drop in pass rates due to inconsistencies in order-agnostic outputs, highlighting the inherent quality-speed trade-off. Our code is publicly available at https://github.com/YuxiXie/COrAL.
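The decoding loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `sliding_block_decode`, `toy_predict`, and the rule that far-ahead guesses are wrong until they are revisited are all hypothetical, chosen only to show how forward prediction of up to $k$ positions and backward reconstruction inside a block of size $b$ might interleave. The step counter counts loop iterations of this toy, not measured latency.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a model: given the current tokens and a target
# position, return a token for that position.
Predictor = Callable[[List[int], int], int]


def sliding_block_decode(
    predict: Predictor, prompt: List[int], n_new: int, k: int = 3, b: int = 6
) -> Tuple[List[int], int]:
    """Toy decoder: each step drafts up to k tokens, then refines a block."""
    seq = list(prompt)
    end = len(prompt) + n_new
    steps = 0
    while len(seq) < end:
        # Forward prediction: guess up to k future positions from one context.
        ahead = range(len(seq), min(len(seq) + k, end))
        draft = seq + [predict(seq, pos) for pos in ahead]
        # Backward reconstruction: re-predict the generated positions in the
        # trailing block of size b, all from the same draft snapshot.
        start = max(len(prompt), len(draft) - b)
        refined = [predict(draft, pos) for pos in range(start, len(draft))]
        seq = draft[:start] + refined
        steps += 1
    return seq, steps


def toy_predict(tokens: List[int], pos: int) -> int:
    """Assumed toy rule: the target token is pos % 10. Positions at or before
    the end of the context are predicted exactly; positions further ahead
    are off by one until a later refinement pass revisits them."""
    exact = pos % 10
    return exact if pos <= len(tokens) else (exact + 1) % 10


seq, steps = sliding_block_decode(toy_predict, prompt=[0, 1], n_new=7, k=3, b=6)
print(seq, steps)
```

In this toy, the second and third tokens of each draft start out wrong and are corrected by the refinement pass in the same step, so seven new tokens are produced in three loop iterations. Real models give no such guarantee, which is consistent with the quality-speed trade-off the abstract reports.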

Paper Structure

This paper contains 38 sections, 11 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Scaling of performance and inference cost on GSM8K as the minimum number of refinement passes per output position increases. $k$ denotes the backward context window size. The decoding block size is set to $b=64$.
  • Figure 2: Sliding Blockwise Order-Agnostic Decoding. COrAL performs multi-token prediction and refinement in the sliding block with context window size $k=3$ and block size $b=6$.
  • Figure 3: Context-Wise Order-Agnostic Language Modeling. We visualize the order-agnostic dependencies within a context window of size $k=2$. For target-aware position encoding, we show how COrAL obtains query representations for multiple positions within the same window.
  • Figure 4: Result comparison of pass rates and speed on code generation.
  • Figure 5: Meso-analysis of error cases in code generation (Ours$_{\textrm{w/o verifier}}$) on HumanEval. The primary failure cases come from syntax errors.
  • ...and 5 more figures