Log-Augmented Generation: Scaling Test-Time Reasoning with Reusable Computation

Peter Baile Chen, Yi Zhang, Dan Roth, Samuel Madden, Jacob Andreas, Michael Cafarella

TL;DR

Log-augmented generation (LAG) enables large language models to reuse past reasoning by storing it as key-value logs and retrieving relevant entries at inference to augment generation. It uses a three-part pipeline of encoding/storing reasoning traces, retrieving relevant KV logs, and generating with augmented context, with careful separation of encoding and storage. Empirical results across knowledge- and reasoning-intensive tasks show that LAG, particularly the KV-based variant, surpasses standard agentic baselines, reflection, and traditional KV caches while also reducing the number of reasoning iterations. The work demonstrates a scalable approach to improve accuracy and efficiency in multi-step reasoning tasks.

Abstract

While humans naturally learn and adapt from past experiences, large language models (LLMs) and their agentic counterparts struggle to retain reasoning from previous tasks and apply it in future contexts. To address this limitation, we propose a novel framework, log-augmented generation (LAG), that directly reuses prior computation and reasoning from past logs at test time to enhance the model's ability to learn from previous tasks and perform better on new, unseen challenges, all while keeping the system efficient and scalable. Specifically, our system represents task logs using key-value (KV) caches, encoding the full reasoning context of prior tasks while storing KV caches for only a selected subset of tokens. When a new task arises, LAG retrieves the KV values from relevant logs to augment generation. Our approach differs from reflection-based memory mechanisms by directly reusing prior reasoning and computations without requiring additional steps for knowledge extraction or distillation. Our method also goes beyond existing KV caching techniques, which primarily target efficiency gains rather than accuracy improvements. Experiments on knowledge- and reasoning-intensive datasets demonstrate that our method significantly outperforms standard agentic systems that do not utilize logs, as well as existing solutions based on reflection and KV cache techniques.
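The store-then-retrieve flow described in the abstract can be illustrated with a minimal sketch. Everything here is hypothetical and is not the authors' implementation: `LogStore`, `embed`, and `cosine` are invented names, the bag-of-words similarity stands in for whatever retriever the system actually uses, and the stored payload is plain text where LAG would keep the KV cache of the selected tokens.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a real encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class LogStore:
    """Hypothetical log store: index by the full context, keep a small payload."""

    def __init__(self):
        self.entries = []  # list of (retrieval key, stored payload)

    def add(self, task, reasoning, last_response):
        # The key reflects the full context (task plus reasoning trace).
        # The payload is only the last response; in LAG this would be the
        # KV cache of those tokens rather than their text.
        key = embed(task + " " + reasoning)
        self.entries.append((key, last_response))

    def retrieve(self, new_task, k=1):
        # Rank stored logs by similarity to the new task and return the top k.
        query = embed(new_task)
        ranked = sorted(self.entries,
                        key=lambda e: cosine(query, e[0]),
                        reverse=True)
        return [payload for _, payload in ranked[:k]]
```

A caller would `add` a log after each solved task and call `retrieve` before generation on a new task, prepending the returned payloads to the model's context.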

Paper Structure

This paper contains 28 sections, 3 equations, 3 figures, 11 tables, 1 algorithm.

Figures (3)

  • Figure 1: While humans naturally possess the ability to learn from past experiences, LLMs lack this capability, resulting in redundant reasoning. Our log-augmented generation framework allows LLMs to directly reuse prior thought and reasoning processes. Algorithm 1 includes details of LAG.
  • Figure 2: A typical attention weight matrix, after the lower-triangular causal mask is applied, lets each token’s KV values attend to all preceding context with different levels of importance. Leveraging this property, LAG selectively stores the KV values of a subset of tokens from the model’s most recent response, while still preserving the complete reasoning context. This approach differs from existing KV caching methods, which do not differentiate between content for encoding and content for storage.
  • Figure 3: Exact match performance of standard agentic systems and LAG as the maximum number of reasoning steps allowed to solve a task varies. Logs show efficiency benefits when the performance of LAG is on par with agentic systems but requires fewer reasoning steps. They show performance benefits when LAG surpasses the agentic systems at points where their performance has plateaued.
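The property behind Figure 2 can be checked with a small sketch of causally masked attention. This is an illustration under simplifying assumptions, not the paper's code: `causal_attention` is a hypothetical helper with a single head and a single layer, and it uses the inputs directly as queries, keys, and values (identity projections). In a real transformer, the KV values stored at deeper layers are computed from hidden states that have been mixed in this way.

```python
import math


def causal_attention(x):
    """Single-head self-attention with a lower-triangular causal mask.

    x is a list of token vectors. Token i attends only to tokens 0..i,
    so its output is a weighted mix of itself and all preceding tokens.
    """
    dim = len(x[0])
    out = []
    for i in range(len(x)):
        # Dot-product scores against the visible prefix only (the mask).
        scores = [sum(a * b for a, b in zip(x[i], x[j])) for j in range(i + 1)]
        # Numerically stable softmax over the prefix.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        weights = [w / z for w in weights]
        out.append([sum(weights[j] * x[j][d] for j in range(i + 1))
                    for d in range(dim)])
    return out


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
changed_prefix = [[0.0, 2.0], [0.0, 1.0], [1.0, 1.0]]

base = causal_attention(tokens)
alt = causal_attention(changed_prefix)
# The last token's output differs when only the earlier context changes,
# so its representation carries information about the whole prefix.
last_token_depends_on_prefix = base[-1] != alt[-1]
```

Because the last tokens' representations already reflect the preceding context, keeping only their entries can still preserve information about the full reasoning trace, which is the intuition the caption describes.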