Exploring System 1 and 2 communication for latent reasoning in LLMs

Julian Coda-Forno; Zhuokai Zhao; Qiang Zhang; Dipesh Tamboli; Weiwei Li; Xiangjun Fan; Lizhu Zhang; Eric Schulz; Hsiao-Ping Tseng

TL;DR

The paper investigates whether latent reasoning in LLMs should reside in a separate Coprocessor or within a single model's forward pass. It evaluates two communication-focused variants of a KV-cache Coprocessor and compares them with a unified soft-embedding baseline under matched latent budgets, on both reasoning and pretraining tasks. Co-finetuning yields the strongest gains among the dual designs, but a parameter-matched single model with soft latent prompts often matches or surpasses the dual setups, suggesting that the dual designs mainly add compute rather than confer a qualitative reasoning advantage. Explicit latent-space objectives, such as orthogonality regularization, can restore specialized latent roles and improve combinatorial reasoning, though they may trade off general language-modeling performance; this points to curriculum and objective design as directions for future work toward System-2-like latent reasoning.
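The TL;DR mentions orthogonality regularization on latents without stating its exact form. As a minimal sketch, assuming a common variant that penalizes the squared cosine similarity between every pair of distinct latent vectors (not necessarily the regularizer used in the paper), the penalty and a combined objective could look like this. The function names and the weight `lam` are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def orthogonality_penalty(latents):
    """Hypothetical regularizer: sum of squared off-diagonal cosine similarities.

    For nonzero latents this equals ||Z Z^T - I||_F^2, where Z stacks the
    L2-normalized latent vectors row-wise. It is 0 when all latents are
    mutually orthogonal, i.e. when each occupies its own direction.
    """
    z = [l2_normalize(v) for v in latents]
    penalty = 0.0
    for i in range(len(z)):
        for j in range(len(z)):
            if i != j:
                cos = sum(a * b for a, b in zip(z[i], z[j]))
                penalty += cos * cos
    return penalty


def regularized_loss(lm_loss, latents, lam=0.1):
    """Hypothetical combined objective: LM loss plus a weighted orthogonality term."""
    return lm_loss + lam * orthogonality_penalty(latents)
```

Here `lam` sets how strongly latent specialization is weighed against the language-modeling term; orthogonal latents such as `[[1, 0], [0, 1]]` incur no penalty, while two parallel latents incur the maximum pairwise penalty of 2.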

Abstract

Should LLM reasoning live in a separate module, or within a single model's forward pass and representational space? We study dual-architecture latent reasoning, in which a fluent Base exchanges latent messages with a Coprocessor, and test two hypotheses aimed at improving latent communication over Liu et al. (2024): (H1) increasing channel capacity, and (H2) learning communication via joint finetuning. Under matched latent-token budgets on GPT-2 and Qwen-3, H2 is consistently the strongest, while H1 yields modest gains. A unified soft-embedding baseline (a single model that keeps reasoning within one forward pass and shared representations, given the same latent-token budget) nearly matches H2 and surpasses H1, suggesting that current dual designs mostly add compute rather than qualitatively improving reasoning. Across GSM8K, ProsQA, and a Countdown stress test with increasing branching factor, scaling the latent-token budget beyond small values fails to improve robustness. Latent analyses show overlapping subspaces with limited specialization, consistent with the weak reasoning gains. We conclude that dual-model latent reasoning remains promising in principle, but likely requires objectives and training schedules that explicitly shape latent spaces for algorithmic planning.
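To give a sense of why Countdown serves as a stress test, the sketch below counts operation sequences in a simplified Countdown search tree. It assumes (hypothetically, since the abstract does not define the setup) that a larger branching factor corresponds to more input numbers, and that each step picks an unordered pair of the remaining numbers and one of six results (a+b, a*b, a-b, b-a, a/b, b/a). The count ignores pruning of invalid or duplicate expressions, so it is an upper bound rather than the paper's measure.

```python
def countdown_search_space(n):
    """Upper bound on operation sequences for a Countdown instance with n inputs.

    Assumed (hypothetical) rules: with k numbers left, choose an unordered pair
    (C(k, 2) ways) and one of 6 results, replacing the pair by that result.
    Step k therefore offers 6 * C(k, 2) = 3 * k * (k - 1) choices, and the
    process repeats until a single number remains.
    """
    total = 1
    for k in range(n, 1, -1):
        total *= 3 * k * (k - 1)
    return total


for n in range(2, 7):
    print(n, countdown_search_space(n))
```

For n = 2 through 6 this gives 6, 108, 3,888, 233,280, and 20,995,200 sequences: each added input multiplies the tree by roughly 3n², which is the kind of combinatorial growth a fixed, small latent-token budget has to cope with.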