Transformers with RL or SFT Provably Learn Sparse Boolean Functions, But Differently

Bochen Lyu; Yiyang Jia; Xiaohao Cai; Zhanxing Zhu

TL;DR

This work analyzes how RL and SFT elicit chain-of-thought (CoT) reasoning in a one-layer transformer that learns $k$-sparse Boolean functions which decompose recursively into 2-sparse functions. It derives sufficient conditions under which both RL (with immediate rewards) and SFT (without teacher forcing) provably learn these functions, and verifies the conditions on $k$-PARITY, $k$-AND, and $k$-OR. A key finding is that RL can learn the entire CoT chain in a single gradient update, while SFT learns the chain step-by-step, reflecting intrinsic differences in their supervision signals. The results give mechanistic insight into how CoT emerges under RL versus SFT and offer guidance for designing reasoning-oriented fine-tuning regimes for transformers.

Abstract

Transformers can acquire Chain-of-Thought (CoT) capabilities to solve complex reasoning tasks through fine-tuning. Reinforcement learning (RL) and supervised fine-tuning (SFT) are the two primary approaches to this end, yet their underlying mechanisms and differences remain theoretically unclear. In this work, we examine both for learning $k$-sparse Boolean functions with a one-layer transformer and intermediate supervision akin to CoT. In particular, we consider $k$-sparse Boolean functions that can be recursively decomposed into fixed 2-sparse Boolean functions. We analyze the learning dynamics of fine-tuning the transformer via either RL or SFT with CoT to identify sufficient conditions under which it provably learns these functions. We verify that these conditions hold for three basic examples, $k$-PARITY, $k$-AND, and $k$-OR, thus demonstrating learnability under both approaches. Notably, we reveal that RL and SFT exhibit distinct learning behaviors: RL learns the whole CoT chain simultaneously, whereas SFT learns the CoT chain step-by-step. Overall, our findings provide theoretical insights into the underlying mechanisms of RL and SFT and into how they differ in triggering the CoT capabilities of transformers.
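To make the recursive decomposition concrete, the following is a minimal Python sketch, not the paper's construction. It assumes a simple left-to-right chain in which a fixed 2-sparse gate is folded over the $k$ relevant bits, so that each intermediate value plays the role of one CoT step. The names `cot_chain`, `support`, and the gate helpers are hypothetical and introduced only for illustration; the paper's actual decomposition and token format may differ.

```python
from typing import Callable, List, Sequence

Bit = int
Gate = Callable[[Bit, Bit], Bit]


def xor_gate(a: Bit, b: Bit) -> Bit:
    """2-sparse building block for k-PARITY."""
    return a ^ b


def and_gate(a: Bit, b: Bit) -> Bit:
    """2-sparse building block for k-AND."""
    return a & b


def or_gate(a: Bit, b: Bit) -> Bit:
    """2-sparse building block for k-OR."""
    return a | b


def cot_chain(x: Sequence[Bit], support: Sequence[int], gate: Gate) -> List[Bit]:
    """Fold a fixed 2-sparse gate over the k relevant bits of x.

    Returns the k-1 intermediate values (illustrative "CoT steps");
    the last entry is the value of the k-sparse function on x.
    Assumption: a linear chain, which is only one possible decomposition.
    """
    if len(support) < 2:
        raise ValueError("need at least two relevant coordinates")
    acc = x[support[0]]
    chain: List[Bit] = []
    for idx in support[1:]:
        acc = gate(acc, x[idx])  # one 2-sparse step
        chain.append(acc)
    return chain


if __name__ == "__main__":
    x = [1, 0, 1, 1, 0, 1]   # input bits
    support = [0, 2, 5]      # hypothetical relevant coordinates (k = 3)
    print("PARITY chain:", cot_chain(x, support, xor_gate))
    print("AND chain:   ", cot_chain(x, support, and_gate))
    print("OR chain:    ", cot_chain(x, support, or_gate))
```

In this sketch the final entry of each chain equals the $k$-sparse function evaluated directly on the relevant bits, while the earlier entries correspond to the kind of intermediate supervision the abstract describes as akin to CoT.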
