Doppelgänger's Watch: A Split Objective Approach to Large Language Models

Shervin Ghasemlou, Ashish Katiyar, Aparajita Saraf, Seungwhan Moon, Mangesh Pujari, Pinar Donmez, Babak Damavandi, Anuj Kumar

TL;DR

This work addresses balancing multiple supervisory signals (e.g., factual correctness, sentiment, bias) with the core helpfulness of large language models. It introduces a bicameral Transformer in which a parallel Doppelgänger component supervises token-by-token generation while the language component remains frozen, preserving pretrained capabilities and enabling objective-agnostic supervision. The authors formalize the approach with an extended language function $\mathcal{L}_\theta$ and a monotonic composite reward function $\mathcal{CR}$, and prove a Split Objective Supremacy lemma: independently optimized objectives can achieve a composite reward equal to or higher than that of a single-objective formulation. The paper argues for practical benefits, including reduced latency and modality-agnostic supervision, while acknowledging the lack of experimental results and outlining future work on multi-modal and bidirectional extensions.

Abstract

In this paper, we investigate the problem of "generation supervision" in large language models, and present a novel bicameral architecture to separate supervision signals from their core capability, helpfulness. Doppelgänger, a new module parallel to the underlying language model, supervises the generation of each token, and learns to concurrently predict the supervision score(s) of the sequences up to and including each token. In this work, we present the theoretical findings, and leave the report on experimental results to a forthcoming publication.

Paper Structure (10 sections, 1 theorem, 7 equations, 2 figures)

Key Result

Lemma 4.1

For any given extended language function $\mathcal{L}_\theta: \mathcal{T}^* \rightarrow \mathcal{S}$ and a monotonic composite reward function $\mathcal{CR}$ composed of monotonic reward functions $\mathcal{R}_{\mathcal{S}_i}: \mathcal{S}_i \rightarrow \mathbb{R}, 0 < i \leq n$, there exists another

Figures (2)

  • Figure 1: In response to "How can exercising impact your life?", a language model can generate a response similar to the one demonstrated here. The proposed architecture allows the language model to generate supervisory signals for the generated response up to and including each token, as the token is generated. These supervisory signals can represent objectives like factual correctness, relevance, or sentiment, which in this figure are represented by green, blue, and red colors, respectively. For simplicity, the intensity of each color indicates the score on these respective axes.
  • Figure 2: The proposed bicameral architecture, which is an extension of the Transformer architecture (Vaswani et al., 2017). This figure itself is also inspired by the illustrations provided in the same paper. Both components are decoder-only Transformers, each consisting of $N$ attention modules, and receive the same input at their first layers: input embeddings enhanced by positional encoding. The output of the last attention module of the language component is used for token generation, and that of the Doppelgänger component for supervision scores.
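
The data flow described in Figure 2 can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation. The names `embed`, `toy_block`, `Component`, and `bicameral_forward` are hypothetical, the "attention module" is replaced by a causal running average, and the token and score heads are arbitrary toy functions. The sketch only shows the wiring: both components read the same inputs, the language component alone decides the next token, and the Doppelgänger emits one score per prefix.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = List[float]


def embed(token_id: int, position: int, dim: int = 4) -> Vector:
    """Hypothetical stand-in for an input embedding plus positional encoding."""
    return [((token_id + 1) * (k + 1) % 7) / 7.0 + 0.01 * (position + 1) for k in range(dim)]


def toy_block(states: List[Vector], scale: float) -> List[Vector]:
    """Stand-in for one causal attention module: position t sees only states 0..t."""
    out = []
    for t in range(len(states)):
        prefix = states[: t + 1]
        out.append([scale * sum(column) / len(prefix) for column in zip(*prefix)])
    return out


@dataclass(frozen=True)
class Component:
    """A stack of N toy blocks; `scales` holds one made-up parameter per block."""
    scales: Tuple[float, ...]

    def forward(self, states: List[Vector]) -> List[Vector]:
        for scale in self.scales:
            states = toy_block(states, scale)
        return states


def bicameral_forward(
    token_ids: Sequence[int],
    language: Component,
    doppelganger: Component,
    vocab_size: int = 5,
) -> Tuple[int, List[float]]:
    """Run both components on the same inputs; return (next token, per-prefix scores)."""
    if not token_ids:
        raise ValueError("token_ids must be non-empty")
    inputs = [embed(tok, pos) for pos, tok in enumerate(token_ids)]
    language_out = language.forward(inputs)      # used only for token generation
    doppel_out = doppelganger.forward(inputs)    # used only for supervision scores
    # Toy token head: derive a token id from the last position of the language output.
    next_token = int(sum(language_out[-1]) * 1000) % vocab_size
    # Toy supervision head: one scalar score per prefix (up to and including each token).
    scores = [sum(hidden) / len(hidden) for hidden in doppel_out]
    return next_token, scores


language = Component(scales=(1.0, 0.5))       # stays fixed, like the frozen language component
doppelganger = Component(scales=(0.9, 1.1))   # the only part that would be trained
next_token, scores = bicameral_forward([3, 1, 4], language, doppelganger)
```

In this toy setup, swapping in a different `doppelganger` changes `scores` but never `next_token`, which mirrors the claim that supervision is learned without touching the pretrained language component.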

Theorems & Definitions (8)

  • Definition 4.1: Language Function
  • Definition 4.2: Extended Language Function
  • Example 4.1
  • Example 4.2
  • Definition 4.3: Reward Function
  • Definition 4.4: Composite Reward Function
  • Lemma 4.1: Split Objective Supremacy
  • Proof A.1
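
Definitions 4.3 and 4.4 are listed here by title only, so the following is a sketch under stated assumptions rather than the paper's formalization. It assumes that a reward function maps a single supervision score to a real number, that "monotonic" means non-decreasing, and that the composite reward is a non-negative weighted sum (one of many possible monotonic compositions). The function names and the weights are hypothetical.

```python
from typing import Callable, Sequence

RewardFn = Callable[[float], float]


def composite_reward(
    scores: Sequence[float],
    rewards: Sequence[RewardFn],
    weights: Sequence[float],
) -> float:
    """Assumed composite reward: a non-negative weighted sum of per-objective rewards."""
    if not (len(scores) == len(rewards) == len(weights)):
        raise ValueError("scores, rewards, and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("negative weights would break monotonicity")
    return sum(w * r(s) for w, r, s in zip(weights, rewards, scores))


# Hypothetical non-decreasing reward functions, one per supervision objective.
def factuality_reward(score: float) -> float:
    return score


def sentiment_reward(score: float) -> float:
    return min(score, 0.8)  # saturates, but never decreases


def relevance_reward(score: float) -> float:
    return 2.0 * score - 1.0


rewards = [factuality_reward, sentiment_reward, relevance_reward]
weights = [0.5, 0.3, 0.2]

baseline = composite_reward([0.5, 0.5, 0.5], rewards, weights)
improved = composite_reward([0.9, 0.5, 0.5], rewards, weights)  # only factuality improves
```

Under these assumptions, raising any single score while holding the others fixed can never lower the composite reward, so `improved >= baseline` holds. This is the kind of monotonicity that the statement of Lemma 4.1 requires of $\mathcal{CR}$ and its component reward functions.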