Doppelgänger's Watch: A Split Objective Approach to Large Language Models
Shervin Ghasemlou, Ashish Katiyar, Aparajita Saraf, Seungwhan Moon, Mangesh Pujari, Pinar Donmez, Babak Damavandi, Anuj Kumar
TL;DR
This work addresses how to balance multiple supervisory signals (e.g., factual correctness, sentiment, bias) against the core helpfulness of large language models. It introduces a bicameral Transformer in which a parallel Doppelgänger module supervises token-by-token generation while the language component remains frozen, preserving pretrained capabilities and enabling objective-agnostic supervision. The authors formalize the approach with an extended language function $\text{L}_{\theta_1, \ldots, \theta_n}$ and monotonic composite rewards $\text{CR}$, and prove a Split Objective Supremacy lemma: independently optimized objectives can achieve composite rewards equal to or higher than those of a single-objective approach. The paper argues for practical benefits, including reduced latency and modality-agnostic supervision, while acknowledging the lack of experimental results and outlining future work on multi-modal and bidirectional extensions.
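The intuition behind the Split Objective Supremacy lemma can be illustrated with a toy numerical sketch. This is not the paper's formal construction: the parameter grid, the two reward functions, and the weights in `composite_reward` are all hypothetical choices made only for illustration. The sketch compares one shared parameter that must serve both objectives against two parameters that are each tuned for a single objective, under a composite reward that is monotonically non-decreasing in each argument.

```python
# Toy illustration only: every name and number below is a made-up assumption.

# Hypothetical one-dimensional parameter grid from -2.0 to 2.0.
GRID = [i / 10 for i in range(-20, 21)]


def reward_fact(theta: float) -> float:
    """Toy 'factual correctness' reward, peaked at theta = 1.0."""
    return -((theta - 1.0) ** 2)


def reward_tone(theta: float) -> float:
    """Toy 'sentiment' reward, peaked at theta = -0.5."""
    return -((theta + 0.5) ** 2)


def composite_reward(r1: float, r2: float) -> float:
    """A composite reward that is monotonically non-decreasing in r1 and r2."""
    return 0.7 * r1 + 0.3 * r2


def best_shared() -> float:
    """Best composite reward when one shared theta serves both objectives."""
    return max(composite_reward(reward_fact(t), reward_tone(t)) for t in GRID)


def best_split() -> float:
    """Best composite reward when each objective gets its own theta."""
    best_fact = max(reward_fact(t) for t in GRID)
    best_tone = max(reward_tone(t) for t in GRID)
    return composite_reward(best_fact, best_tone)


if __name__ == "__main__":
    print("shared parameter:", best_shared())
    print("split parameters:", best_split())
```

Because each per-objective maximum is at least as large as that objective's reward at any shared parameter, and the composite reward is monotonic, the split value can never fall below the shared one in this toy setting.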
Abstract
In this paper, we investigate the problem of "generation supervision" in large language models and present a novel bicameral architecture that separates supervision signals from the model's core capability, helpfulness. Doppelgänger, a new module parallel to the underlying language model, supervises the generation of each token and learns to concurrently predict the supervision score(s) of the sequence up to and including that token. In this work, we present the theoretical findings and leave the report on experimental results to a forthcoming publication.
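The per-token supervision described in the abstract can be sketched in a few lines of plain Python. This is a deliberately tiny stand-in under stated assumptions, not the authors' Transformer architecture: `LM_TABLE` is a hypothetical frozen lookup table playing the role of the language model, and the `Doppelganger` class is a hypothetical linear scorer with its own parameters that predicts one supervision score per prefix. Training touches only the scorer, so the frozen component is left unchanged.

```python
import copy

VOCAB = ["good", "bad", "fine", "<eos>"]

# Hypothetical frozen "language model": fixed next-token probabilities.
LM_TABLE = {tok: [0.25, 0.25, 0.25, 0.25] for tok in VOCAB}


class Doppelganger:
    """Toy parallel scorer: one weight per vocabulary token plus a bias."""

    def __init__(self):
        self.weights = {tok: 0.0 for tok in VOCAB}
        self.bias = 0.0

    def prefix_scores(self, tokens):
        """Return one predicted score for each prefix tokens[: i + 1]."""
        scores, running = [], self.bias
        for tok in tokens:
            running += self.weights[tok]
            scores.append(running)
        return scores

    def loss(self, tokens, targets):
        """Sum of squared errors over all prefixes."""
        preds = self.prefix_scores(tokens)
        return sum((p - t) ** 2 for p, t in zip(preds, targets))

    def train_step(self, tokens, targets, lr=0.1):
        """One gradient step on 0.5 * squared error, summed over prefixes."""
        preds = self.prefix_scores(tokens)  # computed once, before any update
        for i, (pred, target) in enumerate(zip(preds, targets)):
            err = pred - target
            self.bias -= lr * err
            for tok in tokens[: i + 1]:
                self.weights[tok] -= lr * err


def train_demo(steps=200):
    """Train the scorer on one made-up example; return (before, after, lm_same)."""
    lm_snapshot = copy.deepcopy(LM_TABLE)
    tokens = ["good", "bad", "fine"]
    targets = [1.0, 0.0, 0.0]  # hypothetical supervision score per prefix
    scorer = Doppelganger()
    before = scorer.loss(tokens, targets)
    for _ in range(steps):
        scorer.train_step(tokens, targets)
    after = scorer.loss(tokens, targets)
    return before, after, lm_snapshot == LM_TABLE


if __name__ == "__main__":
    print(train_demo())
```

Keeping the scorer's parameters separate from the language model's is the design point of the sketch: the supervision signal is learned without modifying the component responsible for generation.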
