Towards Continuous Intelligence Growth: Self-Training, Continual Learning, and Dual-Scale Memory in SuperIntelliAgent

Jianzhe Lin; Zeyu Pan; Yun Zhu; Ruiqi Song; Jining Yang

TL;DR

The paper introduces SuperIntelliAgent, a self-contained framework for continual intelligence growth that couples a trainable diffusion learner with a frozen verifier that reasons about prompts. An iterative verification–refinement loop turns the learner's own generations into self-supervised learning signals, which drive Direct Preference Optimization (DPO) updates and are reinforced by dual-scale memory and a replay buffer that forms adaptive curricula. Using asynchronous training and parameter-efficient LoRA updates, the approach achieves progressive semantic alignment and robust compositional reasoning on GenEval, DPG-Bench, and T2I-CompBench, with gains that are complementary to those from scaling the backbone. The framework is designed to integrate easily with existing agent architectures and supports extensions to reasoning tasks, federated training, and production deployments with optional human-in-the-loop feedback. Collectively, the work offers a practical pathway toward autonomous, continual improvement of generative agents suitable for real-world, privacy-conscious deployment.
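To make the DPO step concrete, the sketch below computes the standard per-pair DPO loss from the original DPO formulation. It is a minimal stand-alone illustration, assuming the log-probabilities of the chosen and rejected outputs under the trainable policy and a frozen reference copy are already available. For a diffusion learner these likelihoods are intractable and would have to be approximated, and this section does not state the exact form SuperIntelliAgent uses. The function name `dpo_loss` and the default `beta=0.1` are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single chosen (y_w) / rejected (y_l) pair.

    L = -log sigmoid(beta * [(log pi(y_w|x) - log ref(y_w|x))
                             - (log pi(y_l|x) - log ref(y_l|x))])

    Hypothetical helper: the inputs are assumed to be precomputed
    log-probabilities under the trainable policy and the frozen reference.
    """
    # Implicit reward margin: how much more the policy prefers y_w over y_l
    # than the reference model does.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(x)) == softplus(-x); evaluate softplus in a numerically
    # stable way to avoid overflow for large |x|.
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))
```

With a zero margin the loss equals log 2 (about 0.693); it falls below that as the policy shifts probability toward the chosen output relative to the reference, and rises above it when the policy favors the rejected output.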

Abstract

We introduce SuperIntelliAgent, an agentic learning framework that couples a trainable small diffusion model (the learner) with a frozen large language model (the verifier) to enable continual intelligence growth through self-supervised interaction. Unlike conventional supervised fine-tuning, SuperIntelliAgent learns autonomously without annotation: the learner generates candidate outputs, the verifier evaluates them through step-by-step reasoning, and their interaction produces chosen/rejected pairs for Direct Preference Optimization (DPO). Each input is thereby converted into a pseudo-training signal for continual improvement. The framework integrates dual-scale memory: short-term in-context memory that preserves reasoning traces across refinement cycles, and long-term memory that consolidates acquired knowledge through lightweight on-the-fly fine-tuning. A replay buffer retains samples that show verifiable progress and replays them as auxiliary supervision, reinforcing recent learning while forming adaptive curricula. SuperIntelliAgent is infrastructure-agnostic: it can be plugged into existing agentic frameworks, turning ordinary inference loops into a lifelong optimization process. We posit that pairing a trainable learner with a reasoning-capable verifier forms a minimal reliable unit of growing intelligence, because paired feedback and partial-history replay yield richer learning curricula and stronger preference alignment. With only a small number of automatically generated DPO pairs, the learner improves on all evaluated benchmarks (GenEval, DPG-Bench, and T2I-CompBench), suggesting that this mechanism is a promising direction for continual intelligence accumulation and real-world deployment.
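The interaction described above (the learner proposes, the verifier critiques, and the resulting outputs become a preference pair) can be sketched as a small loop. This is a hypothetical illustration, not the authors' implementation. `generate`, `verify`, and `refine` are stand-in callables for the learner and verifier; the verifier is assumed to return a score in [0, 1] plus a textual critique; the list of critiques plays the role of short-term in-context memory; and "verifiable progress" is approximated as a strictly higher verifier score after refinement. The names `build_pair`, `PreferencePair`, and `ReplayBuffer`, and the FIFO buffer with uniform sampling, are likewise assumed designs.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical signatures for the learner and the frozen verifier.
Generate = Callable[[str], str]
Verify = Callable[[str, str], Tuple[float, str]]   # -> (score in [0, 1], critique)
Refine = Callable[[str, str, List[str]], str]      # conditions on critique history


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str
    gain: float  # verifier-score improvement of chosen over rejected


def build_pair(prompt: str, generate: Generate, verify: Verify, refine: Refine,
               max_rounds: int = 3, pass_score: float = 1.0) -> Optional[PreferencePair]:
    """Generate -> verify -> refine; return a chosen/rejected pair if progress was made."""
    first = generate(prompt)
    first_score, critique = verify(prompt, first)
    best, best_score = first, first_score
    history = [critique]  # short-term memory: critiques carried across refinement rounds
    for _ in range(max_rounds):
        if best_score >= pass_score:
            break  # the verifier is satisfied; stop refining
        candidate = refine(prompt, best, history)
        score, critique = verify(prompt, candidate)
        history.append(critique)
        if score > best_score:
            best, best_score = candidate, score
    if best_score <= first_score:
        return None  # no verifiable progress, so no preference signal
    # The refined output is preferred over the learner's first attempt.
    return PreferencePair(prompt, best, first, best_score - first_score)


class ReplayBuffer:
    """Fixed-capacity FIFO store of pairs that showed progress, replayed uniformly."""

    def __init__(self, capacity: int = 1000, seed: int = 0) -> None:
        self.items: deque = deque(maxlen=capacity)  # oldest pairs are evicted first
        self.rng = random.Random(seed)

    def add(self, pair: PreferencePair) -> None:
        self.items.append(pair)

    def sample(self, k: int) -> List[PreferencePair]:
        return self.rng.sample(list(self.items), min(k, len(self.items)))


# Toy usage: the stand-in verifier only checks whether the word "red" appears.
buffer = ReplayBuffer(capacity=100)
pair = build_pair(
    "a red cat",
    generate=lambda p: "a cat",
    verify=lambda p, y: (1.0, "ok") if "red" in y else (0.2, "missing attribute: red"),
    refine=lambda p, y, hist: y + " red",
)
if pair is not None:
    buffer.add(pair)
```

In this sketch, pairs returned by `build_pair` would feed a DPO update, while `ReplayBuffer.sample` would mix earlier successful pairs into later batches as auxiliary supervision. Discarding prompts with no score improvement is one simple way to keep only pairs whose preference direction the verifier actually supports.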
