Towards Continuous Intelligence Growth: Self-Training, Continual Learning, and Dual-Scale Memory in SuperIntelliAgent

Jianzhe Lin, Zeyu Pan, Yun Zhu, Ruiqi Song, Jining Yang

TL;DR

The paper introduces SuperIntelliAgent, a self-contained framework that enables continual intelligence growth by coupling a trainable diffusion learner with a frozen verifier that reasons about prompts. It converts generation into self-supervised learning signals via an iterative verification–refinement loop and Direct Preference Optimization (DPO), reinforced by dual-scale memory and a replay buffer that forms adaptive curricula. Through asynchronous training and LoRA-based parameter-efficient updates, the approach achieves progressive semantic alignment and robust compositional reasoning across GenEval, DPG-Bench, and T2I-CompBench, with gains that complement backbone scaling. The framework is designed to integrate with existing agent architectures and supports extensions to reasoning tasks, federated training, and production deployments with human-in-the-loop options. Collectively, this work presents a practical pathway toward autonomous, continual improvement of generative agents suitable for real-world, privacy-conscious deployment.

Abstract

We introduce SuperIntelliAgent, an agentic learning framework that couples a trainable small diffusion model (the learner) with a frozen large language model (the verifier) to enable continual intelligence growth through self-supervised interaction. Unlike conventional supervised fine-tuning, SuperIntelliAgent learns autonomously without annotation: the learner generates candidate outputs, the verifier evaluates them through step-by-step reasoning, and their interaction produces chosen/rejected pairs for Direct Preference Optimization (DPO). This converts each input into a pseudo-training signal for continual improvement. The framework integrates dual-scale memory: short-term in-context memory that preserves reasoning traces across refinement cycles, and long-term memory that consolidates acquired knowledge through lightweight on-the-fly fine-tuning. A replay buffer retains samples that show verifiable progress and replays them as auxiliary supervision, reinforcing recent learning while forming adaptive curricula. SuperIntelliAgent is infrastructure-agnostic and can be plugged into existing agentic frameworks while turning ordinary inference loops into a lifelong optimization process. We posit that pairing a trainable learner with a reasoning-capable verifier forms a minimal reliable unit of growing intelligence, as paired feedback and partial-history replay yield richer learning curricula and stronger preference alignment. With a small number of automatically generated DPO pairs, the learner improves across all benchmarks, indicating that this mechanism provides a promising direction for continual intelligence accumulation and real-world deployment.
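
The loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `refine_and_collect`, `ReplayBuffer`, `PreferencePair`, and the toy `toy_generate` / `toy_verify` functions are hypothetical names, candidates are plain strings standing in for images, and the rule "pair every rejected draft with the first accepted one" is one plausible reading of how chosen/rejected pairs arise.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Optional, Tuple


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # candidate the verifier accepted
    rejected: str  # earlier candidate the verifier rejected


def refine_and_collect(
    prompt: str,
    generate: Callable[[str, List[str]], str],
    verify: Callable[[str, str], Tuple[bool, str]],
    max_rounds: int = 4,
) -> List[PreferencePair]:
    """Run one verification-refinement loop (hypothetical interface)."""
    feedback: List[str] = []  # short-term memory: critiques from earlier rounds
    rejected: List[str] = []
    for _ in range(max_rounds):
        candidate = generate(prompt, feedback)
        ok, critique = verify(prompt, candidate)
        if ok:
            # Assumption: each earlier failure paired with the accepted
            # output yields one DPO pair; first-try successes yield none.
            return [PreferencePair(prompt, candidate, r) for r in rejected]
        rejected.append(candidate)
        feedback.append(critique)
    return []  # no verified progress, so no training signal


class ReplayBuffer:
    """Bounded store of pairs that showed verifiable progress."""

    def __init__(self, capacity: int = 1000) -> None:
        self._items: Deque[PreferencePair] = deque(maxlen=capacity)

    def add(self, pairs: List[PreferencePair]) -> None:
        self._items.extend(pairs)

    def sample(self, k: int, rng: Optional[random.Random] = None) -> List[PreferencePair]:
        rng = rng or random.Random()
        return rng.sample(list(self._items), min(k, len(self._items)))

    def __len__(self) -> int:
        return len(self._items)


# Toy stand-ins: the "learner" improves with each critique it receives,
# and the "verifier" accepts only the third draft.
def toy_generate(prompt: str, feedback: List[str]) -> str:
    return f"draft-{len(feedback)}"


def toy_verify(prompt: str, candidate: str) -> Tuple[bool, str]:
    ok = candidate == "draft-2"
    critique = "" if ok else f"{candidate} does not match the prompt"
    return ok, critique


pairs = refine_and_collect("a red backpack and a blue book", toy_generate, toy_verify)
buffer = ReplayBuffer(capacity=100)
buffer.add(pairs)
print(len(pairs), pairs[0].chosen, pairs[0].rejected)  # 2 draft-2 draft-0
```

In this sketch the `feedback` list plays the role of short-term in-context memory, while the buffer holds pairs that would later be replayed alongside fresh ones during fine-tuning (the long-term memory path).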

Paper Structure

This paper contains 58 sections, 20 equations, 4 figures, 5 tables, 3 algorithms.

Figures (4)

  • Figure 1: Qualitative comparisons between baseline Janus outputs and images produced after continual training with SuperIntelliAgent across five GenEval prompts.
  • Figure 2: Qualitative comparisons between baseline Janus outputs and images produced after continual training with SuperIntelliAgent across five DPG-Bench prompts. Column 1: In the foreground of the image, a variety of colorful fruits are scattered across a wooden table, with their fine details and textures in sharp focus. The background features a blurred arrangement of kitchenware and a pastel-colored wall, providing a soft contrast to the vivid sharpness of the fruits on the table. The diffused light gently illuminates the scene, highlighting the smooth skins of the fruits and casting subtle shadows upon the wooden surface. Column 2: a collection of individuals clad in bright ski gear against the contrasting backdrop of a vast beige sand dune. Each person is equipped with skis and poles, ready to ascend the gentle slope of the dune under a clear blue sky. Their colorful attire stands out vividly against the monochrome landscape of sand. Column 3: An extraordinary rendition of Melbourne's Southern Cross Station presented from a bird's-eye view, encapsulated by the signature aesthetics akin to the works of Makoto Shinkai. The image boasts a resolution of 8K, delivering an ultra-detailed and sharply defined portrayal that captures even the subtlest of features. The station and its surroundings are bathed in epic lighting that casts dramatic shadows and projects vivid light refractions across the scene, offering a sense of hyperrealism that's enhanced by the ultra uplight effect. Each element within the composition is rendered with high fidelity, giving life to a photorealistic scene that is both captivating and intricately depicted. Column 4: A bold, white sign with the words 'KEEP OFF THE GRASS' stands prominently next to a lush, green lawn. The sign, with its stark black lettering, is mounted on a metal pole and positioned at the edge of the neatly trimmed grass. Surrounding the lawn are small flowering plants, adding a touch of color to the scene. Column 5: A dynamic scene unfolds at the historic Colosseum, where a fleet of sleek, multicolored racing cars roar past an excited crowd. The vehicles, adorned with vibrant decals and sponsor logos, navigate a temporary circuit that has been meticulously laid out within the ancient arena's interior. Spectators are perched on stone seats that have withstood the test of time, their attention fixed on the blur of machines vying for the lead under the bright afternoon sun.
  • Figure 3: Qualitative comparisons between baseline Janus outputs and images produced after continual training with SuperIntelliAgent across three T2I-CompBench prompts. Prompt 1: a red backpack and a blue book; Prompt 2: a green banana and a blue vase; Prompt 3: an oblong sweet potato and a teardrop peach.
  • Figure 4: Overview of the SuperIntelliAgent pipeline. The learner generates candidate outputs, the verifier performs semantic auditing, and DPO-based continual adaptation updates the learner asynchronously.
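
To make the "DPO-based continual adaptation" step in Figure 4 concrete, the following sketch computes the standard DPO loss for a single chosen/rejected pair. It is an illustration only: this section does not state the exact objective used for the image learner, which may be a diffusion- or token-level variant, and the function name `dpo_loss`, its arguments, and the default `beta` are assumptions. The inputs are log-probabilities of each sample under the trainable learner and under a frozen reference copy of it.

```python
import math


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value; controls how far the learner may drift
) -> float:
    """Standard DPO loss for one preference pair: -log(sigmoid(margin))."""
    # How much more the learner favors each sample than the reference does.
    chosen_gain = logp_chosen - ref_logp_chosen
    rejected_gain = logp_rejected - ref_logp_rejected
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# With learner == reference the margin is zero and the loss is log(2).
baseline = dpo_loss(-2.0, -2.0, -2.0, -2.0)
# Raising the chosen sample's likelihood and lowering the rejected one's reduces the loss.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
print(baseline > improved)  # True
```

Because the loss depends only on log-probability differences relative to the reference, it can be minimized by updating a small set of adapter weights (for example LoRA parameters) while the base model stays fixed, which matches the parameter-efficient updates mentioned in the TL;DR.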