Jano: Adaptive Diffusion Generation with Early-stage Convergence Awareness

Yuyang Chen, Linqian Zeng, Yijin Zhou, Hengjie Li, Jidong Zhai

TL;DR

Jano introduces an early-stage complexity recognition algorithm that accurately identifies regional convergence requirements within initial denoising steps, coupled with an adaptive token scheduling runtime that optimizes computational resource allocation.

Abstract

Diffusion models have achieved remarkable success in generative AI, yet their computational efficiency remains a significant challenge, particularly for Diffusion Transformers (DiTs), which require intensive full-attention computation. While existing acceleration approaches focus on content-agnostic, uniform optimization strategies, we observe that different regions of the generated content exhibit heterogeneous convergence patterns during the denoising process. We present Jano, a training-free framework that leverages this insight for efficient region-aware generation. Jano introduces an early-stage complexity recognition algorithm that accurately identifies regional convergence requirements within the initial denoising steps, coupled with an adaptive token scheduling runtime that optimizes computational resource allocation. In a comprehensive evaluation on state-of-the-art models, Jano achieves substantial acceleration (an average speedup of $2.0\times$, up to $2.4\times$) while preserving generation quality. Our work challenges conventional uniform processing assumptions and provides a practical solution for accelerating large-scale content generation. The source code of our implementation is available at https://github.com/chen-yy20/Jano.

Paper Structure

This paper contains 36 sections, 20 equations, 17 figures, 4 tables.

Figures (17)

  • Figure 1: Jano foresees regional convergence through early-stage complexity recognition. Then it adaptively allocates computation resources, achieving $2.0\times$ speedup without perceptual loss.
  • Figure 2: Motivation Example with complexity and convergence pattern analysis ($r=0.70,\ \rho=0.74$).
  • Figure 3: Velocity differences between both intra-frame similar points (A, B) and inter-frame similar points (A, D) remain approximately constant across timesteps.
  • Figure 4: Statistical validation and visualization of complexity-driven convergence-level categorization. (a) Convergence score exhibits significant correlation with complexity across the dataset ($r = 0.61, \ \rho = 0.69,\ p < 0.001$), naturally forming three distinct levels. Predicted complexity scores (b) are mapped to three convergence levels (c) through optimized thresholds.
  • Figure 5: Jano adaptively computes tokens of different convergence levels through an interleaved pipeline.
  • ...and 12 more figures
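
The captions for Figures 2 and 4 report a Pearson coefficient ($r$) and a Spearman coefficient ($\rho$) between complexity and convergence, and describe mapping predicted complexity scores to three convergence levels through thresholds. The sketch below illustrates those two generic operations only; it is not taken from the Jano repository. The function names, the threshold values 0.3 and 0.6, and the convention that a higher level index means a higher complexity score are all assumptions made for illustration. The paper's optimized thresholds and its complexity metric are not given in this excerpt.

```python
import math
from bisect import bisect_right


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rho: the Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


def to_levels(scores, thresholds=(0.3, 0.6)):
    """Map each score to level 0, 1, or 2 using two ascending thresholds.

    The default thresholds are placeholders, not values from the paper.
    A score equal to a threshold falls into the higher level.
    """
    low, high = thresholds
    if not low < high:
        raise ValueError("thresholds must be strictly ascending")
    return [bisect_right(thresholds, s) for s in scores]


if __name__ == "__main__":
    # Hypothetical per-region scores, invented for this example.
    complexity = [0.10, 0.30, 0.45, 0.90]
    convergence = [0.20, 0.25, 0.50, 0.80]
    print(to_levels(complexity))
    print(pearson(complexity, convergence), spearman(complexity, convergence))
```

Reporting both coefficients, as the captions do, is a common check: $r$ measures linear association, while $\rho$ depends only on rank order and is therefore less sensitive to a monotone but nonlinear relationship.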