TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation
Victor Shea-Jay Huang, Le Zhuo, Yi Xin, Zhaokai Wang, Fu-Yun Wang, Yuchi Wang, Renrui Zhang, Peng Gao, Hongsheng Li
TL;DR
TIDE introduces Temporal-Aware Sparse Autoencoders (SAEs) that extract sparse, interpretable features from Diffusion Transformer (DiT) activations across diffusion timesteps, revealing that DiTs learn hierarchical 3D, semantic, and class-level features during large-scale pretraining. By training SAEs on DiT activations with timestep-dependent modulation, TIDE achieves improved reconstruction and interpretability with minimal loss of generation quality. The approach is robust across backbones and enables practical applications such as safe image editing and style transfer, as supported by ablations and safety evaluations. Overall, TIDE lays a foundation for trustworthy, controllable diffusion-based generation by making internal representations transparent and manipulable.
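The TL;DR does not spell out the form of the timestep-dependent modulation. As a rough illustration only, the sketch below assumes a top-k ReLU sparse autoencoder whose encoder pre-activations are scaled and shifted (FiLM-style) by a sinusoidal timestep embedding. The class name `TemporalTopKSAE`, the modulation scheme, and all hyperparameters are hypothetical and not taken from TIDE; only the forward pass and reconstruction loss are shown, and training is omitted.

```python
import numpy as np


def timestep_embedding(t, dim):
    """Sinusoidal embedding of a scalar diffusion timestep t (DDPM/DiT-style)."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    args = t * freqs
    return np.concatenate([np.cos(args), np.sin(args)])


class TemporalTopKSAE:
    """Hypothetical timestep-modulated top-k SAE (illustrative sketch, not TIDE's code)."""

    def __init__(self, d_model, d_latent, k, t_dim=32, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        self.t_dim = t_dim
        self.W_enc = rng.normal(0.0, d_model ** -0.5, (d_model, d_latent))
        self.b_enc = np.zeros(d_latent)
        # Decoder initialized as the encoder transpose, a common SAE initialization.
        self.W_dec = self.W_enc.T.copy()
        self.b_dec = np.zeros(d_model)
        # Assumed FiLM-style modulation: timestep embedding -> per-latent scale and shift.
        self.W_scale = np.zeros((t_dim, d_latent))
        self.W_shift = np.zeros((t_dim, d_latent))

    def encode(self, x, t):
        """Map DiT activations x of shape (n_tokens, d_model) at timestep t to sparse codes."""
        emb = timestep_embedding(t, self.t_dim)
        scale = 1.0 + emb @ self.W_scale  # identity modulation while W_scale is zero
        shift = emb @ self.W_shift
        pre = ((x - self.b_dec) @ self.W_enc + self.b_enc) * scale + shift
        z = np.maximum(pre, 0.0)
        # Zero everything except the k largest activations in each row.
        drop = np.argpartition(z, -self.k, axis=-1)[..., :-self.k]
        np.put_along_axis(z, drop, 0.0, axis=-1)
        return z

    def decode(self, z):
        return z @ self.W_dec + self.b_dec

    def loss(self, x, t):
        """Mean squared reconstruction error at timestep t."""
        x_hat = self.decode(self.encode(x, t))
        return float(np.mean((x - x_hat) ** 2))


sae = TemporalTopKSAE(d_model=16, d_latent=64, k=4)
acts = np.random.default_rng(1).normal(size=(8, 16))  # stand-in for DiT token activations
codes = sae.encode(acts, t=500)
```

Initializing `W_scale` and `W_shift` to zero makes the modulation start as the identity, so the model begins as a plain top-k SAE and can only acquire timestep-specific behavior through training.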
Abstract
Diffusion Transformers (DiTs) are a powerful class of generative models, yet they remain underexplored compared to U-Net-based diffusion architectures. We propose TIDE (Temporal-aware sparse autoencoders for Interpretable Diffusion transformErs), a framework designed to extract sparse, interpretable activation features across timesteps in DiTs. TIDE effectively captures temporally varying representations and reveals that DiTs naturally learn hierarchical semantics (e.g., 3D structure, object class, and fine-grained concepts) during large-scale pretraining. Experiments show that TIDE enhances interpretability and controllability while maintaining reasonable generation quality, enabling applications such as safe image editing and style transfer.
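One common way to use SAE features for editing, which may or may not match TIDE's own procedure, is to zero selected latents and decode while adding back the SAE's reconstruction error, so that content the SAE does not capture is left untouched. The sketch below assumes a plain ReLU SAE with given weights; the function `suppress_features` and the idea of passing the `feature_ids` associated with an unwanted concept are hypothetical.

```python
import numpy as np


def suppress_features(x, W_enc, b_enc, W_dec, b_dec, feature_ids):
    """Zero the chosen SAE latents in activations x, keeping the SAE's reconstruction error.

    Assumed shapes: x (n, d), W_enc (d, m), b_enc (m,), W_dec (m, d), b_dec (d,).
    """
    ids = np.asarray(feature_ids, dtype=int)
    z = np.maximum((x - b_dec) @ W_enc + b_enc, 0.0)  # ReLU SAE encoder
    x_hat = z @ W_dec + b_dec                          # SAE reconstruction
    error = x - x_hat                                  # residual the SAE does not explain
    z_edit = z.copy()
    z_edit[..., ids] = 0.0                             # switch off the selected features
    return z_edit @ W_dec + b_dec + error
```

Because the error term is kept, an empty `feature_ids` returns `x` unchanged, and in general the edit reduces to subtracting each removed feature's decoder direction scaled by its activation.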
