Analyzing the Training Dynamics of Image Restoration Transformers: A Revisit to Layer Normalization

MinKyu Lee; Sangeek Hyun; Woojin Jun; Hyunjun Kim; Jiwoo Chung; Jae-Pil Heo

TL;DR

This work proposes Image Restoration Transformer Tailored Layer Normalization (i-LN), a simple drop-in replacement for LayerNorm that normalizes features holistically and adaptively rescales them per input. It provides theoretical insights and empirical evidence that this simple design improves training dynamics and, in turn, performance.

Abstract

This work analyzes the training dynamics of Image Restoration (IR) Transformers and uncovers a critical yet overlooked issue: conventional LayerNorm (LN) drives feature magnitudes to diverge to the order of millions and collapses channel-wise entropy. We interpret this from the perspective of the network attempting to bypass LN constraints that conflict with IR tasks. Specifically, we identify two misalignments between LN and IR: 1) per-token normalization disrupts spatial correlations, and 2) input-independent scaling discards input-specific statistics. To address them, we propose Image Restoration Transformer Tailored Layer Normalization (i-LN), a simple drop-in replacement that normalizes features holistically and adaptively rescales them per input. We provide theoretical insights and empirical evidence, validated by extensive experiments, that this simple design improves training dynamics and, in turn, performance.
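The two misalignments can be illustrated with a minimal NumPy sketch, assuming the features of one image are flattened to a `(tokens, channels)` array. The function names `per_token_ln`, `holistic_ln`, and `holistic_ln_rescaled` are hypothetical, and the rescaling rule (multiplying by the input's own standard deviation after a per-channel affine transform) is one assumed way to reinject per-input statistics, not necessarily the exact i-LN formulation.

```python
import numpy as np

# Hypothetical sketch: features of ONE image are flattened to a
# (tokens, channels) array; learnable affine parameters are omitted
# from the first two normalizers so only the normalization axes differ.

def per_token_ln(x, eps=1e-6):
    """Conventional LayerNorm statistics: each token (row) is normalized
    over its own channels, independently of every other token."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def holistic_ln(x, eps=1e-6):
    """Assumed 'holistic' variant: a single mean and variance over all
    tokens and channels of the image, so relative magnitudes between
    tokens (spatial structure) are preserved."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def holistic_ln_rescaled(x, gamma, beta, eps=1e-6):
    """Assumed input-adaptive rescaling: after the per-channel affine
    transform (gamma, beta), multiply by the input's own standard
    deviation so the output magnitude tracks the input instead of being
    fixed by input-independent parameters."""
    sigma = np.sqrt(x.var() + eps)
    return (gamma * holistic_ln(x, eps) + beta) * sigma

# Two tokens with the same channel pattern but a 10x magnitude gap.
x = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 30.0]])
gamma = np.array([1.0, 0.5, 2.0])
beta = np.array([0.1, 0.0, -0.2])

pt = per_token_ln(x)
h = holistic_ln(x)
print(np.allclose(pt[0], pt[1]))             # True: per-token LN erases the 10x gap
print(np.allclose(h[0], h[1]))               # False: holistic statistics keep it
print(np.allclose(holistic_ln(100 * x), h))  # True: still blind to input scale
print(np.allclose(holistic_ln_rescaled(100 * x, gamma, beta),
                  100 * holistic_ln_rescaled(x, gamma, beta)))  # True
```

Both plain normalizers map `x` and `100 * x` to the same output, which is the sense in which input-specific magnitude is discarded; only the rescaled variant lets it through. Details such as where the affine parameters sit or how gradients flow through the standard deviation are assumptions of this sketch.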
