Inversion-Free Style Transfer with Dual Rectified Flows

Yingying Deng, Xiangyu He, Fan Tang, Weiming Dong, Xucheng Yin

TL;DR

This work tackles the inefficiency and distortion introduced by inversion-based training-free style transfer methods by proposing an inversion-free framework built on dual Rectified Flows that operate entirely in forward ODEs. Content and style trajectories are predicted in parallel and fused through a midpoint interpolation, with a carefully designed velocity field that accounts for evolving stylized content and style distributions. An attention injection module further guides style integration, yielding improved visual fidelity and content preservation across diverse styles. Extensive experiments demonstrate that the approach delivers high-quality stylization with better fidelity and generalization, while offering faster inference than diffusion-based baselines. The proposed method provides a practical, scalable pipeline for inversion-free style transfer suitable for real-world creative tools.

Abstract

Style transfer, a pivotal task in image processing, synthesizes visually compelling images by seamlessly blending realistic content with artistic styles, enabling applications in photo editing and creative design. While mainstream training-free diffusion-based methods have greatly advanced style transfer in recent years, their reliance on computationally expensive inversion processes compromises efficiency and introduces visual distortions when inversion is inaccurate. To address these limitations, we propose a novel inversion-free style transfer framework based on dual rectified flows, which tackles the challenge of finding an unknown stylized distribution from two distinct inputs (content and style images) using only forward passes. Our approach predicts content and style trajectories in parallel, then fuses them through a dynamic midpoint interpolation that integrates velocities from both paths while adapting to the evolving stylized image. By jointly modeling the content, style, and stylized distributions, our velocity field design achieves robust fusion and avoids the shortcomings of naive overlays. Attention injection further guides style integration, enhancing visual fidelity, content preservation, and computational efficiency. Extensive experiments demonstrate generalization across diverse styles and content, providing an effective and efficient pipeline for style transfer.
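
To make the contrast with "naive overlays" concrete, the following is a minimal sketch of a forward-only, shared-noise setup in which the content and style velocities are simply averaged. It is not the paper's velocity field: `toy_velocity` is a hypothetical oracle standing in for a pretrained $v_\theta$, `naive_fused_forward` is an illustrative name, and the plain average is an assumed baseline. The noising rule $tx_1 + (1-t)x_0$ follows the convention described in the Figure 2 caption.

```python
import numpy as np


def noisy_sample(x1, x0, t):
    """Straight-line noising t*x1 + (1-t)*x0; no inversion is needed."""
    return t * x1 + (1.0 - t) * x0


def toy_velocity(x_t, t, target):
    """Hypothetical stand-in for v_theta: an oracle that points from x_t
    toward a known target so that a straight path arrives at t = 1."""
    return (target - x_t) / max(1.0 - t, 1e-6)


def naive_fused_forward(x1_c, x1_s, steps=50, seed=0):
    """Euler-integrate a forward ODE whose velocity is the plain average of
    the content and style velocities (assumed baseline, not the paper's rule)."""
    rng = np.random.default_rng(seed)
    x0 = rng.standard_normal(x1_c.shape)  # noise shared by both branches
    x = x0.copy()
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v_c = toy_velocity(noisy_sample(x1_c, x0, t), t, x1_c)  # content branch
        v_s = toy_velocity(noisy_sample(x1_s, x0, t), t, x1_s)  # style branch
        x = x + dt * 0.5 * (v_c + v_s)  # naive fusion: average the two
    return x


content = np.array([[1.0, 0.0], [0.0, 1.0]])
style = np.array([[0.0, 2.0], [2.0, 0.0]])
out = naive_fused_forward(content, style)
print(np.allclose(out, 0.5 * (content + style)))
```

With this oracle field each branch velocity reduces to $x_1 - x_0$, so the averaged ODE ends at the pixel-wise mean of the two images, that is, an overlay. This is one plausible reading of why a velocity field that also tracks the evolving stylized image is needed.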

Paper Structure

This paper contains 25 sections, 28 equations, 9 figures, 1 table, 2 algorithms.

Figures (9)

  • Figure 1: Image style transfer results from our proposed method. First row: Stylized outputs using various style and content references. Last two rows: Comparisons with state-of-the-art methods, including diffusion-based style transfer (inversion-based training-free models such as StyleSSP [StyleSSP] and Zstar [Deng_2024_CVPR]; fine-tuning-based InST [zhang:2023:inversion]) and the traditional StyTr$^2$ [Deng:2022:CVPR]. Our approach preserves core structural details while delivering vibrant, distinctive artistic flair.
  • Figure 2: From image generation to style transfer. (a) Image generation with the ReFlow model begins by sampling random noise $x_0$ from a Gaussian distribution. A velocity field $v_\theta(x, t, \psi)$, conditioned on a text prompt $\psi$, is then used to generate the image $x_1$. (b) To find a noise $x_0$ that better reconstructs the content image $x_1$, we first solve the ODE conditioned on the source prompt $\psi$, then follow the image generation path to obtain the reconstructed image $x_1'$. (c) Assuming that the oracle content noise $x_0^c$ and style noise $x_0^s$ are known via inversion, we can interpret the pseudo noise $x_0^{stylized}$ in terms of $x_0^c$ and $x_0^s$. It is then straightforward to use the velocities predicted at $x_t^{stylized}$ to form a new direction $v_\theta(x_t^{stylized}, t, \psi)$ toward the stylized image. (d) To avoid the inversion process, we directly add random noise to $x_1^c$ and $x_1^s$. The noisy images $tx_1 + (1-t)x_0$ share the same starting point $x_0^c = x_0^s$ for the forward ODEs. Then, by vector addition, we combine the denoising directions of the content and style images to form the new velocity $v(x_t^{stylized}, t, \psi)$ for style transfer.
  • Figure 3: Illustration of our style transfer trajectories in rectified flow space. (a) The content trajectory (blue dotted line) starts from the shared Gaussian noise $x_0^c \sim \mathcal{N}(0,1)$ at $t=0$ and moves toward the content image $x^c_1$ at $t=1$. The style trajectory (purple dotted line) begins at the same noise $x_0^s = x_0^c$ and converges to the style image $x^s_1$ at $t=1$. The shifted content trajectory (dark red arrow) adjusts the content path by $\tau (x^{stylized}_t - x^c_1)$, gradually aligning it with the evolving stylized image $x^{stylized}_t$. The velocity of $x^{stylized}_t$ (red arrow) is formulated as the vector subtraction of the velocity of the style image $x_t^s$ and the velocity of the shifted content image $x'^c_t$, targeting the style image distribution. These straight-line paths couple content and style via shared noise, guiding the transformation from the content image to the final stylized image at $t=1$. (b) Analogous to the content-branch generation, the mirrored style-branch inversion-free style transfer uses the style image as the initial point and moves toward the content image. (c) To balance content and style, we draw on both the content and style branches: the midpoint is introduced as a natural combination of $x'^c_t$ and $x'^s_t$, predicting the fixed velocity $v_\theta(f(x'^s_t,x'^c_t), t, \psi)$, which is then refined by the velocities of the shifted content and style images ($v_\theta(x'^c_t, t, \psi)$ and $v_\theta(x'^s_t, t, \psi)$) to form the final velocity (red arrow).
  • Figure 4: Qualitative comparison of various style transfer methods. The first column displays the content images, the second column shows the style images, and the subsequent columns present the stylized results produced by different methods.
  • Figure 5: User study results on content preservation, style fidelity, and overall quality. For each criterion, the upper segment of the bar represents the preference rate for our method, while the lower segment corresponds to the collective preference for comparison methods.
  • ...and 4 more figures
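
The shared-noise construction described in the captions of Figures 2(d) and 3 can be written out explicitly. The short derivation below assumes the captions' convention (noise at $t=0$, image at $t=1$) and writes $\epsilon$ for the shared noise $x_0^c = x_0^s$; the symbol $\epsilon$ is introduced here for brevity and does not appear in the captions. The velocities shown are the ideal ones along exact straight-line paths, which a pretrained $v_\theta$ would only approximate.

```latex
\begin{aligned}
x_t^c &= t\,x_1^c + (1-t)\,\epsilon, &\qquad x_t^s &= t\,x_1^s + (1-t)\,\epsilon,\\
\frac{d x_t^c}{dt} &= x_1^c - \epsilon, &\qquad \frac{d x_t^s}{dt} &= x_1^s - \epsilon,\\
x_t^c - x_t^s &= t\,\bigl(x_1^c - x_1^s\bigr).
\end{aligned}
```

The last line shows that the two noised images coincide at $t=0$ and separate linearly in $t$, which is why both forward ODEs can start from the same point without any inversion step.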