PQDAST: Depth-Aware Arbitrary Style Transfer for Games via Perceptual Quality-Guided Distillation

Eleftherios Ioannou, Steve Maddock

TL;DR

PQDAST tackles the challenge of real-time arbitrary style transfer in computer games by integrating a compressed, depth-aware style-transfer network into the rendering pipeline. It introduces a perceptual quality-guided distillation framework that uses the FLIP evaluator to train a compact student model, which closely matches a strong teacher (RAST/SANet) while substantially reducing memory usage and processing time. Depth reconstruction and temporal losses further stabilise stylisation across frames, and the system is embedded in Unity HDRP via a Custom Pass to preserve post-processing effects. Empirical results show that PQDAST achieves superior temporal coherence with competitive stylisation quality and practical in-game performance, demonstrating a viable path for flexible, real-time artistic stylisation in games.

Abstract

Artistic style transfer is concerned with the generation of imagery that combines the content of an image with the style of an artwork. In the realm of computer games, most work has focused on post-processing video frames. Some recent work has integrated style transfer into the game pipeline, but it is limited to single styles. Integrating an arbitrary style transfer method into the game pipeline is challenging due to the memory and speed requirements of games. We present PQDAST, the first solution to address this. We use a perceptual quality-guided knowledge distillation framework and train a compressed model using the FLIP evaluator, which substantially reduces both memory usage and processing time with limited impact on stylisation quality. For better preservation of depth and fine details, we utilise a synthetic dataset with depth and temporal considerations during training. The developed model is injected into the rendering pipeline to further enforce temporal stability and avoid diminishing post-process effects. Quantitative and qualitative experiments demonstrate that our approach achieves superior performance in temporal consistency, with comparable style transfer quality, to state-of-the-art image, video and in-game methods.

Paper Structure

This paper contains 22 sections, 12 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Our proposed perceptual quality-guided knowledge distillation framework utilises the FLIP evaluator. Depth and temporal loss functions are also defined. The trained model is injected into the 3D rendering pipeline.
  • Figure 2: Overview of PQDAST Architecture. PQDAST trains a compressed version of RAST's decoder and SANet modules using perceptual quality-guided distillation losses. It also uses a depth reconstruction loss and a temporal loss in addition to the content and style losses.
  • Figure 3: Decoder Architecture: RAST vs PQDAST.
  • Figure 4: Results comparing PQDAST to RAST. RAST is used as a post-processing effect. The input frame is shown on the left. In-game PQDAST generates temporally consistent results. The difference between the shown and previous frames is visualised using the FLIP evaluator. The table below the images provides the numerical value of this difference (calculated using FLIP), for each method and for each row of the figure.
  • Figure 5: Qualitative results comparing PQDAST to state-of-the-art methods. A heatmap of the temporal error between the current and previous frame is included in the bottom row. Our proposed approach produces high-quality stylisations. The temporal error heatmap of in-game PQDAST is closest to the original frame's heatmap, along with those of NSTFCG and GBGST, which are also used in-game. Additional results are provided in Figure \ref{fig:results_comparison_2}.
  • ...and 4 more figures
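
The Figure 2 caption lists the loss terms used in training: content, style, depth reconstruction, temporal, and the perceptual quality-guided distillation losses. A minimal sketch of how such terms are commonly combined into one objective is given below, assuming a plain weighted sum. The names `LossWeights` and `total_loss` are hypothetical, the weights are placeholders, and the paper's own formulation and weight values are not given in this section.

```python
from dataclasses import dataclass


@dataclass
class LossWeights:
    # Placeholder weights for illustration only; not the paper's values.
    content: float = 1.0
    style: float = 1.0
    depth: float = 1.0
    temporal: float = 1.0
    distill: float = 1.0


def total_loss(terms: dict, weights: LossWeights) -> float:
    """Return the weighted sum of already-computed scalar loss terms.

    `terms` must contain the keys 'content', 'style', 'depth',
    'temporal' and 'distill'. Computing each term (for example the
    FLIP-based distillation term) is outside the scope of this sketch.
    """
    return (
        weights.content * terms["content"]
        + weights.style * terms["style"]
        + weights.depth * terms["depth"]
        + weights.temporal * terms["temporal"]
        + weights.distill * terms["distill"]
    )


if __name__ == "__main__":
    example_terms = {
        "content": 1.0,
        "style": 2.0,
        "depth": 3.0,
        "temporal": 4.0,
        "distill": 5.0,
    }
    # Emphasise the distillation term by doubling its weight.
    print(total_loss(example_terms, LossWeights(distill=2.0)))
```

Keeping the weights in one small structure makes it easy to switch individual terms off (weight 0) when checking how much each loss contributes.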