Game-invariant Features Through Contrastive and Domain-adversarial Learning
Dylan Kline
TL;DR
The paper addresses the tendency of foundational game-image encoders to overfit to game-specific visual styles, and proposes learning game-invariant representations instead. Its hybrid objective combines contrastive learning with domain-adversarial training through a Gradient Reversal Layer, aiming to preserve content while removing game-identity cues. On the Bingsu Gameplay Images dataset, game-based clustering in the embedding space weakens after only a few epochs and the domain classifier's accuracy drops toward chance, which indicates that the features retain little game-identity information. The approach is intended to improve cross-game transfer and to move toward universal game vision models that need little to no retraining for unseen titles, with potential applications such as glitch detection across diverse games. The authors validate a two-stage training pipeline: features first align with game identity and then become invariant to it, while a SimCLR-style objective maintains discriminative content structure.
Abstract
Foundational game-image encoders often overfit to game-specific visual styles, undermining performance on downstream tasks when applied to new games. We present a method that combines contrastive learning and domain-adversarial training to learn game-invariant visual features. By simultaneously encouraging similar content to cluster and discouraging game-specific cues via an adversarial domain classifier, our approach produces embeddings that generalize across diverse games. Experiments on the Bingsu game-image dataset (10,000 screenshots from 10 games) demonstrate that after only a few training epochs, our model's features no longer cluster by game, indicating successful invariance and potential for improved cross-game transfer (e.g., glitch detection) with minimal fine-tuning. This capability paves the way for more generalizable game vision models that require little to no retraining on new games.
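As a rough illustration of the two ingredients named in the abstract, the dependency-free Python sketch below computes a SimCLR-style NT-Xent contrastive loss and shows the effect of gradient reversal on a linear game classifier. It is not the authors' implementation: the function names, the linear classifier, the temperature of 0.5, the reversal strength `lam`, and the toy vectors are all assumptions made for illustration. In a real training loop the reversal would typically be a custom backward pass in an autograd framework rather than hand-written gradients.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def nt_xent(view_a, view_b, temperature=0.5):
    """NT-Xent loss over two augmented views of the same batch.
    view_a[i] and view_b[i] are embeddings of the same image."""
    z = [normalize(v) for v in view_a + view_b]
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same image
        logits = [dot(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - dot(z[i], z[pos]) / temperature
    return total / (2 * n)

def domain_loss_and_grad(feature, weights, game_id):
    """Cross-entropy of a linear game classifier (assumed here) and its
    gradient with respect to the feature. weights[g] scores game g."""
    logits = [dot(row, feature) for row in weights]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    probs = [e / sum(exps) for e in exps]
    loss = -math.log(probs[game_id])
    # dL/dfeature = sum_g (p_g - 1[g == game_id]) * weights[g]
    grad = [
        sum((probs[g] - (1.0 if g == game_id else 0.0)) * weights[g][d]
            for g in range(len(weights)))
        for d in range(len(feature))
    ]
    return loss, grad

def grad_reverse(grad, lam=1.0):
    """Backward rule of a gradient reversal layer: the forward pass is the
    identity, the gradient flowing back to the encoder is negated and scaled."""
    return [-lam * g for g in grad]

# Toy example: one 3-d feature and a classifier over two hypothetical games.
feature = [0.5, -1.0, 2.0]
weights = [[1.0, 0.0, 0.5], [-0.5, 1.0, 0.0]]
loss_before, grad = domain_loss_and_grad(feature, weights, game_id=0)

# An ordinary descent step on the feature, but using the reversed gradient,
# moves the feature so that the game classifier does worse.
step = 0.1
reversed_grad = grad_reverse(grad, lam=1.0)
moved = [x - step * g for x, g in zip(feature, reversed_grad)]
loss_after, _ = domain_loss_and_grad(moved, weights, game_id=0)

print(f"domain loss before: {loss_before:.4f}, after reversed step: {loss_after:.4f}")
```

The classifier itself still receives the unreversed gradient and keeps trying to identify the game, while the encoder, which sees the negated gradient, is pushed toward features that make that task harder. The contrastive term is added on top so that the features do not collapse while game cues are removed.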
