EasyCraft: A Robust and Efficient Framework for Automatic Avatar Crafting
Suzhen Wang, Weijie Chen, Wei Zhang, Minda Zhao, Lincheng Li, Rongsheng Zhang, Zhipeng Hu, Xin Yu
TL;DR
EasyCraft tackles cross-engine avatar crafting by unifying image representations with a self-supervised ViT encoder and learning an engine-specific translator that outputs crafting parameters. It supports both photo- and text-based avatar creation: for text input, a Stable Diffusion-based text-to-image model trained to mimic the engine's style produces a face image, which the translator then converts into crafting parameters. The key contributions are a universal feature extractor, a translator trained solely on engine data (which makes the approach easy to adapt across engines), and an engine-aligned text-to-face image model; together they achieve state-of-the-art results on two RPG engines. The framework improves generalizability across avatar engines and input styles, enabling real-time, versatile avatar creation in games.
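The data flow described above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' code: `AvatarPipeline`, `encoder`, `translator`, and `text_to_image` are hypothetical names, and the lambdas are trivial stand-ins for the ViT encoder, the engine-specific translator, and the engine-aligned Stable Diffusion model. What it shows is that photo and text inputs converge on one shared encoder and translator.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Image = List[float]     # toy "image": a flat list of pixel values
Features = List[float]  # toy encoder output
Params = List[float]    # crafting parameters for one engine


@dataclass
class AvatarPipeline:
    # All three components are hypothetical stand-ins, not the authors' modules.
    encoder: Callable[[Image], Features]      # plays the role of the self-supervised ViT encoder
    translator: Callable[[Features], Params]  # plays the role of the engine-specific translator
    text_to_image: Callable[[str], Image]     # plays the role of the engine-aligned text-to-image model

    def craft(self, photo: Optional[Image] = None, prompt: Optional[str] = None) -> Params:
        # Expect exactly one input modality per call.
        if (photo is None) == (prompt is None):
            raise ValueError("provide exactly one of photo or prompt")
        # A text prompt is first turned into an engine-style face image.
        image = photo if photo is not None else self.text_to_image(prompt)
        # Both paths then share the same encoder and translator.
        return self.translator(self.encoder(image))


# Trivial stand-ins so the sketch runs end to end.
pipe = AvatarPipeline(
    encoder=lambda img: [sum(img) / len(img)],
    translator=lambda feats: [2.0 * feats[0]],
    text_to_image=lambda text: [float(len(text))] * 4,
)
print(pipe.craft(photo=[0.0, 1.0]))     # [1.0]
print(pipe.craft(prompt="round face"))  # [20.0]
```

In this sketch only the front of the text path differs; everything after image generation is shared, which mirrors the description of integrating a text-to-image model with the same translator.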
Abstract
Character customization, or 'face crafting,' is a vital feature in role-playing games (RPGs), enhancing player engagement by enabling the creation of personalized avatars. Existing automated methods often struggle with generalizability across diverse game engines due to their reliance on the intermediate constraints of a specific image domain, and they typically support only one type of input, either text or image. To overcome these challenges, we introduce EasyCraft, an innovative end-to-end feedforward framework that automates character crafting by uniquely supporting both text and image inputs. Our approach employs a translator capable of converting facial images of any style into crafting parameters. We first establish a unified feature distribution in the translator's image encoder through self-supervised learning on a large-scale dataset, enabling photos of any style to be embedded into a unified feature representation. Subsequently, we map this unified feature distribution to crafting parameters specific to a game engine, a process that can be easily adapted to most game engines and thus enhances EasyCraft's generalizability. By integrating text-to-image techniques with our translator, EasyCraft also facilitates precise, text-based character crafting. EasyCraft's ability to integrate diverse inputs significantly enhances the versatility and accuracy of avatar creation. Extensive experiments on two RPG games demonstrate the effectiveness of our method, achieving state-of-the-art results and facilitating adaptability across various avatar engines.
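To make the second stage (mapping unified features to engine-specific parameters) concrete, here is a minimal sketch assuming a frozen encoder and training pairs obtained by sampling crafting parameters and rendering them in the engine. The functions `toy_render_and_encode`, `make_engine_pairs`, and `train_linear_translator` are hypothetical. The toy renderer is a fixed linear map standing in for "render in the engine, then encode with the frozen encoder", and a linear head fitted by gradient descent on mean squared error is swapped in for the paper's translator network, whose architecture and loss are not described here.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def toy_render_and_encode(params: Vector) -> Vector:
    # Hypothetical stand-in for "render params in the engine, then run the frozen encoder".
    # A fixed, invertible linear map keeps the example small and checkable.
    return [params[0] + params[1], params[0] - params[1]]


def make_engine_pairs(render_and_encode: Callable[[Vector], Vector], n_params: int,
                      n_samples: int, seed: int = 0) -> List[Tuple[Vector, Vector]]:
    # Sample crafting parameters in [0, 1) and pair them with the features of their rendering.
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_samples):
        params = [rng.random() for _ in range(n_params)]
        pairs.append((render_and_encode(params), params))
    return pairs


def train_linear_translator(pairs: List[Tuple[Vector, Vector]], lr: float = 0.2,
                            epochs: int = 1000) -> Callable[[Vector], Vector]:
    # Fit params ~= W @ features + b with full-batch gradient descent on mean squared error.
    n_feat, n_out, n = len(pairs[0][0]), len(pairs[0][1]), len(pairs)
    W = [[0.0] * n_feat for _ in range(n_out)]
    b = [0.0] * n_out
    for _ in range(epochs):
        gW = [[0.0] * n_feat for _ in range(n_out)]
        gb = [0.0] * n_out
        for feats, target in pairs:
            for i in range(n_out):
                err = sum(W[i][j] * feats[j] for j in range(n_feat)) + b[i] - target[i]
                gb[i] += 2.0 * err / n
                for j in range(n_feat):
                    gW[i][j] += 2.0 * err * feats[j] / n
        for i in range(n_out):
            b[i] -= lr * gb[i]
            for j in range(n_feat):
                W[i][j] -= lr * gW[i][j]

    def translate(feats: Vector) -> Vector:
        return [sum(W[i][j] * feats[j] for j in range(n_feat)) + b[i] for i in range(n_out)]

    return translate


pairs = make_engine_pairs(toy_render_and_encode, n_params=2, n_samples=200)
translate = train_linear_translator(pairs)
pred = translate(toy_render_and_encode([0.3, 0.8]))
print([round(v, 2) for v in pred])  # [0.3, 0.8]
```

In this sketch the training targets are simply the parameters used to render each image, so no labeled photos are needed and the second stage can be rerun for a new engine using only that engine's data.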
