
BLOCK: An Open-Source Bi-Stage MLLM Character-to-Skin Pipeline for Minecraft

Hengquan Guo

TL;DR

This paper proposes an open-source bi-stage character-to-skin pipeline that generates pixel-perfect Minecraft skins from arbitrary character concepts, together with a progressive LoRA curriculum that initializes each phase from the previous adapter to improve stability and efficiency.

Abstract

We present \textbf{BLOCK}, an open-source bi-stage character-to-skin pipeline that generates pixel-perfect Minecraft skins from arbitrary character concepts. BLOCK decomposes the problem into (i) a \textbf{3D preview synthesis stage} driven by a large multimodal model (MLLM) with a carefully designed prompt-and-reference template, producing a consistent dual-panel (front/back) oblique-view Minecraft-style preview; and (ii) a \textbf{skin decoding stage} based on a fine-tuned FLUX.2 model that translates the preview into a skin atlas image. We further propose \textbf{EvolveLoRA}, a progressive LoRA curriculum (text-to-image $\rightarrow$ image-to-image $\rightarrow$ preview-to-skin) that initializes each phase from the previous adapter to improve stability and efficiency. BLOCK is released with all prompt templates and fine-tuned weights to support reproducible character-to-skin generation.
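The progressive curriculum described above can be illustrated with a minimal sketch. Only the phase order (text-to-image, image-to-image, preview-to-skin) and the idea of initializing each phase from the previous adapter come from the abstract. The names `Phase`, `train_phase`, and `run_curriculum` are hypothetical, and the "adapter" here is a plain dictionary standing in for real LoRA weights; this is not the released training code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Stand-in for LoRA adapter weights; a real adapter would hold tensors.
Adapter = Dict[str, List[str]]


@dataclass(frozen=True)
class Phase:
    task: str


# Phase order as stated in the abstract.
CURRICULUM = [
    Phase("text-to-image"),
    Phase("image-to-image"),
    Phase("preview-to-skin"),
]


def train_phase(phase: Phase, init: Optional[Adapter]) -> Adapter:
    """Hypothetical placeholder for one fine-tuning run.

    Starts from a copy of the previous adapter when one is given,
    otherwise from a fresh adapter, and records the task it trained on.
    """
    history = list(init["history"]) if init is not None else []
    history.append(phase.task)
    return {"history": history}


def run_curriculum(phases: List[Phase]) -> Adapter:
    """Train phases in order, passing each adapter on to the next phase."""
    adapter: Optional[Adapter] = None
    for phase in phases:
        adapter = train_phase(phase, adapter)
    if adapter is None:
        raise ValueError("curriculum must contain at least one phase")
    return adapter


final_adapter = run_curriculum(CURRICULUM)
print(final_adapter["history"])
```

The point of the sketch is the hand-off in `run_curriculum`: no phase starts from scratch except the first, so whatever the earlier phases learned is carried into the final preview-to-skin phase.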

Paper Structure (22 sections, 7 figures)

This paper contains 22 sections, 7 figures.

Figures (7)

  • Figure 1: Current powerful MLLMs fail to generate a Minecraft skin from a character concept.
  • Figure 2: Overview of BLOCK.
  • Figure 3: Stage-1 Mode-I input anchors (B/C) and a prompt excerpt (Gemini Nano Banana Pro).
  • Figure 4: Example 1, A beginning scenario.
  • Figure 5: Example 2, A considerably more difficult scenario involving many fine-grained details.
  • ...and 2 more figures