BodyShapeGPT: SMPL Body Shape Manipulation with LLMs
Baldomero R. Árbol, Dan Casas
TL;DR
This work converts natural-language body descriptions into 3D human shapes by mapping text to SMPL-X shape parameters. It fine-tunes an LLM (Llama-3 8B) with LoRA and 4-bit quantization on a dataset of $18{,}000$ training samples and $2{,}000$ evaluation cases, using a composite loss $\mathcal{L} = \mathcal{L}_\text{LLM} + \mathcal{L}_\text{shape} + \mathcal{L}_\text{measurements}$ to predict the 10D shape vector $\boldsymbol{\beta} \in \mathbb{R}^{10}$. Quantitatively, the model is more accurate than the baselines on body measurements and BMI distributions; qualitatively, it produces robust and diverse avatars across prompts. The approach enables fast, text-driven avatar creation for storytelling and virtual environments, with body shape controlled directly through natural language.
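To make the structure of the composite loss concrete, the following is a minimal sketch of how the three terms could be combined as an unweighted sum, as written in the formula above. The summary does not define the individual terms, so their forms here are assumptions: the language-model term is taken to be the mean negative log-likelihood of the reference tokens, and the shape and measurement terms are taken to be mean squared errors. All function and parameter names (`cross_entropy`, `mse`, `composite_loss`, `token_probs`, `meas_pred`, and so on) are hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def cross_entropy(token_probs: Sequence[float]) -> float:
    """Mean negative log-likelihood of the reference tokens.

    Assumption: token_probs[i] is the probability the model assigned
    to the i-th reference token, with every value in (0, 1].
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def composite_loss(
    token_probs: Sequence[float],
    beta_pred: Sequence[float],
    beta_true: Sequence[float],
    meas_pred: Sequence[float],
    meas_true: Sequence[float],
) -> float:
    """Unweighted sum L = L_LLM + L_shape + L_measurements.

    The per-term definitions are assumed (NLL and MSE); only the
    sum of three terms is taken from the text.
    """
    l_llm = cross_entropy(token_probs)
    l_shape = mse(beta_pred, beta_true)          # 10D shape vectors
    l_measurements = mse(meas_pred, meas_true)   # e.g. height, waist
    return l_llm + l_shape + l_measurements
```

In this sketch a prediction that matches the reference tokens, the shape vector, and the measurements exactly gives a loss of zero, and an error in any one term raises the total independently of the other two.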
Abstract
Generative AI models provide a wide range of tools capable of performing complex tasks in a fraction of the time it would take a human. Among these, Large Language Models (LLMs) stand out for their ability to generate diverse texts, from literary narratives to specialized responses in different fields of knowledge. This paper explores the use of fine-tuned LLMs to identify physical descriptions of people, and subsequently create accurate representations of avatars using the SMPL-X model by inferring shape parameters. We demonstrate that LLMs can be trained to understand and manipulate the shape space of SMPL, allowing the control of 3D human shapes through natural language. This approach promises to improve human-machine interaction and opens new avenues for customization and simulation in virtual environments.
