MLLM-Fabric: Multimodal Large Language Model-Driven Robotic Framework for Fabric Sorting and Selection
Liman Wang, Hanyang Zhong, Tianyuan Wang, Shan Luo, Jihong Zhu
TL;DR
MLLM-Fabric reframes fabric selection as property-specific pairwise ranking performed by a multimodal large language model. The framework fuses RGB vision, GelSight visuotactile data, and force signals, and is trained with supervised preferences plus explanation-guided distillation to produce interpretable, abstention-aware decisions. A real-world dataset of 220 fabrics with co-registered RGB, GelSight, and pressure data supports reproducible benchmarking, and Fabric-Llama-90B outperforms pretrained vision-language baselines in both attribute ranking and selection reliability. The work advances robotic material understanding by linking perceptual cues to functional properties and decision-making, with implications for automated textile manufacturing and smart retail.
Abstract
Choosing appropriate fabrics is critical for meeting functional and quality demands in robotic textile manufacturing, apparel production, and smart retail. We propose MLLM-Fabric, a robotic framework that leverages multimodal large language models (MLLMs) for fabric sorting and selection. Built on a multimodal robotic platform, the system is trained through supervised fine-tuning and explanation-guided distillation to rank fabrics by their properties. We also release a dataset of 220 diverse fabrics, each with RGB images and synchronized visuotactile and pressure data. Experiments show that our Fabric-Llama-90B consistently outperforms pretrained vision-language baselines in both attribute ranking and selection reliability. Code and dataset are publicly available at https://github.com/limanwang/MLLM-Fabric.
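
As an illustration of how property-specific pairwise judgments with abstention could be turned into a full ordering, the following minimal sketch counts wins per fabric and splits the point when the comparator abstains. It is not the authors' implementation: the aggregation rule, the function name `rank_by_pairwise_wins`, the comparator signature, and the toy softness scores are all assumptions made for this example, and in the real system the comparator would be the fine-tuned MLLM operating on RGB, GelSight, and pressure inputs.

```python
from itertools import combinations
from typing import Callable, Dict, List, Optional

# Hypothetical comparator signature (an assumption for this sketch):
# returns "A" if the first fabric shows more of the property,
# "B" if the second does, or None when the model abstains.
Comparator = Callable[[str, str, str], Optional[str]]


def rank_by_pairwise_wins(
    fabrics: List[str], prop: str, compare: Comparator
) -> List[str]:
    """Order fabrics by how many pairwise comparisons they win for `prop`."""
    wins: Dict[str, float] = {f: 0.0 for f in fabrics}
    for a, b in combinations(fabrics, 2):
        verdict = compare(a, b, prop)
        if verdict == "A":
            wins[a] += 1.0
        elif verdict == "B":
            wins[b] += 1.0
        else:
            # Abstention: neither fabric is preferred, so split the point.
            wins[a] += 0.5
            wins[b] += 0.5
    # sorted() is stable, so tied fabrics keep their input order.
    return sorted(fabrics, key=lambda f: wins[f], reverse=True)


# Toy stand-in for the model: made-up scores, not values from the dataset.
TOY_SOFTNESS = {"denim": 0.2, "canvas": 0.25, "fleece": 0.9}


def toy_compare(a: str, b: str, prop: str) -> Optional[str]:
    gap = TOY_SOFTNESS[a] - TOY_SOFTNESS[b]
    if abs(gap) < 0.1:
        return None  # too close to call, so abstain
    return "A" if gap > 0 else "B"


ranking = rank_by_pairwise_wins(
    ["denim", "canvas", "fleece"], "softness", toy_compare
)
print(ranking)
```

Splitting the point on abstention is only one possible choice; a selection pipeline could instead leave abstained pairs out of the tally or route them to a human operator.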
