Developmental Scaffolding with Large Language Models
Batuhan Celik, Alper Ahmetoglu, Emre Ugur, Erhan Oztop
TL;DR
This work investigates whether a Large Language Model (GPT3.5), used without fine-tuning, can act as a developmental scaffolding agent that guides a simulated tabletop robot as it learns the effects of its actions. By framing action selection as choosing an 'interesting' outcome from algorithmically generated state descriptions and candidate actions, the study demonstrates faster discovery of tall towers compared to random exploration, particularly in moderate-complexity environments. However, the LLM struggles with objects exhibiting different affordances, such as spheres, revealing gaps in grounded inference and affordance understanding. The results suggest LLMs can provide valuable, low-cost scaffolding to improve robot learning, but more sophisticated grounding or model capabilities are needed for robust real-world applicability.
Abstract
Exploration and self-observation are key mechanisms of infant sensorimotor development. These processes are further guided by parental scaffolding, which accelerates skill and knowledge acquisition. In developmental robotics, this approach has often been adopted by having a human act as the source of scaffolding. In this study, we investigate whether Large Language Models (LLMs) can act as a scaffolding agent for a robotic system that aims to learn to predict the effects of its actions. To this end, an object manipulation setup is considered where one object can be picked and placed on top of or in the vicinity of another object. The adopted LLM is asked to guide the action selection process through algorithmically generated state descriptions and action selection alternatives in natural language. The simulation experiments that include cubes in this setup show that LLM-guided (GPT3.5-guided) learning yields significantly faster discovery of novel structures compared to random exploration. However, we observed that GPT3.5 fails to effectively guide the robot in generating structures from objects with different affordances, such as cubes and spheres. Overall, we conclude that even without fine-tuning, LLMs may serve as a moderate scaffolding agent for improving robot learning; however, they still lack affordance understanding, which limits the applicability of current LLMs in robotic scaffolding tasks.
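To make the idea of "algorithmically generated state descriptions and action selection alternatives" concrete, the following is a minimal sketch of how such a prompt might be assembled. It is not the authors' implementation: the `SceneObject` class, the function names, the sentence templates, and the wording of the final question are all assumptions made for illustration, and no LLM is actually called.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Optional


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical symbolic record of one object on the table."""
    name: str                        # illustrative identifier, e.g. "cube_1"
    shape: str                       # e.g. "cube" or "sphere"
    on_top_of: Optional[str] = None  # supporting object's name; None means on the table


def describe_state(objects: List[SceneObject]) -> str:
    """Turn the symbolic scene into one natural-language sentence per object."""
    lines = []
    for obj in objects:
        support = f"on top of {obj.on_top_of}" if obj.on_top_of else "on the table"
        lines.append(f"{obj.name} is a {obj.shape} {support}.")
    return "\n".join(lines)


def candidate_actions(objects: List[SceneObject]) -> List[str]:
    """Enumerate both placement options for every ordered pair of objects."""
    actions = []
    for src, dst in permutations(objects, 2):
        actions.append(f"Put {src.name} on top of {dst.name}.")
        actions.append(f"Put {src.name} next to {dst.name}.")
    return actions


def build_prompt(objects: List[SceneObject]) -> str:
    """Combine the state description and numbered actions into one prompt string."""
    actions = candidate_actions(objects)
    numbered = "\n".join(f"{i}. {a}" for i, a in enumerate(actions, start=1))
    return (
        "Current state:\n" + describe_state(objects) + "\n\n"
        "Possible actions:\n" + numbered + "\n\n"
        # Assumed phrasing; the actual question posed to the LLM may differ.
        "Which action leads to the most interesting outcome? Answer with its number."
    )


if __name__ == "__main__":
    scene = [
        SceneObject("cube_1", "cube"),
        SceneObject("cube_2", "cube", on_top_of="cube_1"),
        SceneObject("sphere_1", "sphere"),
    ]
    print(build_prompt(scene))
```

In a full pipeline, the returned string would be sent to the LLM and the chosen number mapped back to a robot action. The sketch deliberately ignores feasibility checks, such as whether the object to be moved currently supports another object.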
