Developmental Scaffolding with Large Language Models

Batuhan Celik, Alper Ahmetoglu, Emre Ugur, Erhan Oztop

TL;DR

This work investigates using a Large Language Model (GPT3.5), without fine-tuning, as a developmental scaffolding agent that guides a simulated tabletop robot learning the effects of its actions. Action selection is framed as choosing the candidate action with the most 'interesting' outcome, given algorithmically generated state descriptions and action alternatives. With this guidance the robot discovers tall towers faster than with random exploration, particularly in moderate-complexity environments. However, the LLM struggles when objects with different affordances, such as spheres, are present, revealing gaps in grounded inference and affordance understanding. The results suggest LLMs can provide valuable, low-cost scaffolding to improve robot learning, but more sophisticated grounding or model capabilities are needed for robust real-world applicability.

Abstract

Exploration and self-observation are key mechanisms of infant sensorimotor development. These processes are further guided by parental scaffolding, which accelerates skill and knowledge acquisition. In developmental robotics, this approach has often been adopted by having a human act as the source of scaffolding. In this study, we investigate whether Large Language Models (LLMs) can act as a scaffolding agent for a robotic system that aims to learn to predict the effects of its actions. To this end, an object manipulation setup is considered where one object can be picked and placed on top of or in the vicinity of another object. The adopted LLM is asked to guide the action selection process through algorithmically generated state descriptions and action selection alternatives in natural language. The simulation experiments that include cubes in this setup show that LLM-guided (GPT3.5-guided) learning yields significantly faster discovery of novel structures compared to random exploration. However, we observed that GPT3.5 fails to effectively guide the robot in generating structures from objects with different affordances, such as cubes and spheres. Overall, we conclude that even without fine-tuning, LLMs may serve as a moderate scaffolding agent for improving robot learning; however, they still lack affordance understanding, which limits the applicability of current LLMs in robotic scaffolding tasks.
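
The prompting step described in the abstract (an algorithmically generated state description plus a list of action alternatives, all in natural language) might look roughly like the sketch below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the sentence templates, and the exact wording of the question are hypothetical. Only the placement locations (center, next, front) and the idea of asking for an 'interesting' outcome come from the text above.

```python
from itertools import permutations

# Assumed mapping from placement locations (named in the Figure 1 caption)
# to phrases; the wording itself is hypothetical.
PLACEMENTS = {
    "center": "on top of",
    "next": "next to",
    "front": "in front of",
}


def describe_state(relations):
    """Turn (object, placement, reference) triples into plain sentences."""
    if not relations:
        return "All objects are separate on the table."
    return " ".join(
        f"The {obj} is {PLACEMENTS[place]} the {ref}."
        for obj, place, ref in relations
    )


def candidate_actions(objects, placements=("center", "next", "front")):
    """Enumerate every pick-and-place option over ordered object pairs."""
    return [
        (src, place, dst)
        for src, dst in permutations(objects, 2)
        for place in placements
    ]


def build_prompt(objects, relations, adjective="interesting"):
    """Assemble a numbered multiple-choice prompt (hypothetical format)."""
    actions = candidate_actions(objects)
    lines = [
        "Current state: " + describe_state(relations),
        f"Which action leads to the most {adjective} outcome?",
    ]
    lines += [
        f"{i}. Put the {src} {PLACEMENTS[place]} the {dst}."
        for i, (src, place, dst) in enumerate(actions, start=1)
    ]
    return "\n".join(lines), actions


if __name__ == "__main__":
    objects = ["purple cube", "dark green cube", "sphere"]
    relations = [
        ("sphere", "front", "purple cube"),
        ("dark green cube", "next", "purple cube"),
    ]
    prompt, _ = build_prompt(objects, relations)
    print(prompt)
```

The LLM's numbered answer would then be mapped back to the corresponding entry in the returned action list and executed by the robot. The call to the model is deliberately left out, since the source does not specify it.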

Paper Structure (14 sections, 6 figures)

This paper contains 14 sections, 6 figures.

Figures (6)

  • Figure 1: Action execution steps are shown in the subfigures. Grids around objects indicate possible grasp and placement locations. In our experiments, only the center location is used for grasping. As for placement locations, the next, front, and center locations are used. Prior to the execution of this action, the sphere is placed in front of the purple cube, and the dark green cube is placed next to the purple cube.
  • Figure 2: Comparison of tower heights between random exploration and scaffolded exploration in different environment settings of increasing difficulty. The first setting contains 4 cubes and 2 positions, the second adds a fifth cube, and the last adds a third proximity location.
  • Figure 3: Effects of different adjectives on the tower heights in a setting with 5 cubes and 2 positions.
  • Figure 4: Height distributions of random and scaffolded sessions given 4 cubes and a sphere with 3 positions.
  • Figure 5: Comparison of average heights in LLM scaffolded sessions from experiments three and five with different sets of objects.
  • ...and 1 more figure