
"What's my model inside of?": Exploring the role of environments for grounded natural language understanding

Ronen Tamari

TL;DR

This thesis argues that grounding language understanding requires explicit consideration of the environments in which models operate, not just model architecture. It proposes an ecological NLU framework, develops text-based game environments for richer supervision, and introduces Breakpoint Transformers to model intermediate beliefs; it also presents the Dyna-bAbI benchmark and a stigmergy-inspired annotation paradigm to advance collective sensemaking. The work demonstrates that environment-driven data collection and evaluation can substantially improve procedural and commonsense reasoning, and discusses societal implications through AI-augmented epistemic environments. Together, the contributions shift focus from models alone to the wider ecological systems that shape NLU progress.

Abstract

In contrast to classical cognitive science, which studied brains in isolation, ecological approaches focused on the role of the body and environment in shaping cognition. Similarly, in this thesis we adopt an ecological approach to grounded natural language understanding (NLU) research. Grounded language understanding studies language understanding systems situated in the context of events, actions and percepts in naturalistic or simulated virtual environments. Where classic research tends to focus on designing new models and optimization methods while treating environments as given, we explore the potential of environment design for improving data collection and model development. We developed novel training and annotation approaches for procedural text understanding based on text-based game environments. We also drew upon embodied cognitive linguistics literature to propose a roadmap for grounded NLP research, and to inform the development of a new benchmark for measuring the progress of large language models on challenging commonsense reasoning tasks. We leveraged the richer supervision provided by text-based game environments to develop Breakpoint Transformers, a novel approach to modeling intermediate semantic information in long narrative or procedural texts. Finally, we integrated theories on the role of environments in collective human intelligence to propose a design for AI-augmented "social thinking environments" for knowledge workers like scientists.