Navigating High Dimensional Concept Space with Metalearning
Max Gupta
TL;DR
This work probes the limits of gradient-based meta-learning for few-shot Boolean concept learning using a PCFG-controlled space that separately varies featural dimensionality $F$ and compositional depth $D$. By comparing Meta-SGD with SGD and analyzing inner-loop adaptation, loss-landscape roughness, and Hessian curvature, it shows that meta-learning is robust to compositional depth but struggles with high feature dimensionality, though longer adaptation and curvature-aware updates can mitigate some difficulties. The study reveals that Meta-SGD can dramatically shorten optimization paths and improve generalization in many regimes, especially when landscapes are rugged, but high-dimensional feature spaces still pose a challenge. These findings offer a nuanced view of when meta-learning helps in concept learning and emphasize the role of second-order dynamics and extended gradient adaptation in navigating complex loss landscapes.
Abstract
Rapidly learning abstract concepts from limited examples is a hallmark of human intelligence. This work investigates whether gradient-based meta-learning can equip neural networks with inductive biases for efficient few-shot acquisition of discrete concepts. I compare meta-learning methods against a supervised learning baseline on Boolean concepts (logical statements) generated by a probabilistic context-free grammar (PCFG). By systematically varying concept dimensionality (the number of features) and recursive compositionality (the depth of grammar recursion), I delineate the complexity regimes in which meta-learning robustly improves few-shot concept learning and those in which it does not. Meta-learners handle compositional complexity much better than featural complexity. I examine why through a representational analysis of the meta-learners' weights and a loss-landscape analysis showing that featural complexity increases the roughness of loss trajectories, which makes curvature-aware optimization more effective than first-order methods. Increasing the number of adaptation steps in Meta-SGD improves out-of-distribution generalization on complex concepts, with adaptation acting as a way of encouraging exploration of rougher loss basins. Overall, this work highlights the intricacies of learning compositional versus featural complexity in high-dimensional concept spaces and points toward understanding the role of second-order methods and extended gradient adaptation in few-shot concept learning.
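
To make the two complexity axes concrete, the following is a minimal sketch of how a Boolean concept might be sampled from a toy PCFG and turned into a few-shot task. The grammar rules (NOT, AND, OR over single features), the stopping probability `p_leaf`, and every function name here are illustrative assumptions, not the grammar or code used in this work. `num_features` plays the role of $F$ and `max_depth` the role of $D$.

```python
import random


def sample_concept(num_features, max_depth, rng, p_leaf=0.4):
    """Sample a Boolean expression tree from a toy PCFG (hypothetical rules)."""
    # Stop recursing when the depth budget is spent, or with probability p_leaf.
    if max_depth == 0 or rng.random() < p_leaf:
        return ("feat", rng.randrange(num_features))
    rule = rng.choice(["not", "and", "or"])
    if rule == "not":
        return ("not", sample_concept(num_features, max_depth - 1, rng, p_leaf))
    left = sample_concept(num_features, max_depth - 1, rng, p_leaf)
    right = sample_concept(num_features, max_depth - 1, rng, p_leaf)
    return (rule, left, right)


def evaluate(expr, x):
    """Evaluate an expression tree on a binary feature vector x."""
    op = expr[0]
    if op == "feat":
        return bool(x[expr[1]])
    if op == "not":
        return not evaluate(expr[1], x)
    left, right = evaluate(expr[1], x), evaluate(expr[2], x)
    return (left and right) if op == "and" else (left or right)


def depth(expr):
    """Depth of recursion actually used by a sampled concept."""
    if expr[0] == "feat":
        return 0
    return 1 + max(depth(child) for child in expr[1:])


def few_shot_task(num_features, max_depth, num_examples, seed=0):
    """Draw one concept plus a small labeled support set for it."""
    rng = random.Random(seed)
    concept = sample_concept(num_features, max_depth, rng)
    inputs = [
        tuple(rng.randint(0, 1) for _ in range(num_features))
        for _ in range(num_examples)
    ]
    labels = [evaluate(concept, x) for x in inputs]
    return concept, inputs, labels
```

In this sketch, raising `num_features` enlarges the input space to $2^F$ possible feature vectors, while raising `max_depth` only permits more deeply nested expressions over the same inputs, so the two parameters can be varied independently.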
