The Ungrounded Alignment Problem

Marc Pickett, Aakash Kumar Nain, Joseph Modayil, Llion Jones

TL;DR

The paper addresses how to encode predefined, high-level concepts in a modality-agnostic, unsupervised learner without ground-truth labels. It formalizes The Ungrounded Alignment Problem and illustrates it with a fnord-trigger detection task. The core approach introduces a fixed, innate bigram prior Bi(y|x) and a batch-contrastive alignment loss $\nabla\mathcal{L}(e,d)$ that ties the encoder's output distributions to the bigram predictions, enabling a correct mapping from images to character classes without labels. Empirically, the method achieves near-perfect fnord detection on permuted EMNIST and strong performance on permuted CIFAR26, produces meaningful unsupervised character classifications (e.g., 82.14% on EMNIST), and outperforms unigram-based baselines. The results suggest a promising path for encoding innate, relational concepts in modality-agnostic models, with potential extensions to more complex domains and downstream control tasks.
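
To make the idea of a fixed bigram prior concrete, the following is a minimal sketch of how a table Bi(y|x) could be estimated from letter counts in a text corpus. The function name `bigram_table`, the add-one smoothing, and the toy corpus are assumptions for illustration only; the summary does not state how the paper builds or smooths its table. Spaces are dropped so that bigrams cross word boundaries, matching the unspaced example string shown in Figure 2.

```python
from collections import Counter
import string

ALPHABET = string.ascii_lowercase


def bigram_table(text, smoothing=1.0):
    """Estimate table[x][y] ~ P(next letter = y | current letter = x).

    `smoothing` is an additive (Laplace) constant; it is an assumption of
    this sketch, not a detail taken from the paper.
    """
    # Keep only the 26 lowercase letters, so spaces and punctuation vanish.
    letters = [c for c in text.lower() if c in ALPHABET]
    # Count each adjacent (current, next) letter pair.
    counts = Counter(zip(letters, letters[1:]))
    table = {}
    for x in ALPHABET:
        row_total = sum(counts[(x, y)] for y in ALPHABET)
        row_total += smoothing * len(ALPHABET)
        table[x] = {
            y: (counts[(x, y)] + smoothing) / row_total for y in ALPHABET
        }
    return table


# Toy corpus, far too small for real use; it only exercises the code.
corpus = "four score and seven years ago our fathers brought forth"
bi = bigram_table(corpus)
```

Each row `bi[x]` is a probability distribution over the next letter. In the setting described above, such a table would be computed once and then held fixed while the encoder is trained.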

Abstract

Modern machine learning systems have demonstrated substantial abilities with methods that either embrace or ignore human-provided knowledge, but combining benefits of both styles remains a challenge. One particular challenge involves designing learning systems that exhibit built-in responses to specific abstract stimulus patterns, yet are still plastic enough to be agnostic about the modality and exact form of their inputs. In this paper, we investigate what we call The Ungrounded Alignment Problem, which asks: How can we build in predefined knowledge in a system where we don't know how a given stimulus will be grounded? This paper examines a simplified version of the general problem, where an unsupervised learner is presented with a sequence of images for the characters in a text corpus, and this learner is later evaluated on its ability to recognize specific (possibly rare) sequential patterns. Importantly, the learner is given no labels during learning or evaluation, but must map images from an unknown font or permutation to their correct class labels. That is, at no point is our learner given labeled images, where an image vector is explicitly associated with a class label. Despite ample work in unsupervised and self-supervised loss functions, all current methods require a labeled fine-tuning phase to map the learned representations to correct classes. Finding this mapping in the absence of labels may seem a fool's errand, but our main result resolves this seeming paradox. We show that leveraging only letter bigram frequencies is sufficient for an unsupervised learner both to reliably associate images to class labels and to reliably identify trigger words in the sequence of inputs. More generally, this method suggests an approach for encoding specific desired innate behaviour in modality-agnostic models.

Paper Structure (19 sections, 5 equations, 8 figures, 2 tables)

Figures (8)

  • Figure 1: An analogy of our simplified version: A beaver must be able to connect abstract concepts, like "stackable objects" to sensors without labels or modality-specific wiring. In our formalized problem, the model's task is to recognize instances of particular character sequences while being modality agnostic (before training) and without the use of labels.
  • Figure 2: Above: An example sequence of images representing the string fnordscoreandsevenyearsago in the EMNIST "font". Note that upper and lower case character forms are used interchangeably. Below: The same string in the "CIFAR26 font". In this "font", a is represented by images of apples, f by beds, n by buses, etc., assigning the first 26 CIFAR100 classes to letters. In our experiment, we simply ordered the classes alphabetically, so b is "aquarium fish". (See Appendix \ref{['appendix:cifar']} for more details.)
  • Figure 3: The loss process. The encoder is shared for both input images, and is trained from scratch using the bigram probability table and batch contrastive loss. Note that no labels are used in this process. The bigram table is "innate" and fixed.
  • Figure 4: The encoder.
  • Figure 5: The training loss for linear interpolations between random initializations and the best model found. Note that the loss is not monotonic for most of the seeds.
  • ...and 3 more figures
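
The task setup in Figures 1 and 2 can be illustrated with a small sketch: a string is rendered in an unknown "font", and a trigger word is then searched for in the decoded sequence. Everything here is a simplification made for illustration. Class ids stand in for images, the "font" is a hypothetical hidden permutation of letters to class ids, and decoding uses an oracle that inverts the font. In the paper's setting the learner has no such oracle and must recover the mapping without labels. The names `make_font`, `render`, and `find_trigger` are invented for this sketch.

```python
import random
import string

ALPHABET = string.ascii_lowercase
TRIGGER = "fnord"


def make_font(seed):
    """Hypothetical 'font': a hidden permutation from letters to class ids."""
    rng = random.Random(seed)
    class_ids = list(range(len(ALPHABET)))
    rng.shuffle(class_ids)
    return dict(zip(ALPHABET, class_ids))


def render(text, font):
    """Turn text into a sequence of class ids (stand-ins for images)."""
    return [font[c] for c in text if c in font]


def find_trigger(letters, trigger=TRIGGER):
    """Return every start position of `trigger` in a decoded letter sequence."""
    joined = "".join(letters)
    last_start = len(joined) - len(trigger)
    return [i for i in range(last_start + 1) if joined.startswith(trigger, i)]


font = make_font(seed=0)
stream = render("fnordscoreandsevenyearsago", font)

# Oracle decoder (an assumption of this sketch): invert the hidden font.
inverse = {class_id: letter for letter, class_id in font.items()}
decoded = [inverse[class_id] for class_id in stream]
hits = find_trigger(decoded)
```

With a perfect decoder, `hits` contains the single position 0, where the trigger begins. The difficulty the paper targets lies entirely in replacing the oracle `inverse` with a mapping learned from unlabeled images.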