Inference-Path Optimization via Circuit Duplication in Frozen Visual Transformers for Marine Species Classification

Thomas Manuel Rost

Abstract

Automated underwater species classification is constrained by annotation cost and environmental variation that limits the transferability of fully supervised models. Recent work has shown that frozen embeddings from self-supervised vision foundation models already provide a strong label-efficient baseline for marine image classification. Here we investigate whether this frozen-embedding regime can be improved at inference time, without fine-tuning or changing model weights. We apply Circuit Duplication, an inference-time method originally proposed for Large Language Models, in which a selected range of transformer layers is traversed twice during the forward pass. We evaluate on the class-imbalanced AQUA20 benchmark using frozen DINOv3 embeddings under two settings: global circuit selection, where a single duplicated circuit is chosen for the full dataset, and class-specific circuit selection, where each species may receive a different optimal circuit. Both settings use simple semi-supervised downstream classifiers. Circuit Duplication consistently improves over the standard frozen forward pass. At the maximum label budget, class-specific selection reaches a macro F1 of 0.875, closing the gap to the fully supervised ConvNeXt benchmark (0.889) to 1.4 points without any gradient-based training. Four species exceed their fully supervised reference, with octopus improving by +12.1 F1 points. Across all budgets, roughly 75% of classes prefer a class-specific circuit, indicating a genuinely class-dependent benefit. To our knowledge, this is the first application of Circuit Duplication to computer vision.
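The duplicated forward pass described in the abstract can be sketched in a few lines. The function and toy blocks below are illustrative stand-ins, not the paper's DINOv3 integration; the count of 66 circuits is shown under the assumption that circuits are ordered pairs $(i, j)$ with $i < j$ over a 12-block backbone.

```python
from itertools import combinations

def duplicated_forward(layers, x, i, j):
    """Run a forward pass in which the circuit layers[i..j] (inclusive)
    is traversed twice; all other layers run once, in order."""
    path = layers[:i] + layers[i:j + 1] + layers[i:j + 1] + layers[j + 1:]
    for layer in path:
        x = layer(x)
    return x

# Toy stand-ins for transformer blocks (real blocks map tokens to tokens).
blocks = [lambda v: v + 1, lambda v: v * 2, lambda v: v + 3]

print(duplicated_forward(blocks, 1, i=1, j=1))  # prints 11: block 1 runs twice

# Sweeping all circuits: with 12 blocks and i < j there are C(12, 2) = 66
# candidate pairs, matching the 66 circuits swept in the selection pipeline.
circuits = list(combinations(range(12), 2))
print(len(circuits))  # prints 66
```

Because the backbone stays frozen, the sweep is purely an inference-time search: each candidate $(i, j)$ changes only which blocks are revisited, never any weights.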

Paper Structure

This paper contains 26 sections, 1 equation, 13 figures, 2 tables.

Figures (13)

  • Figure 1: Example of the effective path when layers $i$ through $j$ are repeated.
  • Figure 2: Pipeline for global circuit selection. All 66 duplicated circuits are swept over the frozen DINOv3 backbone. A single best $(i,j)$ pair is selected on the validation pool across all classes, and final performance is reported on the held-out test set.
  • Figure 3: Pipeline for class-specific circuit selection. The same sweep is performed, but the best $(i,j)$ pair is selected independently for each class on the validation pool. Different species may therefore receive different optimal inference paths through the frozen transformer.
  • Figure 4: Global macro F1 across label budgets. Red: standard frozen baseline. Blue: globally optimized circuit (Exp2). Purple: class-specific circuit selection (Exp3). The green dashed line marks the fully supervised ConvNeXt benchmark at 88.9% [fuad2026aqua20]. Circuit duplication improves over the baseline at every budget, with class-specific selection providing the largest gains. At 100% labels, the gap to full supervision narrows to 1.4 percentage points.
  • Figure 5: Per-class F1 scores at the 100% label budget. Red: best standard frozen baseline classifier per class. Blue: per-class score under the globally optimized circuit (Exp2). Purple: class-specific circuit selection (Exp3). Green dashed lines: ConvNeXt fully supervised benchmark per class [fuad2026aqua20]. Classes are sorted by the difference between class-specific selection and the ConvNeXt reference. Octopus, seaUrchin, fishInGroups, and starfish exceed the fully supervised benchmark.
  • ...and 8 more figures