Counting Fish with Temporal Representations of Sonar Video

Kai Van Brunt, Justin Kay, Timm Haucke, Pietro Perona, Grant Van Horn, Sara Beery

TL;DR

This work addresses the challenge of counting migrating salmon (escapement) from imaging sonar when compute at deployment sites is limited. It proposes a lightweight, temporally oriented approach that converts long sonar videos into echograms and uses a fine-tuned ResNet-18 to predict upstream and downstream counts over 200-frame windows, aided by weakly-supervised training and domain-specific data augmentations. On Kenai River data, the method achieves a normalized mean absolute error ($nMAE$) of $23\%$ on KL-val and $30.7\%$ on KR, demonstrating feasibility while remaining less accurate than frame-by-frame detector pipelines. The study highlights practical deployment advantages, analyzes the impact of echogram generation and augmentations, and outlines future work to address class imbalance and expand validation data for broader applicability.

Abstract

Accurate estimates of salmon escapement - the number of fish migrating upstream to spawn - are key data for conservation and fishery management. Existing methods for salmon counting using high-resolution imaging sonar hardware are non-invasive and compatible with computer vision processing. Prior work in this area has used methods based on object detection and tracking for automated salmon counting. However, these techniques remain inaccessible to many sonar deployment sites due to limited compute and connectivity in the field. We propose an alternative lightweight computer vision method for fish counting based on analyzing echograms - temporal representations that compress several hundred frames of imaging sonar video into a single image. We predict upstream and downstream counts within 200-frame time windows directly from echograms using a ResNet-18 model, and propose a set of domain-specific image augmentations and a weakly-supervised training protocol to further improve results. We achieve a count error of 23% on representative data from the Kenai River in Alaska, demonstrating the feasibility of our approach.
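
The count error quoted above can be made concrete with a small sketch. It assumes that the normalized error is the summed absolute per-clip count error divided by the summed ground-truth count; this excerpt does not state the paper's exact definition, so the formula and the function name `nmae` are illustrative assumptions rather than the authors' evaluation code.

```python
from typing import Sequence


def nmae(predicted: Sequence[float], ground_truth: Sequence[float]) -> float:
    """Normalized mean absolute error over a set of clips.

    Assumed definition (not confirmed by this excerpt):
        sum_i |predicted_i - ground_truth_i| / sum_i ground_truth_i
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    total_true = sum(ground_truth)
    if total_true == 0:
        raise ValueError("ground-truth counts sum to zero; nMAE is undefined")
    total_abs_error = sum(abs(p - g) for p, g in zip(predicted, ground_truth))
    return total_abs_error / total_true


# Three hypothetical clips: 2 fish miscounted out of 10 in total.
error = nmae([3, 0, 5], [4, 1, 5])
```

Under this definition, over-counts and under-counts do not cancel across clips, so a model cannot reach a low error simply by matching the total count for the whole season.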

Paper Structure

This paper contains 16 sections, 1 equation, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Clockwise: 1) a frame of the raw ARIS file; 2) the same frame after background subtraction, with a minimum positive threshold of $\alpha_0=10$ above the mean frame applied to each pixel intensity; 3) the same frame after connected components analysis, with background subtraction thresholds of $\alpha_1=35$ inside the largest connected components and $\alpha_2=127$ outside them; 4) selected frames from a time range in an ARIS file; and 5) the same length of time displayed in echogram view, where the color corresponds to the lateral position of the brightest pixel.
  • Figure 2: Mean and standard deviation of total predicted counts vs. total ground truth counts per clip on the KL-val and KR test sets. The size of each dot corresponds to the number of images with the associated ground truth count. The model systematically predicts lower counts than ground truth for KR clips with large numbers of fish, where tracks of distinct fish may overlap and become difficult to distinguish on the echogram.
  • Figure 3: Left: depiction of the horizontal plane of a multi-beam sonar configuration; right: camera placement on the left and right banks of the Kenai River in Alaska [Key2017].
  • Figure 4: ARIS display software used by sonar technicians showing the echogram view and corresponding frame in sonar video [Key2017].
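
The echogram construction described in the Figure 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions: each frame is a 2D grid indexed as `frame[range_bin][beam]`, the background is the per-pixel mean over the clip, a pixel is kept only if it exceeds the mean by more than the threshold, and each echogram cell stores the brightest remaining pixel across beams together with its lateral (beam) index. The function names, the data layout, and the strict `>` comparison are hypothetical, and the connected-components step with $\alpha_1$ and $\alpha_2$ is omitted. Plain Python lists are used to keep the sketch dependency-free.

```python
from typing import List, Tuple

Frame = List[List[float]]  # frame[range_bin][beam]; layout is an assumption


def mean_frame(frames: List[Frame]) -> Frame:
    """Per-pixel mean over all frames, used as the static background."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]


def subtract_background(frame: Frame, background: Frame,
                        threshold: float = 10.0) -> Frame:
    """Keep (pixel - background) where it exceeds the threshold, else 0."""
    return [[(v - b) if (v - b) > threshold else 0.0
             for v, b in zip(row, bg_row)]
            for row, bg_row in zip(frame, background)]


def echogram(frames: List[Frame],
             threshold: float = 10.0) -> Tuple[List[List[float]], List[List[int]]]:
    """Collapse the beam axis of every frame into one echogram column.

    Returns (intensity, lateral), both indexed as [range_bin][time].
    lateral holds the beam index of the brightest foreground pixel,
    or -1 where nothing exceeds the threshold.
    """
    background = mean_frame(frames)
    rows, steps = len(frames[0]), len(frames)
    intensity = [[0.0] * steps for _ in range(rows)]
    lateral = [[-1] * steps for _ in range(rows)]
    for t, frame in enumerate(frames):
        foreground = subtract_background(frame, background, threshold)
        for r, row in enumerate(foreground):
            peak = max(row)
            if peak > 0:
                intensity[r][t] = peak
                lateral[r][t] = row.index(peak)
    return intensity, lateral


def demo_frames() -> List[Frame]:
    """Four 3x4 frames with one bright target drifting across beams."""
    frames = []
    for t in range(4):
        frame = [[5.0] * 4 for _ in range(3)]
        frame[1][t] = 105.0  # target at range bin 1, beam t
        frames.append(frame)
    return frames
```

Calling `echogram(demo_frames())` yields a `lateral` row of `[0, 1, 2, 3]` at range bin 1, which is the kind of information the color channel in the echogram view encodes: a fish moving steadily across beams shows up as a gradual change in lateral position over time.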