Designing High-Performing Networks for Multi-Scale Computer Vision

Cédric Picron

TL;DR

This thesis investigates high-performing network designs for multi-scale computer vision. It introduces a dedicated neck (TPN) and task-specific heads (FQDet, FQDetV2, EffSeg) that address scale variation more efficiently than backbone-centric approaches. Three design choices drive the results: balancing communication-based processing and self-processing in the neck, reintroducing anchors and static top-k matching in query-based detectors, and applying Structure-Preserving Sparsity for fine-grained segmentation. On COCO benchmarks, these designs improve accuracy and converge faster at a competitive computational cost. Key findings are that allocating more compute to the neck yields tangible gains, and that anchor-informed, top-k-based matching accelerates training and improves localization; EffSeg delivers strong segmentation performance with substantial FLOP and memory savings. Collectively, the designs shift emphasis toward neck and task-head innovations, with implications for object detection and segmentation in real-world, resource-constrained settings.

Abstract

Since the emergence of deep learning, the computer vision field has flourished, with models improving at a rapid pace on increasingly complex tasks. We distinguish three main ways to improve a computer vision model: (1) improving the data, for example by training on a larger, more diverse dataset, (2) improving the training, for example by designing a better optimizer, and (3) improving the network architecture (or network for short). In this thesis, we pursue the third option, i.e. improving the network designs of computer vision models. More specifically, we investigate new network designs for multi-scale computer vision tasks, which are tasks that require making predictions about concepts at different scales. The goal of these new network designs is to outperform existing baseline designs from the literature. Specific care is taken to keep the comparisons fair by guaranteeing that the different network designs are trained and evaluated with the same settings. Code is publicly available at https://github.com/CedricPicron/DetSeg.

Paper Structure (187 sections, 5 equations, 34 figures, 30 tables)

Abstract
Beknopte samenvatting (concise summary, in Dutch)
Introduction
Research setting
Computer vision.
Deep learning.
Computer vision networks.
General research goal
Thesis overview and contributions
Background chapter.
Neck chapter.
Detection chapter.
Segmentation chapter.
Conclusion chapter.
Background
...and 172 more sections

Figures (34)

Figure 1: Example object-centric (left) and scene-centric (right) images.
Figure 2: Overview of the research setting.
Figure 3: The general research goal is to design networks that lie to the top left of the baseline network (i.e. in the green region) in the performance vs. cost graph.
Figure 4: High-level view of a multi-scale computer vision network following the backbone-neck-head meta architecture. The backbone (left) processes the image to output a feature pyramid. The neck (middle) takes in a feature pyramid (denoted by FP) and returns an updated feature pyramid. Finally, the task-specific head (right) produces the loss during training and makes predictions during inference from the final feature pyramid.
Figure 5: Example image with the (ground-truth) object detection annotations.
...and 29 more figures
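
The backbone-neck-head meta architecture described in the caption of Figure 4 can be illustrated with a small sketch. This is only a toy under stated assumptions, not code from the DetSeg repository: the names `toy_backbone`, `toy_neck`, `toy_head` and `run_network` are hypothetical, feature maps are single-channel nested lists, and the neck uses a generic top-down pass as a placeholder rather than TPN.

```python
from typing import Callable, List, Optional

FeatureMap = List[List[float]]      # toy single-channel 2D feature map
FeaturePyramid = List[FeatureMap]   # ordered from fine to coarse resolution


def downsample(fmap: FeatureMap) -> FeatureMap:
    """Halve the resolution with 2x2 average pooling (assumes even sizes)."""
    h, w = len(fmap), len(fmap[0])
    return [
        [(fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]


def upsample(fmap: FeatureMap) -> FeatureMap:
    """Double the resolution with nearest-neighbour upsampling."""
    h, w = len(fmap), len(fmap[0])
    return [[fmap[i // 2][j // 2] for j in range(2 * w)] for i in range(2 * h)]


def toy_backbone(image: FeatureMap, num_levels: int = 3) -> FeaturePyramid:
    """Backbone: image -> feature pyramid (here simply by repeated pooling)."""
    pyramid = [downsample(image)]
    for _ in range(num_levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def toy_neck(pyramid: FeaturePyramid) -> FeaturePyramid:
    """Neck: feature pyramid -> updated feature pyramid.

    Placeholder top-down pass (assumption, not TPN): each level receives
    the upsampled, already updated, coarser level.
    """
    updated = [[row[:] for row in fmap] for fmap in pyramid]
    for level in range(len(updated) - 2, -1, -1):
        coarse = upsample(updated[level + 1])
        updated[level] = [
            [a + b for a, b in zip(fine_row, coarse_row)]
            for fine_row, coarse_row in zip(updated[level], coarse)
        ]
    return updated


def toy_head(pyramid: FeaturePyramid, target: Optional[float] = None):
    """Head: returns predictions at inference, a loss when a target is given."""
    preds = [sum(map(sum, fmap)) / (len(fmap) * len(fmap[0])) for fmap in pyramid]
    if target is None:
        return preds                                   # inference
    return sum((p - target) ** 2 for p in preds) / len(preds)  # training loss


def run_network(image: FeatureMap, backbone: Callable, neck: Callable,
                head: Callable, target: Optional[float] = None):
    """Chain the three components: image -> backbone -> neck -> head."""
    return head(neck(backbone(image)), target)
```

Because the three components only communicate through feature pyramids, each one can be swapped independently in this sketch, for example by passing a different `neck` function to `run_network` while keeping the backbone and head fixed.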
