Task-Driven Fixation Network: An Efficient Architecture with Fixation Selection

Shuguang Wang; Yuanjing Wang

TL;DR

Task-Driven Fixation Network (TDFN) targets the cost of high-resolution visual processing by adding a fixation mechanism to a Transformer-based architecture. It combines a Low-Resolution Channel (LRC), a High-Resolution Channel (HRC), and a Hybrid Encoder (HE), connected via a Fixation Point Generator (FPG) that selects regions of interest. Fixation points are obtained through Monte Carlo sampling of a saliency map produced from the rec_token. Training proceeds in two phases: the task is first learned with random fixations, and the FPG is then updated by reinforcement learning with reward $\mathrm{Reward}_{n} = \mathrm{TaskLoss}_{n-1} - \mathrm{TaskLoss}_{n}$ and loss $L_{n} = -\mathrm{Reward}_{n} \cdot \log(p_{n})$, where the task objective is $\mathrm{TaskLoss} = \mathrm{ClassLoss} + \alpha \cdot \mathrm{ReconLoss}$. Experiments on MNIST show that selective high-resolution analysis substantially improves classification accuracy while keeping image coverage and computation low, and that dynamic termination further reduces the number of fixation steps. Overall, TDFN demonstrates a scalable, task-specific approach that uses fixation-inspired attention to balance performance and efficiency in vision tasks, with potential extensions to detection and segmentation.
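The sampling step and the reinforcement-learning update described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 2x2 saliency map, the loss values, and $\alpha = 0.5$ are all hypothetical, and $p_{n}$ is assumed to be the probability of the sampled fixation under the normalized saliency map.

```python
import math
import random

def sample_fixation(saliency, rng):
    """Draw one (row, col) cell with probability proportional to its saliency.

    Assumes a rectangular grid of non-negative values with a positive sum.
    """
    flat = [v for row in saliency for v in row]
    total = sum(flat)
    probs = [v / total for v in flat]
    idx = rng.choices(range(len(flat)), weights=probs, k=1)[0]
    width = len(saliency[0])
    return (idx // width, idx % width), probs[idx]

def task_loss(class_loss, recon_loss, alpha):
    # TaskLoss = ClassLoss + alpha * ReconLoss
    return class_loss + alpha * recon_loss

def fpg_loss(prev_task_loss, curr_task_loss, p_n):
    # Reward_n = TaskLoss_{n-1} - TaskLoss_n ;  L_n = -Reward_n * log(p_n)
    reward = prev_task_loss - curr_task_loss
    return -reward * math.log(p_n)

rng = random.Random(0)                      # fixed seed for repeatability
saliency = [[0.1, 0.1], [0.1, 0.7]]         # hypothetical saliency map
fixation, p_n = sample_fixation(saliency, rng)

loss_before = task_loss(1.2, 0.5, alpha=0.5)  # hypothetical loss before fixation n
loss_after = task_loss(0.8, 0.3, alpha=0.5)   # hypothetical loss after fixation n
l_n = fpg_loss(loss_before, loss_after, p_n)
```

In this sketch the fixation lowers the task loss, so the reward is positive and minimizing $L_{n}$ raises the probability of sampling that location again.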

Abstract

This paper presents a novel neural network architecture with automatic fixation point selection, designed to handle complex tasks with a smaller network and lower computational overhead. The proposed model consists of a low-resolution channel that captures global features from a low-resolution view of the input image, a high-resolution channel that sequentially extracts localized high-resolution features, and a hybrid encoding module that integrates the features from both channels. A defining characteristic of the hybrid encoding module is its fixation point generator, which dynamically produces fixation points that direct the high-resolution channel to regions of interest. Because the fixation points are generated in a task-driven manner, regions of interest are selected automatically. This avoids exhaustive high-resolution analysis of the entire image, preserving task performance while keeping computation low.
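The two input paths can be sketched as follows. This only illustrates how a coarse global view and a fixation-centred crop might be prepared. The average pooling, the downsampling factor, the patch size, and the function names are assumptions, not details taken from the paper.

```python
def downsample(image, factor):
    """Average-pool a 2D list by `factor` (dimensions must be divisible by it)."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(0, h, factor):
        row = []
        for c in range(0, w, factor):
            block = [image[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def crop_patch(image, center, size):
    """Return a size x size patch around `center`, shifted to stay in bounds."""
    h, w = len(image), len(image[0])
    top = min(max(center[0] - size // 2, 0), h - size)
    left = min(max(center[1] - size // 2, 0), w - size)
    return [row[left:left + size] for row in image[top:top + size]]

# Hypothetical 8x8 image whose pixel value encodes its position.
image = [[r * 8 + c for c in range(8)] for r in range(8)]

low_res = downsample(image, 4)        # 2x2 global view for the low-resolution path
patch = crop_patch(image, (7, 7), 4)  # 4x4 patch for the high-resolution path

# Fraction of the image seen at full resolution after one fixation.
coverage = (len(patch) * len(patch[0])) / (len(image) * len(image[0]))
```

Here a single fixation exposes a quarter of the pixels at full resolution, which is the kind of coverage figure the efficiency argument rests on.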

Paper Structure (13 sections, 4 equations, 2 figures, 2 tables)

Introduction
Model
Information Transfer Between Modules
Task Networks
Fixation Point Generator
Training Strategy
Positional Encoding and Channel Encoding
Experiments
Dataset and Parameter Settings
Effectiveness of the Fixation Mechanism
Dynamic Termination of Fixation
Visualization of Fixation Points
Conclusion

Figures (2)

Figure 1: TDFN architecture.
Figure 2: Visualization of Fixation Points. The first column shows the original input images. The second column displays the low-resolution images (odd rows) and the reconstructed images generated by TDFN using only the low-resolution inputs (even rows). The third to ninth columns sequentially present the fixation points generated by the FPG (odd rows, represented as light squares) and the reconstructed images generated by TDFN using both low-resolution inputs and high-resolution inputs from the fixation points (even rows). The reconstructed images are displayed to illustrate how the addition of fixation points introduces supplementary information.
