Discriminative Flow Matching Via Local Generative Predictors

Om Govind Jha, Manoj Bamniya, Ayon Borthakur

Abstract

Traditional discriminative computer vision relies predominantly on static projections that map input features to outputs in a single computational step. Although efficient, this paradigm lacks the iterative refinement and robustness inherent in biological vision and modern generative modelling. In this paper, we propose Discriminative Flow Matching, a framework that reformulates classification and object detection as a conditional transport process. By learning a vector field that continuously transports samples from a simple noise distribution toward a task-aligned target manifold (such as class embeddings or bounding box coordinates), the framework bridges generative and discriminative learning. Our method attaches multiple independent flow predictors to a shared backbone. These predictors are trained with local flow matching objectives, so gradients are computed independently for each block. We formulate this approach for standard image classification and extend it to the more complex task of object detection, where targets are high-dimensional and spatially distributed. Because the blocks are trained independently, they can be updated either sequentially, to minimise activation memory, or in parallel, depending on hardware constraints. By aggregating the predictions from these independent flow predictors, our framework enables robust, generative-inspired inference across diverse architectures, including CNNs and vision transformers.
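
To make the transport view concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a straight-line probability path, x_t = (1 - t) x_0 + t x_1, with target velocity x_1 - x_0, one-hot class embeddings as targets, Euler integration at inference, and simple averaging of the predictors' endpoints; none of these choices are stated in the abstract. The names `fm_loss`, `make_toy_predictor`, `euler_transport` and `aggregate_and_classify` are hypothetical, and the "predictors" are closed-form stand-ins for trained block-local networks, so no training or backbone is shown.

```python
import random

def one_hot(label, num_classes):
    """Class embedding used as the transport target (assumed here: one-hot)."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def fm_pair(x1, rng):
    """Sample (x_t, t, target velocity) on the assumed path x_t = (1-t)*x0 + t*x1."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]        # sample from the noise distribution
    t = rng.uniform(0.0, 0.99)                    # t < 1 keeps the toy field finite
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    v_target = [b - a for a, b in zip(x0, x1)]    # time derivative of x_t
    return x_t, t, v_target

def fm_loss(predictor, x1, rng, n=64):
    """Monte Carlo estimate of the squared-error flow matching loss for ONE predictor.
    In a local-objective setup, each predictor would minimise its own copy of this."""
    total = 0.0
    for _ in range(n):
        x_t, t, v_target = fm_pair(x1, rng)
        v = predictor(x_t, t)
        total += sum((p - q) ** 2 for p, q in zip(v, v_target)) / len(x1)
    return total / n

def make_toy_predictor(x1_hat):
    """Hypothetical stand-in for a trained predictor: its vector field points
    from the current state toward the predictor's own target estimate x1_hat."""
    def predictor(x, t):
        return [(b - a) / (1.0 - t) for a, b in zip(x, x1_hat)]
    return predictor

def euler_transport(predictor, x0, steps=8):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = list(x0)
    for k in range(steps):
        t = k / steps
        v = predictor(x, t)
        x = [a + b / steps for a, b in zip(x, v)]
    return x

def aggregate_and_classify(predictors, num_classes, rng, steps=8):
    """Transport one shared noise sample with every predictor, average the endpoints,
    and pick the nearest one-hot embedding (equivalent to argmax of the average)."""
    x0 = [rng.gauss(0.0, 1.0) for _ in range(num_classes)]
    endpoints = [euler_transport(p, x0, steps) for p in predictors]
    mean = [sum(col) / len(endpoints) for col in zip(*endpoints)]
    label = max(range(num_classes), key=lambda i: mean[i])
    return label, mean

if __name__ == "__main__":
    rng = random.Random(0)
    # Three stand-in predictors with different (made-up) estimates of the target.
    estimates = [[0.1, 0.7, 0.2], [0.3, 0.4, 0.3], [0.0, 0.9, 0.1]]
    predictors = [make_toy_predictor(e) for e in estimates]
    label, mean = aggregate_and_classify(predictors, num_classes=3, rng=rng)
    print("predicted class:", label, "averaged endpoint:", mean)
```

In this toy setting a predictor whose estimate equals the true embedding attains (numerically) zero flow matching loss, and Euler integration lands on each predictor's estimate regardless of the starting noise. In a real model the vector field would be a learned network conditioned on backbone features, and the integration and aggregation schemes may differ from the ones assumed here.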

Paper Structure

This paper contains 64 sections, 44 equations, 7 figures, 5 tables, 1 algorithm.

Figures (7)

  • Figure 1: Overview of the proposed Discriminative Flow Matching framework
  • Figure 2: Training progress comparison on PASCAL VOC using ResNet-50. Both validation mAP and training loss are shown over 50 epochs.
  • Figure 3: Memory consumption (in MB) versus the number of decoders for the Flow Matching (FM) and Backpropagation methods on PASCAL object detection tasks, at batch size 8.
  • Figure 4: Fine-tuning progress comparison on CIFAR-100 using a ViT model. Standard Backpropagation (left and center) is compared against Flow Matching (right) over 10 epochs.
  • Figure 5: Representation Quality Benchmark comparing Stochastic Forward-Forward (FF), Backpropagation (BP), and Flow Matching (Flow) across CNN and ViT architectures.
  • ...and 2 more figures