A Methodology to Evaluate Strategies Predicting Rankings on Unseen Domains

Sébastien Piérard, Adrien Deliège, Anaïs Halin, Marc Van Droogenbroeck

TL;DR

This paper tackles the problem of predicting, for unseen domains, how a set of methods will rank relative to each other without new evaluations. It introduces a leave-one-domain-out methodology grounded in performance-based rankings and utilizes the Tile visualization to map application-specific preferences (parameters $a$ and $b$) to ranking outcomes. The authors apply the framework to background subtraction across 53 CDnet 2014 videos with 40 unsupervised methods, comparing multiple strategies (including CDnet baselines and semantically informed approaches) and demonstrating that the best strategy depends on the chosen preference, with hybrid and category-aware approaches often performing best. The work provides a rigorous evaluation tool for ranking-prediction strategies and offers a practical path toward selecting suitable methods for new domains without costly re-evaluation, with potential generalization beyond BGS to other cross-domain ranking problems.

Abstract

Frequently, multiple entities (methods, algorithms, procedures, solutions, etc.) can be developed for a common task and applied across various domains that differ in the distribution of scenarios encountered. For example, in computer vision, the input data provided to image analysis methods depend on the type of sensor used, its location, and the scene content. However, a crucial difficulty remains: can we predict which entities will perform best in a new domain based on assessments on known domains, without having to carry out new and costly evaluations? This paper presents an original methodology to address this question, in a leave-one-domain-out fashion, for various application-specific preferences. We illustrate its use with 30 strategies to predict the rankings of 40 entities (unsupervised background subtraction methods) on 53 domains (videos).
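
The leave-one-domain-out idea described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the toy scores, and the mean-score strategy are all hypothetical, and a single fixed score per entity stands in for the preference-dependent rankings used in the paper. Each domain is held out in turn, a strategy predicts scores from the remaining domains, and the predicted ranking is compared with the actual one using Kendall's $\tau$.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a between two score lists (tied pairs count as neither)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def mean_score_strategy(known):
    """Hypothetical baseline: predict each entity's score as its mean over known domains."""
    n_entities = len(next(iter(known.values())))
    return [sum(s[e] for s in known.values()) / len(known) for e in range(n_entities)]


def leave_one_domain_out(scores, strategy):
    """Hold out each domain in turn and return {domain: tau(predicted, actual)}."""
    result = {}
    for held_out, actual in scores.items():
        known = {d: s for d, s in scores.items() if d != held_out}
        result[held_out] = kendall_tau(strategy(known), actual)
    return result


# Toy data (made up): 3 domains, 4 entities, higher score means better.
scores = {
    "video_A": [90, 75, 50, 30],
    "video_B": [80, 60, 70, 20],
    "video_C": [60, 85, 40, 30],
}
taus = leave_one_domain_out(scores, mean_score_strategy)
for domain, tau in taus.items():
    print(f"{domain}: tau = {tau:.3f}")
```

Comparing several strategies then amounts to calling `leave_one_domain_out` once per strategy and comparing the resulting correlations, domain by domain.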

Paper Structure

This paper contains 16 sections, 1 equation, 20 figures.

Figures (20)

  • Figure 1: In this paper, we explore the problem of predicting the rankings of computer vision methods (we take the particular case of background subtraction methods) on any new domain (video) based on a database storing the performances of these methods, previously evaluated on other domains (videos).
  • Figure 2: Two equivalent readings of the Tile: a map of application-specific importances (left) and a map of scores to induce meaningful performance-based rankings (right).
  • Figure 3: Illustration of our methodology for the evaluation and comparison of $3$ strategies to predict the rankings of $40$ BGS methods on the video "bad weather: blizzard". We use Tiles [Pierard2024TheTile-arxiv] to cover the different application-specific preferences $(a,b)$. The upper row shows: (a) the predicted ranking based solely on the global ranking given on changedetection.net (strategy $\mathrm{CDnet}$), (b) the one based on the performance measured on another semantically close video (strategy $\mathrm{sem-d}^\dagger$), (c) the one based on the performance measured on another semantically close video in the same category (strategy $\mathrm{sem-d}^{\dagger*}$), and (d) the actual ranking we would like to predict (our ground truth). These "mille-feuilles" are stackings of entity Tiles [Halin2024AHitchhikers-arxiv]: the $k$-th layer shows the methods ranked $k$-th, the worst methods being at the base of the mille-feuille, and the best ones on its top. The lower row shows the correlation Tiles [Halin2024AHitchhikers-arxiv] between these rankings: (e) the correlation between (a) and (d), (f) the correlation between (b) and (d), and (g) the correlation between (c) and (d). The Tile (h) shows which strategy gives the best correlation. This methodology is applied in the application section to compare many more strategies on a diversified set of $53$ videos.
  • Figure 4: Interpretation of the rank correlation $\tau$ (Kendall). Note the color code used to display the value of $\tau$; the same code is used in the other figures.
  • Figure 5: The two natural baselines for ranking prediction strategies (as defined in our methodology) in the particular case of $40$ unsupervised BGS algorithms ranked on $53$ videos.
  • ...and 15 more figures
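
As a reading aid for the rank correlation mentioned in the Figure 4 caption, the simplest form of Kendall's $\tau$ (tau-a, for two rankings of $n$ entities without ties) is recalled here. Whether the paper uses exactly this variant is an assumption; $C$ and $D$ denote the numbers of concordant and discordant pairs of entities.

```latex
$$
\tau = \frac{C - D}{\binom{n}{2}},
\qquad
\binom{n}{2} = \frac{n(n-1)}{2},
\qquad
-1 \le \tau \le 1 .
$$
% Worked example: n = 4 entities give 6 pairs.
% If 5 pairs are ordered the same way in both rankings and 1 is swapped:
$$
\tau = \frac{5 - 1}{6} = \frac{2}{3} \approx 0.67 .
$$
```

A value of $\tau = 1$ means the two rankings are identical, $\tau = -1$ means one is the reverse of the other, and values near $0$ mean that about as many pairs agree as disagree.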