VARS: Video Assistant Referee System for Automated Soccer Decision Making from Multiple Views

Jan Held; Anthony Cioppa; Silvio Giancola; Abdullah Hamdi; Bernard Ghanem; Marc Van Droogenbroeck

TL;DR

VARS tackles the need for scalable, accurate soccer officiating by automating foul-type classification and sanction prediction from multi-view broadcasts. It encodes per-view clips as $f_i = E_{\theta_E}(v_i)$, aggregates them into $R = A(\{f_i\}_{i=1}^n)$, and outputs predictions via $VARS = \arg\max C_{\theta_C}(R)$, extended to a multi-task setup with $C^{foul}$ and $C^{off}$ and loss $\mathcal{L} = \alpha_{foul}\mathcal{L}^{foul} + \alpha_{off}\mathcal{L}^{off}$. The paper introduces SoccerNet-MVFouls, a dataset of $3{,}901$ actions from $500$ games across six leagues, annotated with 10 properties by a professional referee, and shows that multi-view VARS outperforms single-view baselines on both tasks. These results suggest VARS can provide real-time, consistent guidance to referees and broaden access to automated officiating in amateur and semi-professional contexts, ultimately enhancing fairness and decision reliability.
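The encode, aggregate, classify pipeline and the weighted multi-task loss can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `encode` is a toy stand-in for the video backbone $E_{\theta_E}$ (it summarizes a clip, given as a list of per-frame scalars, by its mean and maximum), the aggregator $A$ is assumed to be an element-wise mean over views, both heads are hypothetical linear classifiers applied to one shared representation $R$, and $\mathcal{L}^{foul}$ and $\mathcal{L}^{off}$ are assumed to be cross-entropy losses. All function names are illustrative.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Head = Callable[[Vector], Vector]


def encode(view: Sequence[float]) -> Vector:
    # Hypothetical stand-in for E_{theta_E}; a real system uses a video backbone.
    # A "clip" here is a list of per-frame scalars summarized by its mean and max.
    return [sum(view) / len(view), max(view)]


def aggregate_mean(features: List[Vector]) -> Vector:
    # Assumed choice of A: element-wise mean over any number n of view features.
    n = len(features)
    return [sum(f[k] for f in features) / n for k in range(len(features[0]))]


def linear_head(weights: List[Vector], bias: Vector) -> Head:
    # Hypothetical linear classifier C_{theta_C}: one weight row and bias per class.
    def head(r: Vector) -> Vector:
        return [sum(w * x for w, x in zip(row, r)) + b for row, b in zip(weights, bias)]
    return head


def softmax(logits: Vector) -> Vector:
    # Numerically stable softmax (subtract the max logit before exponentiating).
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Vector, target: int) -> float:
    # Negative log-probability of the target class.
    return -math.log(softmax(logits)[target])


def represent(views: List[Sequence[float]]) -> Vector:
    # f_i = E(v_i) for every view, then R = A({f_i}).
    return aggregate_mean([encode(v) for v in views])


def vars_predict(views: List[Sequence[float]], head: Head) -> int:
    # Prediction = argmax over the class logits C(R).
    logits = head(represent(views))
    return max(range(len(logits)), key=lambda k: logits[k])


def multitask_loss(views, head_foul: Head, head_off: Head, y_foul: int, y_off: int,
                   alpha_foul: float = 1.0, alpha_off: float = 1.0) -> float:
    # L = alpha_foul * L^foul + alpha_off * L^off, both heads sharing the same R.
    r = represent(views)
    return (alpha_foul * cross_entropy(head_foul(r), y_foul)
            + alpha_off * cross_entropy(head_off(r), y_off))
```

Swapping `aggregate_mean` for an element-wise max is a one-line change. Either way, $A$ must accept any number $n$ of views, which is what lets the same classifier handle actions recorded from two, three, or more camera angles.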

Abstract

The Video Assistant Referee (VAR) has revolutionized association football, enabling referees to review incidents on the pitch, make informed decisions, and ensure fairness. However, due to the lack of referees in many countries and the high cost of VAR infrastructure, only professional leagues can benefit from it. In this paper, we propose a Video Assistant Referee System (VARS) that can automate soccer decision-making. VARS leverages the latest findings in multi-view video analysis to provide real-time feedback to the referee and help them make informed decisions that can impact the outcome of a game. To validate VARS, we introduce SoccerNet-MVFoul, a novel video dataset of soccer fouls from multiple camera views, annotated with extensive foul descriptions by a professional soccer referee, and we benchmark VARS on automatically recognizing the characteristics of these fouls. We believe that VARS has the potential to revolutionize soccer refereeing and take the game to new heights of fairness and accuracy across all levels of professional and amateur federations.

Paper Structure (20 sections, 7 equations, 9 figures, 9 tables)

Introduction
Related work
SoccerNet-MVFouls dataset
Dataset collection
Dataset statistics
Methodology
Classification tasks
Video Assistant Referee System (VARS)
Experiments
Experimental setup
Main Results
Detailed analysis
Conclusion
Supplementary
Video assistant referee system software
...and 5 more sections

Figures (9)

Figure 1: Video Assistant Referee System (VARS). We propose an automated VARS that classifies whether an action is a foul, determines the type of foul (e.g., 'Tackling', 'Pushing'), and predicts the appropriate sanction for the player (i.e., 'No card', 'Yellow card', or 'Red card') from a multi-view camera setup.
Figure 2: Example of a multi-view sequence from our dataset. Each foul has at least (a) one live-action clip (usually taken from the main camera) and (b) one synchronized replay clip (usually a close-up view). We annotate the exact frame where the point of contact happens (red box). The ground-truth properties for this example are: "Offence", "Challenge", "No card", "With contact", "Upper body", "Use of shoulder", "Ball is not played", "Tried to play the ball", "No handball", and "No handball offence".
Figure 3: VARS: Video Assistant Referee System. From multi-view video clips as input, our system encodes per-view video features ($\mathbf{E}$), aggregates the view features ($\mathbf{A}$), and classifies different properties of the foul action ($\mathbf{C}$).
Figure 4: Example of fouls. (a) The defender uses his arm as a tool to gain an unfair advantage and ignores the potential danger for his opponent. (b) The defender makes a tackle while risking injury to his opponent. (c) The defender tries to play the ball in a non-dangerous way. (d) The defender has no intention to play the ball and only aims to harm his opponent.
Figure 5: Qualitative results. VARS predictions for different combinations of views as input. The best performance is obtained with the two replay views.
...and 4 more figures