
It Takes So Little to Change So Much: Investigating the Robustness of a Danish Voting Advice Algorithm

Giovanni Astante, Roberta Sinatra, Vedran Sekara

Abstract

Voting Advice Applications (VAAs) are tools designed to help voters compare political candidates on policy preferences prior to elections. VAAs are popular in European countries and in other countries with multi-party democratic systems. Through a freedom of information request, we gained access to the inner workings of a popular Danish VAA, the Kandidattest, which is implemented by a major Danish news outlet and has been used for general, municipal, and European elections. Users and politicians from every political party answer the same online questionnaire and are matched based on an agreement percentage computed from their answers. VAAs play a significant role in elections: 45% of surveyed voters reported that they followed their recommendations in the previous Danish general election. However, the inner workings of VAAs have not been thoroughly evaluated. We conduct an algorithmic audit of the Kandidattest's robustness, using simulated responses to investigate the tool's brittleness with respect to minor adjustments of the algorithm's weights and changes in the number of questions in the questionnaire. We find that the algorithm is not robust enough for users to trust the agreement percentages it outputs, as small changes to the algorithm can lead to different results, potentially affecting election outcomes.
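The matching step described above (and in Figure 1a) can be illustrated with a minimal sketch. It assumes that answers lie on a 4-point scale, that each question contributes a weight w(d, I) depending on the distance d between the user's and the candidate's answers and on the user's importance rating I, and that the agreement percentage is the points scored divided by the maximum attainable. The weight values, this normalization, and the names `WEIGHTS`, `agreement`, and `rank_candidates` are hypothetical illustrations, not the Kandidattest implementation obtained through the freedom of information request.

```python
# HYPOTHETICAL weight table w(d, I): points for one question when the answer
# distance is d (0..3 on an assumed 4-point scale) and the user rated the
# question with importance I. Placeholder numbers, NOT the Kandidattest values.
WEIGHTS = {
    "important": [6, 4, 1, 0],
    "neutral": [4, 3, 1, 0],
    "not important": [2, 1, 0, 0],
}


def agreement(user_answers, user_importance, candidate_answers, weights=WEIGHTS):
    """Assumed agreement percentage: points scored / maximum attainable points."""
    scored = 0
    best = 0
    for u, imp, c in zip(user_answers, user_importance, candidate_answers):
        scored += weights[imp][abs(u - c)]  # weight for this answer distance
        best += max(weights[imp])           # best possible weight for this question
    return 100 * scored / best if best else 0.0


def rank_candidates(user_answers, user_importance, candidates, weights=WEIGHTS):
    """Return (name, agreement %) pairs, best match first."""
    scores = {name: agreement(user_answers, user_importance, answers, weights)
              for name, answers in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_candidates(
    user_answers=[0, 3, 1],
    user_importance=["important", "neutral", "not important"],
    candidates={"A": [0, 2, 1], "B": [1, 3, 3]},
)
print(ranked)  # A first (about 91.7%), then B (about 66.7%)
```

In the real tool, the web server then shows only a truncated version of this ranked list to the user.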

Paper Structure (12 sections, 3 equations, 5 figures, 1 table)

Figures (5)

  • Figure 1: Characteristics of Kandidattest and possible different outcomes. a, Visual overview of the matching algorithm. A user answers $m$ questions on a webpage, and the matching algorithm compares their answers to those of political candidates. A ranked list is returned to the web server, which presents a truncated version of it to the user. b, Visual explanation of possible different outcomes. When modifying the algorithm, we compare the outcomes to those of the original version of the Kandidattest. If any of the three possible outcomes occurs, we say that a marginal modification of the algorithm has resulted in a significantly different outcome.
  • Figure 2: Modifications we perform on Kandidattest. a, Original algorithmic weights, split up according to the perceived importance of a question. b, Single modification. Here $w(d = 2, I = \textrm{not important})$ has been modified by the addition of $\delta = 1$. c, Importance weight modification, where all weights for questions marked with a specific importance type are adjusted by $\delta$. Here the weights for questions marked as 'neutral' are modified by $\delta = 1$. d, Overall modification, where all weights are incremented by $\delta$. e, Removing questions, where $n = 2$ random questions have been removed when computing the agreement scores.
  • Figure 3: Percentage of changed outcomes as a result of single modifications. Results are split up according to question importance and different $\delta$ values. Left: questions marked as important by users; middle: neutral; right: not important. a, Results for changed outcomes at the candidate level. Error bars denote 95% confidence intervals across different batches. b, Results at the political party level.
  • Figure 4: Estimating the robustness of the top-$k$ ranked candidates in response to a single modification of the algorithm's weights. We focus on $k$-values from $3$ to $15$. Results are split up according to question importance type and show changes at the candidate level. Error bars denote 95% confidence intervals across different batches.
  • Figure 5: Impact of removing $n$ questions from the agreement calculation. The analysis is split up by question importance type. Error bars denote 95% confidence intervals across different batches. a, Results at the candidate level. b, Results at the party level.
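The weight modifications of Figure 2 and the changed-outcome measurements of Figures 3 to 5 can be sketched as a small simulation. Everything here is an assumption for illustration: the values in `BASE_WEIGHTS`, the 4-point answer scale, the agreement formula, the uniformly random simulated users, and the criterion that an outcome has changed when the ordered top-$k$ list differs from the baseline. The function names (`single_mod`, `importance_mod`, `overall_mod`, `changed_fraction`) are hypothetical, and the sketch only approximates the audit procedure described in the paper.

```python
import copy
import random

IMPORTANCE = ("important", "neutral", "not important")

# HYPOTHETICAL baseline weights w(d, I), indexed by importance I and answer
# distance d = 0..3 on an assumed 4-point scale. Placeholder values only.
BASE_WEIGHTS = {
    "important": [6, 4, 1, 0],
    "neutral": [4, 3, 1, 0],
    "not important": [2, 1, 0, 0],
}


def score(user, imp, cand, weights, keep):
    """Assumed agreement percentage over the question indices in `keep`."""
    got = sum(weights[imp[i]][abs(user[i] - cand[i])] for i in keep)
    best = sum(max(weights[imp[i]]) for i in keep)
    return 100 * got / best if best else 0.0


def ranking(user, imp, candidates, weights, keep):
    """Candidate names by decreasing agreement (sorted() keeps ties stable)."""
    return sorted(candidates,
                  key=lambda name: score(user, imp, candidates[name], weights, keep),
                  reverse=True)


# Figure 2b-d: each modification returns a new table with delta added.
def single_mod(weights, d, imp, delta):
    w = copy.deepcopy(weights)
    w[imp][d] += delta  # one cell w(d, I)
    return w


def importance_mod(weights, imp, delta):
    w = copy.deepcopy(weights)
    w[imp] = [x + delta for x in w[imp]]  # every weight for importance I
    return w


def overall_mod(weights, delta):
    return {imp: [x + delta for x in row] for imp, row in weights.items()}


def changed_fraction(candidates, n_users, n_questions, modify=None,
                     n_remove=0, k=1, seed=0):
    """Share of simulated users whose ordered top-k list differs from baseline.

    `modify` maps a weight table to a modified one (Figure 2b-d);
    `n_remove` random questions are dropped for each user (Figure 2e).
    """
    rng = random.Random(seed)
    weights = modify(BASE_WEIGHTS) if modify else BASE_WEIGHTS
    all_q = list(range(n_questions))
    changed = 0
    for _ in range(n_users):
        user = [rng.randint(0, 3) for _ in all_q]
        imp = [rng.choice(IMPORTANCE) for _ in all_q]
        keep = rng.sample(all_q, n_questions - n_remove)
        before = ranking(user, imp, candidates, BASE_WEIGHTS, all_q)
        after = ranking(user, imp, candidates, weights, keep)
        changed += before[:k] != after[:k]
    return changed / n_users


# Usage with hypothetical random candidates.
rng = random.Random(1)
cands = {f"cand{j}": [rng.randint(0, 3) for _ in range(20)] for j in range(30)}
print(changed_fraction(cands, 500, 20,
                       modify=lambda w: single_mod(w, 2, "not important", 1)))
print(changed_fraction(cands, 500, 20, n_remove=2, k=3))
```

Because every candidate is scored on the same questions, in this sketch the importance-weight and overall modifications with positive $\delta$ add the same number of points to every candidate and therefore rescale the percentages without reordering candidates; only single modifications and question removal can change the ranking here. Whether the real Kandidattest behaves the same way depends on its actual normalization, which is not stated on this page.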