
Beyond the Resumé: A Rubric-Aware Automatic Interview System for Information Elicitation

Harry Stuart, Masahiro Kaneko, Timothy Baldwin

TL;DR

This work proposes that large language models (LLMs) can play the role of subject matter experts to cost-effectively elicit information from each candidate that is nuanced and role-specific, thereby improving the quality of early-stage hiring decisions.

Abstract

Effective hiring is integral to the success of an organisation, but finding the most suitable candidates is very challenging because expert evaluation (e.g. interviews conducted by a technical manager) is expensive to deploy at scale. Therefore, automated resume scoring and other applicant-screening methods are increasingly used to coarsely filter candidates, making decisions on limited information. We propose that large language models (LLMs) can play the role of subject matter experts to cost-effectively elicit information from each candidate that is nuanced and role-specific, thereby improving the quality of early-stage hiring decisions. We present a system that leverages an LLM interviewer to update belief over an applicant's rubric-oriented latent traits in a calibrated way. We evaluate our system on simulated interviews and show that belief converges towards the simulated applicants' artificially constructed latent ability levels. We release code, a modest dataset of public-domain/anonymised resumes, belief calibration tests, and simulated interviews at https://github.com/mbzuai-nlp/beyond-the-resume. Our demo is available at https://btr.hstu.net.

Paper Structure (28 sections, 13 equations, 5 figures, 4 tables)


Figures (5)

  • Figure 1: Judge's justification for updating belief over the $\mathrm{ML\ Systems\ Delivery}$ dimension when debasing (diminishing the supportive power of) previous evidence. Not derived from the same interview presented in \ref{fig:hero}.
  • Figure 2: Comparison of \ref{eq:ctv} over simulated interviews using a subset of $P=30$ profiles.
  • Figure 3: Total variation (TV) between the judge's successive belief states, $\Delta_t$, as described in \ref{eq:successive-tv}, over the course of $P_\mathrm{full}=180$ simulated interviews.
  • Figure 4: Confusion matrix for archetype recovery over the full simulation set ($|\mathcal{P}_{\mathrm{full}}|=180$). Italicised archetypes are our previously defined anchor-archetypes \ref{eq:anchor-archetypes}.
  • Figure 5: Per-level misclassifications across the full simulation set ($P_{\mathrm{full}}=180$).