Harnessing Preference Optimisation in Protein LMs for Hit Maturation in Cell Therapy

Katarzyna Janocha, Annabel Ling, Alice Godson, Yulia Lampi, Simon Bornschein, Nils Y. Hammerla

TL;DR

The paper tackles the data bottleneck in applying machine learning to CAR hit maturation by generating a high-throughput, standardized dataset of CAR performance and fine-tuning ProGen2 with preference-based objectives. It experiments with Direct Preference Optimisation (DPO) using sigmoid, hinge, and Kahneman-Tversky losses, identifying Kahneman-Tversky Optimization as the most effective for guiding maturation, and demonstrates that model loss correlates with biologically measured activation (measured as $\Delta$GFP). Through greedy and exhaustive single/double-mutant searches, the approach discovers high-performing variants and shows that exhaustive search yields stronger enrichment, indicating data-efficient exploration around candidate CARs. The findings establish a feasible ML-to-immunotherapy workflow and suggest the method could generalise to other therapeutic modalities, enabling more efficient hit maturation under limited high-fidelity data.
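
The exhaustive single/double-mutant search mentioned above can be illustrated with a minimal sketch. It enumerates every single and double substitution of a short sequence and ranks them with a scoring function. The names `single_mutants`, `double_mutants`, `rank_candidates`, and the toy scorer are hypothetical and not taken from the paper; in practice the score would presumably come from the fine-tuned model, and the sketch assumes that a lower score is preferred (flip the sort if the opposite holds).

```python
from itertools import combinations, product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_mutants(seq):
    """Yield every sequence differing from seq at exactly one position."""
    for i, wild_type in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                yield seq[:i] + aa + seq[i + 1:]


def double_mutants(seq):
    """Yield every sequence differing from seq at exactly two positions."""
    for i, j in combinations(range(len(seq)), 2):
        for a, b in product(AMINO_ACIDS, repeat=2):
            if a != seq[i] and b != seq[j]:
                yield seq[:i] + a + seq[i + 1:j] + b + seq[j + 1:]


def rank_candidates(seq, score_fn, top_k=5):
    """Score all single and double mutants; return the top_k lowest scores.

    score_fn is a stand-in (assumption) for a model-derived score.
    """
    candidates = list(single_mutants(seq)) + list(double_mutants(seq))
    return sorted(candidates, key=score_fn)[:top_k]


def toy_score(candidate, target="AWDY"):
    """Hypothetical scorer: Hamming distance to an arbitrary target."""
    return sum(c != t for c, t in zip(candidate, target))


if __name__ == "__main__":
    print(rank_candidates("ACDE", toy_score, top_k=3))
```

For a sequence of length $L$ this produces $19L$ single mutants and $361\binom{L}{2}$ double mutants, so exhaustive enumeration stays tractable only for short regions such as a CDR3.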

Abstract

Cell and immunotherapy offer transformative potential for treating diseases like cancer and autoimmune disorders by modulating the immune system. The development of these therapies is resource-intensive, with the majority of drug candidates failing to progress beyond laboratory testing. While recent advances in machine learning have revolutionised areas such as protein engineering, applications in immunotherapy remain limited due to the scarcity of large-scale, standardised datasets and the complexity of cellular systems. In this work, we address these challenges by leveraging a high-throughput experimental platform to generate data suitable for fine-tuning protein language models. We demonstrate how models fine-tuned using a preference task show surprising correlations to biological assays, and how they can be leveraged for few-shot hit maturation in CARs. This proof-of-concept presents a novel pathway for applying ML to immunotherapy and could generalise to other therapeutic modalities.

Paper Structure

This paper contains 17 sections, 4 equations, 9 figures, 1 algorithm.

Figures (9)

  • Figure 1: CAR structure with scFv and VHH binding domains (see text for details).
  • Figure 2: Simplified candidate generation pipeline. Starting from a highly diverse library, we use phage display to screen for binders to the target of interest. The resulting candidates are evaluated in cells using scalable proprietary assays.
  • Figure 3: Encoding of a pair of chosen and rejected completions for a context prompt. Context and chosen CDR3s are sampled from good performers for a specific target, and rejected CDR3s from poor performers for the same target. We produce up to $n=10$ different pairs for each good CDR3.
  • Figure 4: Comparison of rewards (the mean difference between the log probabilities of $\pi_\theta$ and $\pi_\text{st}$) for rejected and chosen completions, and of the margins between them, for models trained using the three analysed loss functions (the best-performing hyperparameters were chosen for each loss function). Models trained using the original sigmoid-based loss penalise the rejected completions heavily and tend to become overly confident in their decisions, making them more susceptible to overfitting and numerical instability. They could perhaps benefit from more sophisticated regularisation mechanisms. Models trained with hinge loss maintain very small margins, which may explain their tendency to produce trivial completions despite promising validation loss and accuracies. In future iterations, once datasets with harder examples have been constructed [robinson2021contrastive], models using this loss may still yield useful results thanks to their rapid convergence and stability. KTO's behaviour appears to be best aligned with our desired use case of hit maturation, as it increases the likelihood of the chosen completion instead of penalising the rejected completion, all while achieving high margins.
  • Figure 5: Model loss vs activation for greedily generated candidates. All plots show model loss averaged over all possible context permutations, plotted against activation measured as $\Delta$GFP. Dots indicate candidates with a single mutation, crosses indicate candidates with two mutations, and black lines indicate the baseline performance of each candidate. Overall, we see a strong correlation between averaged model loss and activation.
  • ...and 4 more figures
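
The rewards and margins compared in the Figure 4 caption can be made concrete with a small sketch of the three loss families. It follows the standard published formulations of the sigmoid and hinge DPO losses and a simplified KTO-style loss, not necessarily the exact variants or hyperparameters used in the paper. The function names, the default `beta`, and the KTO reference point `z_ref` (expressed here in reward units, and estimated from a KL term in the original KTO formulation) are assumptions made for illustration.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sigmoid(x):
    return math.exp(log_sigmoid(x))


def implicit_reward(logp_policy, logp_ref, beta=0.1):
    """Scaled log-probability difference between policy and reference."""
    return beta * (logp_policy - logp_ref)


def sigmoid_loss(margin):
    """Sigmoid DPO loss on the margin (chosen reward minus rejected reward)."""
    return -log_sigmoid(margin)


def hinge_loss(margin):
    """Hinge variant: zero once the margin exceeds 1."""
    return max(0.0, 1.0 - margin)


def kto_loss(reward, desirable, z_ref=0.0, lambda_d=1.0, lambda_u=1.0):
    """Simplified KTO-style loss on a single, unpaired completion.

    z_ref is passed in as a constant here (assumption) rather than estimated.
    """
    if desirable:
        return lambda_d * (1.0 - sigmoid(reward - z_ref))
    return lambda_u * (1.0 - sigmoid(z_ref - reward))


if __name__ == "__main__":
    # Made-up sequence log-probabilities, for illustration only.
    r_chosen = implicit_reward(logp_policy=-12.0, logp_ref=-14.0)
    r_rejected = implicit_reward(logp_policy=-20.0, logp_ref=-15.0)
    margin = r_chosen - r_rejected
    print("margin:", margin)
    print("sigmoid loss:", sigmoid_loss(margin))
    print("hinge loss:", hinge_loss(margin))
    print("KTO (chosen):", kto_loss(r_chosen, desirable=True))
    print("KTO (rejected):", kto_loss(r_rejected, desirable=False))
```

Note that the two pairwise losses depend only on the margin, so they can be reduced by pushing the rejected reward down, whereas the KTO-style loss scores each completion against a reference point on its own. This is consistent with the caption's observation that KTO raises the likelihood of chosen completions rather than only penalising rejected ones.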