Harnessing Preference Optimisation in Protein LMs for Hit Maturation in Cell Therapy
Katarzyna Janocha, Annabel Ling, Alice Godson, Yulia Lampi, Simon Bornschein, Nils Y. Hammerla
TL;DR
The paper tackles the data bottleneck in applying machine learning to chimeric antigen receptor (CAR) hit maturation by generating a high-throughput, standardised dataset of CAR performance and fine-tuning ProGen2 with preference-based objectives. It experiments with Direct Preference Optimisation (DPO) using sigmoid, hinge, and Kahneman-Tversky losses, identifies Kahneman-Tversky Optimisation (KTO) as the most effective for guiding maturation, and demonstrates that model loss correlates with activation measured in biological assays ($\Delta$GFP). Using greedy and exhaustive single- and double-mutant searches, the approach discovers high-performing variants and shows that exhaustive search yields stronger enrichment, indicating data-efficient exploration around candidate CARs. The findings establish a feasible ML-to-immunotherapy workflow and suggest the method could generalise to other therapeutic modalities, enabling more efficient hit maturation when high-fidelity data are limited.
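To make the sigmoid and hinge preference losses named above concrete, the sketch below computes both for a single (preferred, dispreferred) sequence pair from summed sequence log-probabilities. It follows the commonly published DPO formulation, not the paper's own code: the function names, the `beta` value, and the example log-probabilities are illustrative assumptions, and the Kahneman-Tversky variant is left out for brevity.

```python
import math


def preference_margin(policy_chosen, policy_rejected,
                      ref_chosen, ref_rejected, beta=0.1):
    """Scaled gap between policy-vs-reference log-ratios for one pair.

    Inputs are summed log-probabilities of the preferred ("chosen") and
    dispreferred ("rejected") sequences under the fine-tuned policy and
    the frozen reference model. beta=0.1 is an assumed value.
    """
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    return beta * (chosen_logratio - rejected_logratio)


def sigmoid_loss(margin):
    """DPO sigmoid loss, -log(sigmoid(margin)), in a numerically stable form."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def hinge_loss(margin):
    """Hinge variant: zero once the margin reaches 1, linear below that."""
    return max(0.0, 1.0 - margin)


# Made-up log-probabilities, for illustration only.
margin = preference_margin(
    policy_chosen=-40.0, policy_rejected=-45.0,
    ref_chosen=-42.0, ref_rejected=-43.0,
)
print("margin:", margin)
print("sigmoid loss:", sigmoid_loss(margin))
print("hinge loss:", hinge_loss(margin))
```

Both losses shrink as the policy assigns relatively more probability to the preferred sequence than the reference model does. The hinge loss stops rewarding further separation once the margin exceeds 1, whereas the sigmoid loss keeps decreasing smoothly.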
Abstract
Cell and immunotherapy offer transformative potential for treating diseases like cancer and autoimmune disorders by modulating the immune system. The development of these therapies is resource-intensive, with the majority of drug candidates failing to progress beyond laboratory testing. While recent advances in machine learning have revolutionised areas such as protein engineering, applications in immunotherapy remain limited due to the scarcity of large-scale, standardised datasets and the complexity of cellular systems. In this work, we address these challenges by leveraging a high-throughput experimental platform to generate data suitable for fine-tuning protein language models. We demonstrate how models fine-tuned using a preference task show surprising correlations to biological assays, and how they can be leveraged for few-shot hit maturation in CARs. This proof-of-concept presents a novel pathway for applying ML to immunotherapy and could generalise to other therapeutic modalities.
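The exhaustive single- and double-mutant search around a candidate sequence, as summarised in the TL;DR, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: `single_mutants`, `double_mutants`, `top_variants`, and the toy scoring function are hypothetical names, and in practice the score would come from the fine-tuned protein language model.

```python
from itertools import combinations, product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_mutants(seq):
    """Yield every sequence that differs from seq at exactly one position."""
    for pos, wild_type in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                yield seq[:pos] + aa + seq[pos + 1:]


def double_mutants(seq):
    """Yield every sequence that differs from seq at exactly two positions."""
    for i, j in combinations(range(len(seq)), 2):
        for a, b in product(AMINO_ACIDS, repeat=2):
            if a != seq[i] and b != seq[j]:
                yield seq[:i] + a + seq[i + 1:j] + b + seq[j + 1:]


def top_variants(seq, score_fn, k=5):
    """Score all single and double mutants and return the k highest scoring."""
    candidates = list(single_mutants(seq)) + list(double_mutants(seq))
    return sorted(candidates, key=score_fn, reverse=True)[:k]


def toy_score(variant):
    """Placeholder score (counts tryptophans); stands in for a model score."""
    return variant.count("W")


parent = "ACDE"  # toy parent sequence, far shorter than a real CAR domain
singles = list(single_mutants(parent))
doubles = list(double_mutants(parent))
print(len(singles), len(doubles))
print(top_variants(parent, toy_score, k=3))
```

For a parent of length L built from standard residues there are 19L single mutants and 361·L(L−1)/2 double mutants, so the exhaustive double-mutant set grows quadratically with sequence length.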
