Bioalignment: Measuring and Improving LLM Disposition Toward Biological Systems for AI Safety

Trent R Northen, Mingxun Wang

TL;DR

QLoRA fine-tuning significantly increased the scoring of biological solutions for both models without degrading general capabilities, suggesting that even a small amount of fine-tuning can change how models weigh the relative value of biological and bioinspired vs. synthetic approaches.

Abstract

Large language models (LLMs) trained on internet-scale corpora can exhibit systematic biases that increase the probability of unwanted behavior. In this study, we examined potential biases toward synthetic vs. biological technological solutions across four domains (materials, energy, manufacturing, and algorithms). A sample of 5 frontier and 5 open-weight models was evaluated using 50 curated Bioalignment prompts with a Kelly criterion-inspired evaluation framework. According to this metric, most models were not bioaligned, in that they exhibited biases in favor of synthetic (non-biological) solutions. We next examined whether fine-tuning could increase the preference of two open-weight models, Llama 3.2-3B-Instruct and Qwen2.5-3B-Instruct, for biologically based approaches. A curated corpus of ~22M tokens from 6,636 PMC articles emphasizing biological problem-solving was first used to fine-tune Llama 3B on a mix of continued-training and instruction-formatted data. The approach was then extended to Qwen 3B using instruction-formatted data only. We found that QLoRA fine-tuning significantly increased the scoring of biological solutions for both models without degrading general capabilities (Holm-Bonferroni-corrected p < 0.001 and p < 0.01, respectively). This suggests that even a small amount of fine-tuning can change how models weigh the relative value of biological and bioinspired vs. synthetic approaches. Although this work focused on small open-weight LLMs, it may be extensible to much larger models and could be used to develop models that favor bio-based approaches. We release the benchmark, corpus, code, and adapter weights.
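
The abstract reports Holm-Bonferroni-corrected p-values. As a minimal sketch of how that standard step-down correction works, the Python below computes adjusted p-values for a family of tests. The function name `holm_bonferroni`, the raw p-values, and the family size of three are illustrative assumptions; the abstract does not state which tests were corrected together or what their uncorrected values were.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-values, reject flags) in the original input order.

    Standard Holm step-down procedure: sort p-values ascending, multiply the
    k-th smallest (0-indexed) by (m - k), cap at 1, and enforce monotonicity
    with a running maximum.
    """
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted p-values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    reject = [p <= alpha for p in adjusted]
    return adjusted, reject


# Hypothetical raw p-values, NOT taken from the paper.
raw = [0.0002, 0.004, 0.03]
adjusted, reject = holm_bonferroni(raw)
for p, p_adj, r in zip(raw, adjusted, reject):
    print(f"raw={p:.4f}  adjusted={p_adj:.4f}  reject={r}")
```

Holm's method controls the family-wise error rate like the plain Bonferroni correction but is uniformly more powerful, which is presumably why it suits a small family of model comparisons.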

Paper Structure (32 sections, 3 equations, 4 figures, 7 tables)

Figures (4)

  • Figure 1: Bioalignment scores across 10 models. Bars show $\Delta p_{up}$ (bioalignment metric). Blue indicates pro-biological ($> +0.05$), gray indicates neutral ($\pm 0.05$), and red indicates pro-synthetic ($< -0.05$). Frontier models are shown in bold. Claude Opus 4.5 shows the strongest pro-biological disposition; Gemini 2.0 Flash shows pro-synthetic bias comparable to small open-weight models. Gemma 7B was excluded due to a 46% parse rate.
  • Figure 2: Bias reduction after QLoRA fine-tuning. Llama 3B shifts by $+0.132$ ($\Delta p_{up}$: $-0.141 \to -0.009$, $p < 0.001$). Qwen 3B shifts by $+0.054$ ($-0.111 \to -0.057$, $p < 0.01$), demonstrating cross-architecture generalization.
  • Figure 3: Bioalignment trajectory during training (Llama 3B). Phase 1 shows rapid correction from pro-synthetic bias toward neutrality within 200 steps. Phase 2 exhibits oscillation around the neutral zone with a plateau mean of $\Delta p_{up} = +0.007$ ($\text{SD} = 0.036$, steps 200--1100).
  • Figure 4: Effect of bioaligned training in context of all models. Arrows show the shift from base models (hatched red bars) to bioaligned versions (green bars). Llama 3B shifts by $+0.132$, moving from pro-synthetic to neutral. Qwen 3B shifts by $+0.054$, reducing pro-synthetic bias.
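
The color bands in Figure 1 and the shifts in Figures 2 and 4 can be checked with a few lines of Python. This is a minimal sketch: the $\pm 0.05$ band and the base and fine-tuned $\Delta p_{up}$ values come from the captions, while the names `classify` and `summarize` and the choice to treat values exactly at $\pm 0.05$ as neutral are assumptions made for illustration.

```python
NEUTRAL_BAND = 0.05  # half-width of the neutral zone, from the Figure 1 caption


def classify(delta_p_up, band=NEUTRAL_BAND):
    """Map a delta p_up score to the disposition label used in Figure 1.

    Assumption: values exactly at +/- band count as neutral.
    """
    if delta_p_up > band:
        return "pro-biological"
    if delta_p_up < -band:
        return "pro-synthetic"
    return "neutral"


# (base, fine-tuned) delta p_up values as reported in the Figure 2 caption.
RESULTS = {
    "Llama 3B": (-0.141, -0.009),
    "Qwen 3B": (-0.111, -0.057),
}


def summarize(results):
    """Return (model, base label, fine-tuned label, shift) for each model."""
    rows = []
    for model, (base, tuned) in results.items():
        shift = round(tuned - base, 3)
        rows.append((model, classify(base), classify(tuned), shift))
    return rows


for model, before, after, shift in summarize(RESULTS):
    print(f"{model}: {before} -> {after} (shift {shift:+.3f})")
```

Under these thresholds, Llama 3B moves from pro-synthetic into the neutral band, while Qwen 3B remains just outside it at $-0.057$, which matches the Figure 4 wording that its pro-synthetic bias is reduced rather than removed.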