NanoFlux: Adversarial Dual-LLM Evaluation and Distillation For Multi-Domain Reasoning
Raviteja Anantha, Soheil Hor, Teodor Nicola Antoniu, Layne C. Price
TL;DR
NanoFlux presents a fully automatic adversarial framework in which two LLMs alternate as Attacker and Defender to generate targeted, multi-hop reasoning questions under the supervision of a tool-augmented Judge. With synthesis constrained to fewer than 200 examples per domain, fine-tuning on the generated data outperforms traditional full-dataset fine-tuning on GSMHard, GenomeBench, and MultiMedQA while reducing computational requirements by 3-14x. The approach uses embedding-based novelty filtering and domain-specific judge tooling to create high-information training signals and diverse reasoning patterns, yielding gains across all three domains and revealing non-monotonic relationships between dataset characteristics and model performance. The work suggests that small, intelligently synthesized training sets can improve reasoning capabilities with far greater data efficiency than large-scale data collection.
Abstract
We present NanoFlux, a novel adversarial framework for generating targeted training data to improve LLM reasoning, in which adversarially generated datasets containing fewer than 200 examples outperform conventional fine-tuning approaches. The framework employs a competitive dynamic between models that alternate as Attacker and Defender, supervised by a tool-augmented Judge, and synthesizes multi-step questions with explanatory annotations that target specific reasoning capabilities. Fine-tuning a 4B-parameter model on NanoFlux-generated data yields performance gains across diverse domains compared to full-benchmark fine-tuning: +5.9% on mathematical reasoning (GSMHard), +3.6% on scientific reasoning (GenomeBench), and +16.6% on medical reasoning (MultiMedQA), while reducing computational requirements by 3-14x. Ablation studies reveal a non-monotonic relationship between dataset characteristics and model performance, uncovering domain-specific optimal points for question complexity and reasoning quality. NanoFlux automates training data generation through embedding-based novelty filtering, tool-augmented evaluation, and multi-hop reasoning, suggesting that future model improvements may lie in the intelligent synthesis of small, precisely targeted training datasets.
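To make the idea of embedding-based novelty filtering concrete, the following is a minimal sketch and not the authors' implementation. It assumes each candidate question has already been mapped to an embedding vector, and it keeps a candidate only if its cosine similarity to every previously kept candidate stays below a threshold. The function names, the greedy keep-first ordering, and the threshold value of 0.9 are all illustrative assumptions; the abstract does not specify them.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def novelty_filter(
    embeddings: Sequence[Sequence[float]],
    threshold: float = 0.9,  # assumed value, not taken from the paper
) -> List[int]:
    """Return indices of candidates that are sufficiently novel.

    Greedy pass in input order: a candidate is kept only if it is less
    similar than `threshold` to every candidate kept so far.
    """
    kept: List[int] = []
    for i, candidate in enumerate(embeddings):
        if all(
            cosine_similarity(candidate, embeddings[j]) < threshold
            for j in kept
        ):
            kept.append(i)
    return kept


# Toy 2-D "embeddings": the second vector nearly duplicates the first.
toy_embeddings = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
kept_indices = novelty_filter(toy_embeddings)
```

In this toy input the second vector is almost parallel to the first, so it is filtered out, while the orthogonal third vector is retained. A real pipeline would obtain the vectors from a sentence-embedding model and would likely tune the threshold per domain.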
