EcoAlign: An Economically Rational Framework for Efficient LVLM Alignment
Ruoxi Cheng, Haoxuan Ma, Teng Ma, Hongyi Zhang
TL;DR
This work reframes LVLM alignment as an economic problem of budgeted reasoning, identifying process-deliberation as a major source of inefficiency. EcoAlign treats inference as a boundedly rational search over a dynamically built multimodal thought graph, using a forward-looking valuation Γ(P) that integrates safety, utility, and cost under a fixed budget, and enforcing safety via the weakest-link principle. The method introduces local returns, the net present value of reasoning, and a dynamic lookahead horizon to guide action selection, and it extracts the final solution path from the Pareto frontier. Empirical results across five models and six benchmarks show that EcoAlign matches or surpasses state-of-the-art safety and utility while substantially reducing inference cost, offering a principled and economical pathway to robust LVLM alignment.
Abstract
Large Vision-Language Models (LVLMs) exhibit powerful reasoning capabilities but suffer from sophisticated jailbreak vulnerabilities. Fundamentally, aligning LVLMs is not just a safety challenge but a problem of economic efficiency. Current alignment methods struggle with the trade-off among safety, utility, and operational cost. Critically, a focus solely on final outputs (process-blindness) wastes significant computational budget on unsafe deliberation. This flaw also allows harmful reasoning to be disguised with benign justifications, thereby circumventing simple additive safety scores. To address this, we propose EcoAlign, an inference-time framework that reframes alignment as an economically rational search by treating the LVLM as a boundedly rational agent. EcoAlign incrementally expands a thought graph and scores actions using a forward-looking function (analogous to net present value) that dynamically weighs expected safety, utility, and cost against the remaining budget. To prevent deception, path safety is enforced via the weakest-link principle. Extensive experiments across 3 closed-source and 2 open-source models on 6 datasets show that EcoAlign matches or surpasses state-of-the-art safety and utility at a lower computational cost, thereby offering a principled, economical pathway to robust LVLM alignment.
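To illustrate why the abstract contrasts additive safety scores with the weakest-link principle, the following is a minimal sketch and not the authors' implementation. It assumes each reasoning step carries a safety score in [0, 1]; the function names, the example scores, and the 0.5 acceptance threshold are all hypothetical.

```python
from typing import Sequence


def additive_safety(step_scores: Sequence[float]) -> float:
    """Additive baseline (assumed here to be the mean of per-step scores)."""
    return sum(step_scores) / len(step_scores)


def weakest_link_safety(step_scores: Sequence[float]) -> float:
    """Weakest-link aggregation: a path is only as safe as its least safe step."""
    return min(step_scores)


# Hypothetical per-step safety scores for one reasoning path:
# a single harmful step (0.1) surrounded by benign justifications.
disguised_path = [0.95, 0.9, 0.1, 0.95, 0.9]

# Hypothetical acceptance threshold, chosen only for illustration.
THRESHOLD = 0.5

additive = additive_safety(disguised_path)
weakest = weakest_link_safety(disguised_path)

# The averaged score clears the threshold even though one step is unsafe;
# the weakest-link score does not.
accepted_by_additive = additive >= THRESHOLD
accepted_by_weakest_link = weakest >= THRESHOLD
```

Under these assumed numbers, the benign steps dilute the harmful one in the averaged score, so the path is accepted by the additive check but rejected by the weakest-link check. This is the disguise effect the abstract describes.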
