Open-ended generation tasks require outputs to satisfy diverse and often
implicit task-specific evaluation rubrics. The sheer number of relevant rubrics
leads to prohibitively high verification costs and incomplete assessments of a
response, making reinforcement learning (RL) post-training with rubric-based
rewards difficult to scale. This problem is exacerbated by the fact that often
the best way to combine these rubrics into one single reward is also highly
prompt-specific. We propose Reinforcement Learning with Adversarial Critic
(RLAC), a post-training approach that addresses these challenges via dynamic
rubric verification. Our approach employs a large language model (LLM) as a
critic that dynamically identifies only the most likely failure modes (e.g., a
factual error or unhandled edge case), which are then verified by an external
validator to optimize both generator and critic jointly. By training both the
generator and the critic, this game enhances the critic's error detection and
the generator's output quality while reducing required verifications. Our
experiments demonstrate that RLAC improves factual accuracy in text generation
and correctness in code generation, while also outperforming exhaustive
verification and reward model methods. We show that dynamic critics are more
effective than fixed critics, showcasing the potential of RLAC for scaling RL
post-training to free-form generation tasks.