Training Image Derivatives: Increased Accuracy and Universal Robustness
Vsevolod I. Avrutskiy
TL;DR
This work demonstrates that training neural networks on derivatives with respect to the parameters of the input manifold, specifically the cube's six degrees of freedom, substantially enhances both accuracy and robustness. By deriving and incorporating first- and second-order image derivatives, and by introducing an oracle-based Taylor expansion for robustness analysis, the author achieves a 25-fold increase in accuracy for noiseless inputs and robust performance beyond the manifold, while enabling universal robust training through random directions and Hessian-aligned objectives. The approach unifies sensitivity- and invariance-based adversarial attacks via a tangent-space framework and shows that robustness can be improved without sacrificing accuracy, particularly with increased network capacity. The methodology has potential applications in phase retrieval and other problems where a smooth, computable manifold parametrization exists, and it suggests a practical path toward robust, high-accuracy models in structured inverse problems.
Abstract
Derivative training is an established method that can significantly increase the accuracy of neural networks in certain low-dimensional tasks. In this paper, we extend this improvement to an illustrative image analysis problem: reconstructing the vertices of a cube from its image. By training the derivatives with respect to the cube's six degrees of freedom, we achieve a 25-fold increase in accuracy for noiseless inputs. Additionally, derivative knowledge offers a novel approach to enhancing network robustness, which has traditionally been understood in terms of two types of vulnerabilities: excessive sensitivity to minor perturbations and failure to detect significant image changes. Conventional robust training relies on output invariance, which inherently creates a trade-off between these two vulnerabilities. By leveraging derivative information, we compute non-trivial output changes in response to arbitrary input perturbations. This resolves the trade-off, yielding a network that is twice as robust and five times more accurate than the best case under the invariance assumption. Unlike conventional robust training, this outcome can be further improved by simply increasing the network capacity. This approach is applicable to phase retrieval problems and other scenarios where a sufficiently smooth manifold parametrization can be obtained.
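As a rough illustration of what training the derivatives with respect to the six degrees of freedom can mean, the following sketch writes a first-order objective in assumed notation. The symbols are introduced here for illustration only: p denotes the six pose parameters, I(p) the rendered image, f(p) the target vertex coordinates, and N the network with weights \theta. The equal weighting of the two terms is likewise an assumption, not necessarily the normalization used in the paper.

```latex
% Chain rule: how the network output changes along the image manifold
\frac{\partial}{\partial p_i} N\bigl(I(p);\theta\bigr)
  = \left.\frac{\partial N}{\partial x}\right|_{x = I(p)}
    \frac{\partial I}{\partial p_i},
  \qquad i = 1,\dots,6

% Zeroth-order (value) term and first-order (derivative) term
E_0(\theta) = \mathbb{E}_{p}\,
  \bigl\| N\bigl(I(p);\theta\bigr) - f(p) \bigr\|^2

E_1(\theta) = \sum_{i=1}^{6} \mathbb{E}_{p}\,
  \left\| \left.\frac{\partial N}{\partial x}\right|_{x = I(p)}
          \frac{\partial I}{\partial p_i}
          - \frac{\partial f}{\partial p_i} \right\|^2

% Assumed combined objective (equal weights chosen for illustration)
E(\theta) = E_0(\theta) + E_1(\theta)
```

In this reading, the image derivatives \partial I / \partial p_i act as tangent directions of the manifold, so the derivative term only requires Jacobian-vector products of the network along six directions per sample rather than the full input Jacobian.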
