ANML: Attribution-Native Machine Learning with Guaranteed Robustness

Oliver Zahn, Matt Beton, Simran Chana

TL;DR

ANML (Attribution-Native Machine Learning) weights each training sample by four quality factors: gradient consistency $q$, verification status $v$, contributor reputation $r$, and temporal relevance $T$. It compares multiplicative and robust strategies for combining these signals and proposes Two-Stage Adaptive Gating and Softmax Blend, which preserve baseline performance and resist data poisoning. Across five datasets, ANML reduces error by 33-72% relative to gradient-only baselines, and quality-weighted training is data-efficient: 20% high-quality data outperforms 100% uniformly weighted data by 47%. The method remains robust under adversarial conditions, falls back safely when external signals are compromised, and, when per-sample detection fails, gains more from contributor-level attribution than from sample-level methods. For frontier AI trained on expert data, this offers a way to credit contributors and sustain data-sharing incentives while keeping training dynamics safe.

Abstract

Frontier AI systems increasingly train on specialized expert data, from clinical records to proprietary research to curated datasets, yet current training pipelines treat all samples identically. A Nobel laureate's contribution receives the same weight as an unverified submission. We introduce ANML (Attribution-Native Machine Learning), a framework that weights training samples by four quality factors: gradient-based consistency (q), verification status (v), contributor reputation (r), and temporal relevance (T). By combining what the model observes (gradient signals) with what the system knows about data provenance (external signals), ANML produces per-contributor quality weights that simultaneously improve model performance and enable downstream attribution. Across 5 datasets (178-32,561 samples), ANML achieves 33-72% error reduction over gradient-only baselines. Quality-weighted training is data-efficient: 20% high-quality data outperforms 100% uniformly weighted data by 47%. A Two-Stage Adaptive gating mechanism guarantees that ANML never underperforms the best available baseline, including under strategic joint attacks combining credential faking with gradient alignment. When per-sample detection fails against subtle corruption, contributor-level attribution provides 1.3-5.3x greater improvement than sample-level methods, with the advantage growing as corruption becomes harder to detect.
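
To make the weighting idea concrete, here is a minimal sketch of the multiplicative combination of the four factors. It is an illustration under stated assumptions, not the authors' implementation: the names `SampleSignals`, `multiplicative_weights`, and `weighted_loss` are hypothetical, each factor is assumed to lie in [0, 1], and the weights are assumed to be normalized to sum to 1.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SampleSignals:
    """Hypothetical container for the four quality factors of one sample."""
    q: float  # gradient-based consistency (assumed in [0, 1])
    v: float  # verification status (assumed in [0, 1])
    r: float  # contributor reputation (assumed in [0, 1])
    T: float  # temporal relevance (assumed in [0, 1])


def multiplicative_weights(signals: Sequence[SampleSignals]) -> List[float]:
    """Combine the four factors by a plain product and normalize to sum to 1.

    Assumption: if every product is zero, fall back to uniform weights so
    the weighted loss stays well defined.
    """
    raw = [s.q * s.v * s.r * s.T for s in signals]
    total = sum(raw)
    if total == 0:
        return [1.0 / len(signals)] * len(signals)
    return [w / total for w in raw]


def weighted_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-sample losses (weights are assumed normalized)."""
    return sum(l * w for l, w in zip(losses, weights))


if __name__ == "__main__":
    batch = [
        SampleSignals(q=1.0, v=1.0, r=1.0, T=1.0),  # trusted, consistent sample
        SampleSignals(q=0.5, v=1.0, r=1.0, T=1.0),  # less consistent gradient
    ]
    w = multiplicative_weights(batch)
    print(w, weighted_loss([0.2, 0.8], w))
```

A pure product means any single low factor suppresses the sample, which is simple to reason about but also leaves the weights sensitive to one bad signal.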

Paper Structure

This paper contains 68 sections, 1 theorem, 8 equations, 6 figures, 18 tables, 2 algorithms.

Key Result

Proposition 1

Two-Stage Adaptive ANML provides two safety floors: (1) when all signals are high, it matches Uniform; (2) when signals are unreliable, it matches Krum.
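
The two floors in the proposition can be pictured as a gate that chooses which weighting scheme to trust. The sketch below is one possible reading, not the paper's algorithm. The function `choose_mode`, the thresholds `high` and `min_corr`, and the use of Pearson correlation between gradient consistency and an external score as the reliability test are all assumptions made for illustration.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def choose_mode(q: Sequence[float], external: Sequence[float],
                high: float = 0.9, min_corr: float = 0.3) -> str:
    """Hypothetical two-stage gate returning 'uniform', 'krum', or 'anml'.

    Stage 1 (assumed): if every signal is high, reweighting has nothing to
    separate, so use uniform weights (first floor).
    Stage 2 (assumed): if external scores disagree with gradient consistency,
    treat them as unreliable and fall back to Krum (second floor).
    Otherwise use the combined ANML weights.
    """
    if min(q) >= high and min(external) >= high:
        return "uniform"
    if pearson(q, external) < min_corr:
        return "krum"
    return "anml"


if __name__ == "__main__":
    print(choose_mode([0.95, 0.97], [0.99, 0.92]))
    print(choose_mode([0.9, 0.2, 0.8, 0.1], [0.1, 0.9, 0.2, 0.8]))
    print(choose_mode([0.9, 0.2, 0.8, 0.1], [0.9, 0.3, 0.7, 0.2]))
```

Under these assumptions the gate never selects the combined weights unless the external signals agree with what the gradients show, which is the intuition behind a fallback when external signals are compromised.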

Figures (6)

  • Figure 1: Combination method comparison. Multiplicative (red) degrades severely at low correlation. Both Adaptive Gating (green) and Softmax Blend (yellow) provide robust performance across realistic correlation levels, with similar overall results.
  • Figure 2: Left: Learning curves. ANML at 20% data beats Krum at 100% by 47%. Right: Attack resilience. ANML degrades $\sim$2$\times$ more gracefully than Krum as attack intensity increases.
  • Figure 3: ANML across dataset scales. Left: small datasets (178-569 samples). Right: large datasets (1.8K-32K samples). On Covertype, Krum (blue) performs worse than Uniform (gray), while ANML (green) recovers the lost performance.
  • Figure 4: ANML differentiates quality, not just fraud. (A) Selection rates by group. (B) Model error. (C) Expert/novice selection ratio showing quality preference.
  • Figure 5: Detectability sweep across 11 noise conditions. (A) Detectability decreases with more subtle corruption. (B) Contributor-level attribution improves consistently across all conditions; sample-level degrades as detection becomes harder. (C) Strong negative correlation (Pearson $r = -0.93$, $p < 0.001$): contributor-level advantage increases from 1.3$\times$ at high detectability to 5.3$\times$ when per-sample detection fails.
  • ...and 1 more figure
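
The contrast between sample-level and contributor-level attribution in Figure 5 can be illustrated with a small aggregation sketch. It assumes, hypothetically, that each sample carries a noisy quality score and a contributor ID, and that a contributor's score is the mean of its samples' scores. The function names are illustrative and do not come from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def contributor_scores(sample_scores: Sequence[float],
                       contributor_ids: Sequence[str]) -> Dict[str, float]:
    """Average the per-sample scores of each contributor (assumed aggregation)."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for score, cid in zip(sample_scores, contributor_ids):
        sums[cid] += score
        counts[cid] += 1
    return {cid: sums[cid] / counts[cid] for cid in sums}


def contributor_level_weights(sample_scores: Sequence[float],
                              contributor_ids: Sequence[str]) -> List[float]:
    """Give every sample the mean score of the contributor it came from."""
    per_contributor = contributor_scores(sample_scores, contributor_ids)
    return [per_contributor[cid] for cid in contributor_ids]


if __name__ == "__main__":
    scores = [0.9, 0.7, 0.2, 0.4]   # noisy per-sample quality estimates
    ids = ["a", "a", "b", "b"]      # hypothetical contributor IDs
    print(contributor_scores(scores, ids))
    print(contributor_level_weights(scores, ids))
```

Averaging over a contributor's samples smooths per-sample noise, which is one plausible reason a contributor-level signal could stay informative when individual corrupted samples are hard to detect.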

Theorems & Definitions (2)

  • Proposition 1: Two-Stage Safety Guarantees
  • proof