Criteria-first, semantics-later: reproducible structure discovery in image-based sciences

Jan Bumberger

TL;DR

The paper argues that semantics-first analysis, which maps measurements directly to predefined domain ontologies, limits reproducibility and comparability in image-based sciences because those ontologies drift over time and across communities. It proposes a criteria-first, semantics-later paradigm: a fully specified, semantics-free structural product is first extracted from the measurements under explicit optimality criteria, and semantic mappings to domain ontologies are attached downstream. The author presents a unifying formal framework in which structure is defined as S_C(X) under a criterion C, enabling cross-domain transfer and long-term monitoring, and outlines the implications for validation, FAIR digital objects, and digital twins. The approach aims to improve reproducibility, domain transfer, and open-ended discovery while accommodating evolving semantics through explicit, auditable downstream mappings.

Abstract

Across the natural and life sciences, images have become a primary measurement modality, yet the dominant analytic paradigm remains semantics-first. Structure is recovered by predicting or enforcing domain-specific labels. This paradigm fails systematically under the conditions that make image-based science most valuable, including open-ended scientific discovery, cross-sensor and cross-site comparability, and long-term monitoring in which domain ontologies and associated label sets drift culturally, institutionally, and ecologically. A deductive inversion is proposed in the form of criteria-first and semantics-later. A unified framework for criteria-first structure discovery is introduced. It separates criterion-defined, semantics-free structure extraction from downstream semantic mapping into domain ontologies or vocabularies and provides a domain-general scaffold for reproducible analysis across image-based sciences. Reproducible science requires that the first analytic layer perform criterion-driven, semantics-free structure discovery, yielding stable partitions, structural fields, or hierarchies defined by explicit optimality criteria rather than local domain ontologies. Semantics is not discarded; it is relocated downstream as an explicit mapping from the discovered structural product to a domain ontology or vocabulary, enabling plural interpretations and explicit crosswalks without rewriting upstream extraction. Grounded in cybernetics, observation-as-distinction, and information theory's separation of information from meaning, the argument is supported by cross-domain evidence showing that criteria-first components recur whenever labels do not scale. Finally, consequences are outlined for validation beyond class accuracy and for treating structural products as FAIR, AI-ready digital objects for long-term monitoring and digital twins.

Paper Structure (13 sections, 5 equations, 2 figures)

Why semantics-first is now the limiting assumption
Approach: criteria-first, semantics-later
Key principle
From measurement to meaning
Unifying framework for criterion-defined structural discovery
Cross-Domain Evidence
Structural products as AI-ready FAIR digital objects for digital twins
Beyond class accuracy towards structural validation criteria
Reproducibility, domain transfer, and open-ended scientific discovery
AI-ready, FAIR-by-design, and digital-twin state variables
Outlook and research agenda
Supplement: Domain overviews
How to read the sketches

Figures (2)

Figure 1: The inversion. Top: a semantics-first pipeline in which a domain-specific label set determines model training (features $\rightarrow$ prediction) and yields outputs tied to a domain ontology -- typically brittle under domain shift. Bottom: a criteria-first pipeline in which explicit optimality criteria define a reproducible, semantics-free structural product that can be mapped downstream to multiple domain ontologies (and evolving label sets).
Figure 2: One image, two layers: stable structural product $S=S_C(X)$ versus brittle semantics-first labelling under shift. Columns show the original synthetic measurement field $X$ and three perturbations: global contrast change, covariate shift in appearance, and downsampling. Top row: $X$. Middle row (criteria-first): $S=S_C(X)$ under the same fixed criterion $C$, yielding comparable object instances/boundaries across perturbations (white outlines). Bottom row (semantics-first): labels are predicted directly from $X$ into a fixed label set (three colour-coded labels); assignments can collapse under covariate shift or disappear under downsampling (×). In a semantics-later framing, interpretation is a revisable mapping $M_i:S\rightarrow \mathcal{O}_i$, so ontology drift changes $M_i$ while $S$ can remain comparable and structurally validated.
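The two-layer separation described in the Figure 2 caption can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the criterion C is Otsu's between-class variance criterion, the synthetic field, the contrast change, and the fixed intensity rule (a deliberately crude stand-in for a predictor fitted to the original appearance) are invented, and the names otsu_threshold, structural_product, and apply_mapping are hypothetical.

```python
import numpy as np

def otsu_threshold(x, bins=64):
    """Illustrative criterion C: the threshold maximising between-class variance."""
    hist, edges = np.histogram(x, bins=bins)
    p = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)[:-1]            # weight of the lower class per candidate split
    w1 = 1.0 - w0                     # weight of the upper class
    m = np.cumsum(p * centers)        # cumulative first moment
    mt = m[-1]                        # global mean (from bin centres)
    denom = w0 * w1
    num = (mt * w0 - m[:-1]) ** 2
    # Guard against empty classes (for example a constant image).
    sigma_b = np.where(denom > 0, num / np.maximum(denom, 1e-12), 0.0)
    k = int(np.argmax(sigma_b))
    return edges[k + 1]

def structural_product(x):
    """S = S_C(X): a binary partition defined only by the criterion, with no labels."""
    return (x > otsu_threshold(x)).astype(np.uint8)

def apply_mapping(s, mapping):
    """M_i: S -> O_i, a revisable lookup from segment ids 0..n-1 to vocabulary terms."""
    terms = np.array([mapping[k] for k in sorted(mapping)])
    return terms[s]

# Synthetic measurement field X: two bright squares on a darker background.
rng = np.random.default_rng(0)
x = np.full((32, 32), 0.2)
x[4:12, 4:12] = 0.8
x[18:28, 16:26] = 0.8
x += rng.normal(0.0, 0.02, x.shape)

# Perturbation: a global contrast change (assumed affine, with positive gain).
x_shift = 0.4 * x + 0.05

# Criteria-first layer: the same fixed criterion applied to both fields.
s = structural_product(x)
s_shift = structural_product(x_shift)
print("partitions identical:", np.array_equal(s, s_shift))

# Crude stand-in for a semantics-first predictor tied to the original appearance.
def fixed_rule(img):
    return img > 0.5

print("fixed-rule object pixels:", int(fixed_rule(x).sum()), "->", int(fixed_rule(x_shift).sum()))

# Semantics-later layer: two vocabularies attached to the same structural product.
labels_a = apply_mapping(s, {0: "background", 1: "object"})
labels_b = apply_mapping(s, {0: "matrix", 1: "inclusion"})
print(labels_a[5, 5], labels_b[5, 5])
```

In this sketch, replacing one vocabulary with another only changes the dictionary passed to apply_mapping; the structural product s is computed once and reused. The stability of s under the contrast change follows from the relative nature of the chosen criterion, so it should not be read as a guarantee for other criteria or for perturbations such as downsampling.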
