Semantic-Inductive Attribute Selection for Zero-Shot Learning

Juan Jose Herrera-Aranda, Guillermo Gomez-Trenado, Francisco Herrera, Isaac Triguero

TL;DR

This work tackles the difficulty of inductive zero-shot learning (ZSL) arising from noisy and redundant semantic attributes. It introduces a class-stratified cross-validation partitioning to simulate unseen conditions using only seen data, enabling robust evaluation of attribute selection. Two complementary strategies are proposed: a rank-based embedded feature selection (RFS) with cross-validated consensus, and a genetic algorithm (GA) that globally searches attribute subsets with fitness tied to pseudo-unseen performance. Across five diverse benchmarks, both methods improve unseen accuracy over the baseline SAE, with RFS offering efficiency and GA providing broader space exploration at higher cost. The findings highlight semantic-space redundancy and demonstrate that systematic attribute refinement can enhance generalization in open-world AI tasks.

Abstract

Zero-Shot Learning is an important paradigm within General-Purpose Artificial Intelligence Systems, particularly in those that operate in open-world scenarios where systems must adapt to new tasks dynamically. Semantic spaces play a pivotal role as they bridge seen and unseen classes, but whether human-annotated or generated by a machine learning model, they often contain noisy, redundant, or irrelevant attributes that hinder performance. To address this, we introduce a partitioning scheme that simulates unseen conditions in an inductive setting (which is the most challenging), allowing attribute relevance to be assessed without access to semantic information from unseen classes. Within this framework, we study two complementary feature-selection strategies and assess their generalisation. The first adapts embedded feature selection to the particular demands of ZSL, turning model-driven rankings into meaningful semantic pruning; the second leverages evolutionary computation to directly explore the space of attribute subsets more broadly. Experiments on five benchmark datasets (AWA2, CUB, SUN, aPY, FLO) show that both methods consistently improve accuracy on unseen classes by reducing redundancy, but in complementary ways: RFS is efficient and competitive though dependent on critical hyperparameters, whereas GA is more costly yet explores the search space more broadly and avoids such dependence. These results confirm that semantic spaces are inherently redundant and highlight the proposed partitioning scheme as an effective tool to refine them under inductive conditions.
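The partitioning scheme described in the abstract can be illustrated with a minimal sketch. It assumes that the balance constraint simply means near-equal numbers of classes per fold; the paper may balance on other criteria. The function name `class_stratified_folds` and its parameters are hypothetical and do not come from the paper.

```python
import random


def class_stratified_folds(seen_classes, k, seed=0):
    """Split seen classes into k (pseudo_seen, pseudo_unseen) pairs.

    Assumption: "balance" is taken to mean fold sizes that differ by at
    most one class. Only seen classes are used, so no information about
    the real unseen classes is needed.
    """
    classes = list(seen_classes)
    if k < 2 or k > len(classes):
        raise ValueError("k must be between 2 and the number of seen classes")

    rng = random.Random(seed)
    rng.shuffle(classes)

    # Striding over the shuffled list yields k pairwise-disjoint subsets
    # that together cover every seen class.
    pseudo_unseen_sets = [classes[i::k] for i in range(k)]

    folds = []
    for pseudo_unseen in pseudo_unseen_sets:
        held_out = set(pseudo_unseen)
        pseudo_seen = [c for c in classes if c not in held_out]
        folds.append((pseudo_seen, pseudo_unseen))
    return folds


# Example: 10 seen classes, 3 folds.
folds = class_stratified_folds(range(10), k=3)
for pseudo_seen, pseudo_unseen in folds:
    print(sorted(pseudo_unseen), "held out;", len(pseudo_seen), "classes for training")
```

In each fold, a ZSL model would be trained on the pseudo-seen classes and scored on the pseudo-unseen ones, so every seen class serves as unseen exactly once.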

Paper Structure

This paper contains 25 sections, 11 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Workflow of explicit semantic-inductive attribute selection. The semantic space is preprocessed without any information about the test classes (neither their images nor their semantic information). Once preprocessing is complete, a ZSL method is applied as usual with the selected attributes.
  • Figure 2: Workflow of the proposed class-stratified cross-validation scheme. In each fold, the set of seen classes $\mathcal{Y}^s$ is split into two disjoint subsets: pseudo-seen and pseudo-unseen. The pseudo-unseen subsets are pairwise disjoint, together cover $\mathcal{Y}^s$, and are subject to balance constraints, so each class is evaluated as unseen exactly once.
  • Figure 3: Workflow of the RFS methodology (same colour scheme as Fig. 2). In each cross-validation fold, an embedded feature selection algorithm produces a ranking of attributes. A wrapper strategy then evaluates successive masks (top-$i$ attributes) with the base model, and the best-performing mask is retained for that fold. After all folds are processed, a consensus mechanism combines the retained masks into a frequency vector, from which the final subset is obtained using a stability threshold chosen from $T_1$–$T_K$ (a hyperparameter fixed a priori).
  • Figure 4: Relation between the number of selected attributes and unseen accuracy for different thresholds ($T_1$–$T_5$). Results are shown for embedded methods (RF, SVC, LR) compared to a random ranking on representative datasets. The plots illustrate how stricter thresholds reduce the attribute set while affecting accuracy differently depending on the ranking strategy.
  • Figure 5: Comparison of unseen accuracy across datasets for intermediate thresholds $T_3$ and $T_4$. Results are reported for the three embedded methods and a random approach, showing how different thresholds influence the balance between attribute reduction and performance.
  • ...and 4 more figures
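
The consensus step in the Figure 3 caption can be sketched as follows. This is an illustration under a stated assumption, not the authors' implementation: threshold $T_k$ is read here as "keep an attribute if it appears in the best mask of at least $k$ folds", which the captions do not confirm. The name `consensus_subset` and the toy masks are hypothetical.

```python
def consensus_subset(fold_masks, threshold):
    """Combine per-fold binary attribute masks into a final subset.

    fold_masks: one 0/1 list per fold, where 1 marks an attribute kept
                in that fold's best-performing mask.
    threshold:  minimum number of folds that must have kept an attribute
                (assumed meaning of T_k, not confirmed by the source).
    Returns the frequency vector and the indices of selected attributes.
    """
    n_attrs = len(fold_masks[0])
    if any(len(mask) != n_attrs for mask in fold_masks):
        raise ValueError("all fold masks must have the same length")

    # Frequency vector: how many folds retained each attribute.
    freq = [sum(mask[j] for mask in fold_masks) for j in range(n_attrs)]
    selected = [j for j, count in enumerate(freq) if count >= threshold]
    return freq, selected


# Toy example: 3 folds, 4 attributes.
fold_masks = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
]
freq, selected = consensus_subset(fold_masks, threshold=2)
print(freq)      # [3, 2, 0, 2]
print(selected)  # [0, 1, 3]
```

Under this reading, raising the threshold can only shrink the selected set (with `threshold=3` only attribute 0 survives), which matches the caption of Figure 4, where stricter thresholds reduce the attribute set.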