Tile Compression and Embeddings for Multi-Label Classification in GeoLifeCLEF 2024

Anthony Miyaguchi, Patcharapong Aphiwetsa, Mark McDuffie

TL;DR

This work tackles multi-label plant species classification in GeoLifeCLEF 2024 by combining tile-based remote-sensing inputs, compressed with the 2D Discrete Cosine Transform (DCT), with a neighborhood-aware, geolocation-driven approach. It evaluates Nearest Neighbor, Tile CNN, and Tile2Vec pipelines trained with ASL, BCE, Hill, and sigmoidF1 losses. The self-supervised Tile2Vec objective is built on the triplet loss $L(t_a, t_n, t_d)=\left[\lVert f_{\theta}(t_a)-f_{\theta}(t_n)\rVert_2-\lVert f_{\theta}(t_a)-f_{\theta}(t_d)\rVert_2+m\right]_+$, where $t_a$, $t_n$, and $t_d$ are the anchor, neighbor, and distant tiles, $f_{\theta}$ is the embedding network, and $m$ is the margin; an ASL component balances positives and negatives. The pipeline uses LSH-based approximate nearest neighbors (50 km cutoff), 128×128 tiles, and DCT-based feature reduction. The best competition model, which used geolocation features, scored 0.152 on the leaderboard, and the team's best post-competition score was 0.161, while the top submissions reached around 0.409 after the competition, illustrating substantial untapped potential in geospatial cues and learned tile embeddings. Source code and models are publicly available at https://github.com/dsgt-kaggle-clef/geolifeclef-2024.
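As a rough illustration of the triplet loss above, the following sketch evaluates it on toy embeddings. It is not the authors' implementation: the function name `triplet_loss`, the 2-D example vectors, and the margin value are assumptions chosen for illustration, and the embeddings stand in for the outputs $f_{\theta}(t)$ of a trained network.

```python
import math


def triplet_loss(anchor, neighbor, distant, margin=1.0):
    """Hinge on (anchor-neighbor distance) - (anchor-distant distance) + margin.

    All three arguments are embedding vectors, i.e. f_theta(t_a),
    f_theta(t_n), and f_theta(t_d); the margin value is an assumption.
    """
    d_near = math.dist(anchor, neighbor)  # Euclidean (L2) distance
    d_far = math.dist(anchor, distant)
    return max(0.0, d_near - d_far + margin)


# Hypothetical 2-D embeddings for illustration only.
anchor = [0.0, 0.0]
neighbor = [0.3, 0.4]  # L2 distance 0.5 from the anchor
distant = [3.0, 4.0]   # L2 distance 5.0 from the anchor

# Neighbor is already much closer than the distant tile: loss is clipped to 0.
loss_easy = triplet_loss(anchor, neighbor, distant)

# Swapping the roles violates the ordering, so the loss is positive.
loss_hard = triplet_loss(anchor, distant, neighbor)
```

The loss is zero whenever the neighbor embedding is closer to the anchor than the distant embedding by at least the margin, so only triplets that violate this ordering contribute to training.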

Abstract

As the DS@GT team, we explore methods for the multi-label classification task posed by the GeoLifeCLEF 2024 competition, which aims to predict the presence and absence of plant species at specific locations using spatial and temporal remote sensing data. Our approach uses frequency-domain coefficients from the Discrete Cosine Transform (DCT) to compress and pre-compute the raw input data for convolutional neural networks. We also investigate nearest-neighbor models via locality-sensitive hashing (LSH), both for prediction and to aid the self-supervised contrastive learning of embeddings through Tile2Vec. Our best competition model used geolocation features and achieved a leaderboard score of 0.152; our best post-competition score was 0.161. Source code and models are available at https://github.com/dsgt-kaggle-clef/geolifeclef-2024.
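The DCT compression idea can be sketched as follows: transform a square tile with an orthonormal 2D DCT-II, keep only the top-left low-frequency block, and reconstruct by zero-padding. This is a minimal sketch under stated assumptions, not the authors' pipeline: the helper names (`dct_matrix`, `compress_tile`, `reconstruct_tile`), the synthetic gradient tile, and the choice of an 8×8 block are illustrative, and the DCT is built as an explicit matrix only to keep the example dependent on NumPy alone (a library routine such as `scipy.fft.dctn` would typically be used instead).

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix (rows are frequencies)."""
    k = np.arange(n)[:, None]  # frequency index
    x = np.arange(n)[None, :]  # sample index
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale factor
    return mat


def compress_tile(tile, keep=8):
    """Return the top-left keep x keep 2D DCT coefficients of a square tile."""
    c = dct_matrix(tile.shape[0])
    coeffs = c @ tile @ c.T  # separable 2D DCT: rows, then columns
    return coeffs[:keep, :keep]


def reconstruct_tile(coeffs, size):
    """Zero-pad the kept coefficients and invert the 2D DCT."""
    c = dct_matrix(size)
    padded = np.zeros((size, size))
    k = coeffs.shape[0]
    padded[:k, :k] = coeffs
    return c.T @ padded @ c  # inverse of an orthonormal transform


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Synthetic smooth 128x128 "raster" (an assumption; real tiles come from imagery).
size = 128
ramp = np.linspace(0.0, 1.0, size)
tile = np.add.outer(ramp, ramp)

coeffs = compress_tile(tile, keep=8)  # 64 values instead of 16384
approx = reconstruct_tile(coeffs, size)
err_8 = rmse(tile, approx)
err_2 = rmse(tile, reconstruct_tile(compress_tile(tile, keep=2), size))
```

Because the transform is orthonormal, keeping more low-frequency coefficients can only reduce the reconstruction error, and keeping all of them recovers the tile exactly up to floating-point error.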

Paper Structure

This paper contains 32 sections, 8 equations, 7 figures, and 4 tables.

Figures (7)

  • Figure 1: GeoJSON polygon definition.
  • Figure 2: Polygon region overlaid on a map.
  • Figure 3: Overview of the data and modeling pipeline. Raw data is pre-processed to maintain one-survey-site-per-row semantics, with 2D DCT coefficients as features. Data is cached as columnar Parquet files in cloud storage for efficient access.
  • Figure 4: Example of a tiled raster image: a 128×128 tile of the RGB-NIR satellite imagery, associated with a survey site and used as input to the model.
  • Figure 5: Example of low-pass filtering using the DCT. (a) Original bio1 raster image. (b) Low-pass filter using the first 50 coefficients of the 1D DCT, reshaping on the first axis (row-major order). (c) Low-pass filter using the top-left 8×8 coefficients of the 2D DCT.
  • ...and 2 more figures