Classifying geospatial objects from multiview aerial imagery using semantic meshes
David Russell, Ben Weinstein, David Wettergreen, Derek Young
TL;DR
This paper tackles the limitations of orthomosaic-based analyses in geospatial mapping by introducing a multiview semantic mesh approach that operates on raw drone imagery from multiple viewpoints. It presents a complete workflow for building 3D semantic meshes, training with per-face labels, and projecting predictions back to geospatial coordinates, along with an open-source toolkit. A field-validated four-site forest dataset is released to support multiview tree species classification, and cross-site experiments show that low-oblique multiview imagery (MV-LO) yields the best performance, achieving 75% accuracy versus 53% for the orthomosaic baseline. The findings demonstrate the practical value of multiview information and oblique perspectives for robust, scalable geospatial prediction in forest ecosystems, with implications for ecological mapping and resource management.
Abstract
Aerial imagery is increasingly used in Earth science and natural resource management as a complement to labor-intensive ground-based surveys. Aerial systems can collect overlapping images that provide multiple views of each location from different perspectives. However, most prediction approaches (e.g. for tree species classification) use a single, synthesized top-down "orthomosaic" image as input that contains little to no information about the vertical aspects of objects and may include processing artifacts. We propose an alternate approach that generates predictions directly on the raw images and accurately maps these predictions into geospatial coordinates using semantic meshes. This method, released as a user-friendly open-source toolkit, enables analysts to use the highest quality data for predictions, capture information about the sides of objects, and leverage multiple viewpoints of each location for added robustness. We demonstrate the value of this approach on a new benchmark dataset of four forest sites in the western U.S. that consists of drone images, photogrammetry results, predicted tree locations, and species classification data derived from manual surveys. We show that our proposed multiview method improves classification accuracy from 53% to 75% relative to an orthomosaic baseline on a challenging cross-site tree species classification task.
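To make the idea of combining multiple viewpoints on a mesh concrete, the sketch below shows one plausible way to aggregate per-image class predictions onto mesh faces by majority vote. It is an illustration only and is not taken from the released toolkit: the abstract does not specify the aggregation rule, and the function name `aggregate_face_votes`, the input layout (a per-pixel face-index map and a per-pixel class map for each view), and the use of -1 for "no face visible" are all assumptions.

```python
import numpy as np


def aggregate_face_votes(face_ids_per_view, class_maps_per_view, n_faces, n_classes):
    """Majority-vote per-pixel class predictions onto mesh faces (hypothetical helper).

    face_ids_per_view: list of integer arrays, one per image; each pixel holds the
        index of the mesh face visible at that pixel, or -1 if no face is visible.
    class_maps_per_view: list of integer arrays with the same shapes, holding the
        predicted class index for each pixel.
    Returns (labels, votes): labels[f] is the winning class for face f, or -1 if
    the face was never observed; votes[f, c] counts pixels voting class c for face f.
    """
    votes = np.zeros((n_faces, n_classes), dtype=np.int64)
    for face_ids, class_map in zip(face_ids_per_view, class_maps_per_view):
        face_ids = np.asarray(face_ids).ravel()
        class_map = np.asarray(class_map).ravel()
        valid = face_ids >= 0  # ignore pixels that see no mesh face
        # np.add.at accumulates correctly when the same (face, class) pair repeats
        np.add.at(votes, (face_ids[valid], class_map[valid]), 1)
    labels = votes.argmax(axis=1)
    labels[votes.sum(axis=1) == 0] = -1  # faces with no observations stay unlabeled
    return labels, votes


# Toy example: two 2x2 "images" observing a mesh with three faces and two classes.
view1_faces = np.array([[0, 0], [1, -1]])
view1_classes = np.array([[1, 1], [0, 0]])
view2_faces = np.array([[0, 1], [1, -1]])
view2_classes = np.array([[0, 1], [1, 1]])

labels, votes = aggregate_face_votes(
    [view1_faces, view2_faces], [view1_classes, view2_classes], n_faces=3, n_classes=2
)
print(labels)
```

In this toy case, face 2 is never observed and therefore remains unlabeled. In a real workflow the per-pixel face indices would come from rendering the photogrammetry mesh from each camera pose, and the labeled faces would then be mapped to geospatial coordinates; neither step is shown here.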
