Feature Engineering vs. Deep Learning for Automated Coin Grading: A Comparative Study on Saint-Gaudens Double Eagles

Tanmay Dogra, Eric Ngo, Mohammad Alam, Jean-Paul Talavera, Asim Dahal

TL;DR

The paper addresses automated grading of high-value Saint-Gaudens Double Eagles under limited labeled data and inter-grader variability. It compares a handcrafted feature pipeline feeding an ANN and an SVM against a hybrid CNN that fuses EfficientNetV2-derived features with engineered features, evaluated on 1,785 coins with severe class imbalance. The feature-engineered ANN achieves 86% exact-match accuracy and 98% accuracy within ±3 grades, outperforming both the CNN and the SVM, which collapse toward the majority class; the authors attribute the results to domain knowledge, data scarcity, and imbalance management via SMOTE. The findings challenge the assumption that deep learning universally outperforms hand-crafted features in niche quality-assessment tasks and offer practical deployment insights and guidelines for similar applications in manufacturing, medical imaging, and cultural heritage contexts.

Abstract

We challenge the common belief that deep learning always outperforms older techniques, using automated grading of Saint-Gaudens Double Eagle gold coins as a case study. We compare a feature-based Artificial Neural Network (ANN), built on 192 custom features derived from Sobel edge detection and HSV color analysis, against a hybrid Convolutional Neural Network (CNN) that incorporates EfficientNetV2, with a Support Vector Machine (SVM) as the control. On 1,785 expert-graded coins, the ANN achieves 86% exact-match accuracy and 98% accuracy within a 3-grade tolerance. In contrast, the CNN and SVM mostly predict the most common grade, reaching only 31% and 30% exact-match accuracy, respectively. The CNN appears competitive on broader tolerance metrics, but this is an artifact of regression averaging, which masks its failure to distinguish specific grades. Overall, with fewer than 2,000 examples and imbalanced classes, encoding coin-expert knowledge through feature design outperforms opaque end-to-end deep learning. This holds for other niche quality-assessment tasks where data is scarce and domain expertise matters more than raw compute.
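To make the gap between exact-match and tolerance accuracy concrete, the following minimal sketch computes both metrics for a predictor that always outputs the most common grade. The grade values are hypothetical and chosen only for illustration; they are not the paper's data, and the function name `tolerance_accuracy` is an assumption rather than anything defined by the authors.

```python
from collections import Counter


def tolerance_accuracy(y_true, y_pred, tol=0):
    """Fraction of predictions within `tol` grade points of the expert grade.

    tol=0 gives exact-match accuracy; tol=3 gives the "within 3 grades" metric.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    hits = sum(abs(t - p) <= tol for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Hypothetical integer grades (illustrative only, not the paper's dataset).
expert = [62, 63, 63, 63, 64, 64, 65, 66, 60, 58]

# A degenerate predictor that always outputs the most common grade.
majority = Counter(expert).most_common(1)[0][0]
constant = [majority] * len(expert)

exact = tolerance_accuracy(expert, constant, tol=0)
within3 = tolerance_accuracy(expert, constant, tol=3)

print(f"majority grade: {majority}")
print(f"exact-match accuracy: {exact:.2f}")
print(f"within-3 accuracy:    {within3:.2f}")
```

On this toy data the constant predictor scores low on exact match but high within the 3-grade tolerance, which is the kind of effect that can make a model that collapses toward the majority class look competitive on loose tolerance metrics.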

Paper Structure

This paper contains 33 sections, 4 equations, 2 tables.