Benchmarking Harmonized Tariff Schedule Classification Models

Bryce Judy

TL;DR

Addresses the lack of standardized HTS benchmarking by proposing a framework, inspired by language-model benchmarks, that evaluates speed, accuracy, rationality, and code alignment across HTS tools. Uses a dataset of 103 classifications drawn from 100 U.S. Customs and Border Protection (CBP) rulings, supplying each item's name and description as inputs to four tools: Zonos, Tarifflo, Avalara, and WCO BACUDA. Finds that Tarifflo yields the highest full 10-digit HTS accuracy with transparent rationales, that Zonos and WCO BACUDA offer speed with limited verification, and that Avalara delivers high accuracy but at the cost of a slow, manual process. The work demonstrates actionable trade-offs and argues for standardized benchmarks to enable fair comparisons and drive improvements in HTS classification for e-commerce and international trade.

Abstract

The Harmonized Tariff Schedule (HTS) classification industry, essential to e-commerce and international trade, currently lacks standardized benchmarks for evaluating the effectiveness of classification solutions. This study establishes and tests a benchmark framework for imports to the United States, inspired by the benchmarking approaches used in language model evaluation, to systematically compare prominent HTS classification tools. The framework assesses key metrics, such as speed, accuracy, rationality, and HTS code alignment, to provide a comprehensive performance comparison. The study evaluates several industry-leading solutions, including those provided by Zonos, Tarifflo, Avalara, and WCO BACUDA, identifying each tool's strengths and limitations. Results highlight areas for industry-wide improvement and innovation, paving the way for more effective and standardized HTS classification solutions across the international trade and e-commerce sectors.
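
To make the "HTS code alignment" metric concrete, the following is a minimal sketch of how agreement between a predicted and a reference code might be scored at successive digit depths. It is an illustration under stated assumptions, not the scoring procedure used in the study: the function names (`normalize`, `matched_level`, `accuracy_by_level`), the choice of depths (2, 4, 6, 8, 10 digits), and the sample codes are all hypothetical and are not taken from the paper's dataset.

```python
from typing import Dict, Iterable, Tuple

# Assumed depths: chapter (2), heading (4), subheading (6),
# 8-digit tariff line, and the full 10-digit statistical code.
LEVELS = (2, 4, 6, 8, 10)


def normalize(code: str) -> str:
    """Keep digits only, so '8471.30.01.00' and '8471300100' compare equal."""
    return "".join(ch for ch in code if ch.isdigit())


def matched_level(predicted: str, expected: str) -> int:
    """Return the deepest depth in LEVELS at which both codes share a prefix."""
    p, e = normalize(predicted), normalize(expected)
    depth = 0
    for level in LEVELS:
        if len(p) >= level and len(e) >= level and p[:level] == e[:level]:
            depth = level
        else:
            break  # a mismatch at one depth rules out all deeper depths
    return depth


def accuracy_by_level(pairs: Iterable[Tuple[str, str]]) -> Dict[int, float]:
    """Fraction of (predicted, expected) pairs that agree at each depth."""
    depths = [matched_level(p, e) for p, e in pairs]
    if not depths:
        raise ValueError("at least one (predicted, expected) pair is required")
    return {level: sum(d >= level for d in depths) / len(depths) for level in LEVELS}


if __name__ == "__main__":
    # Illustrative pairs only; not drawn from the benchmark dataset.
    sample = [
        ("8471.30.01.00", "8471300100"),     # agrees through all 10 digits
        ("8471.30.01.00", "8471.41.01.50"),  # agrees through the 4-digit heading
    ]
    print(accuracy_by_level(sample))
```

Reporting accuracy per depth, rather than a single exact-match figure, separates a tool that lands in the right heading but the wrong statistical suffix from one that misses the chapter entirely.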

Paper Structure

This paper contains 10 sections and 3 tables.