Improvement in Semantic Address Matching using Natural Language Processing
Vansh Gupta, Mohit Gupta, Jai Garg, Nitesh Garg
TL;DR
The paper tackles semantic address matching by integrating traditional BM25 retrieval with deep contextual representations from BERT to handle unstructured and incomplete addresses. It starts from OCR-extracted invoices to build an address dataset, uses BM25 to score candidate matches, and then refines results with BERT, supplemented by a threshold-driven fallback to component-wise string similarity. Experimental results indicate that the BM25+BERT combination achieves higher precision and recall than baselines, demonstrating practical improvements for real-world address matching. The work highlights the value of combining traditional IR methods with NLP-based semantic representations and suggests future work to incorporate spatial cues for further gains.
Abstract
Address matching is an important task for many businesses, especially delivery and takeout companies, which need to retrieve a specific address from their data warehouse. Existing solutions use string similarity and edit-distance algorithms to find similar addresses in an address database, but these algorithms do not work effectively with redundant, unstructured, or incomplete address data. This paper discusses a semantic address matching technique for finding a particular address in a list of candidate addresses. We also review existing practices and their shortcomings. Semantic address matching is essentially an NLP task in the field of deep learning. With this technique, we can overcome drawbacks of existing methods, such as redundant or abbreviated data. The solution applies OCR to invoices to extract addresses and build a pool of address data. This data is then fed to the BM25 algorithm, which scores the best-matching entries. Finally, the top-scoring candidates are passed through BERT to select the best match among similar candidates. Our experiments show that this method substantially improves both precision and recall over existing state-of-the-art techniques.
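To make the BM25 scoring stage concrete, the following is a minimal sketch of Okapi BM25 over a small address pool. It is an illustration under stated assumptions, not the implementation used in this work: the tokenizer, the parameter values `k1=1.5` and `b=0.75`, the smoothed IDF variant, the helper names `tokenize`, `bm25_scores`, and `top_candidates`, and the sample addresses are all hypothetical. The BERT refinement step is not shown; in a full pipeline the returned candidates would be passed on to it.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, treat commas as spaces, split on whitespace.
    return text.lower().replace(",", " ").split()


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for the given query.

    k1 and b are common default values, assumed here for illustration.
    """
    if not documents:
        return []
    docs = [tokenize(d) for d in documents]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    q_terms = tokenize(query)

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in q_terms:
            if term not in tf:
                continue
            # Smoothed IDF (the "+ 1.0" keeps it positive for frequent terms).
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def top_candidates(query, documents, k=3):
    # Rank the pool by BM25 score and keep the k best (address, score) pairs.
    scores = bm25_scores(query, documents)
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical address pool, standing in for addresses extracted from invoices.
pool = [
    "221B Baker Street, London",
    "12 Baker Road, Leeds",
    "45 Park Avenue, New York",
]
candidates = top_candidates("baker street london", pool, k=2)
```

Because BM25 relies on exact token overlap, an abbreviated query such as "Baker St" would not match the token "street"; this is the kind of gap the semantic refinement stage is intended to close.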
