A Multi-Perspective Architecture for Semantic Code Search
Rajarshi Haldar, Lingfei Wu, Jinjun Xiong, Julia Hockenmaier
TL;DR
The paper addresses semantic code search: retrieving code snippets that match a natural language description, and vice versa. It proposes MP-CAT, a multi-perspective architecture that combines two kinds of signal: global similarity, computed from encodings of the code and its abstract syntax tree (AST), and local similarity, computed with Bilateral Multi-Perspective Matching. Three component models are compared: CT, the baseline representation; CAT, which adds AST information; and MP, which supplies the local matching signals. On the CoNaLa dataset, adding the AST improves retrieval quality, and MP-CAT, which fuses the global and local signals, performs best and outperforms prior approaches. The approach may extend to other programming languages, and the code is publicly available.
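To make the AST signal concrete, here is a minimal sketch of deriving a structural token sequence from a Python snippet with the standard-library `ast` module. The helper name `ast_node_types`, the breadth-first order produced by `ast.walk`, and the whitespace tokenization of the code are assumptions for illustration; the summary above does not say how the paper linearizes or encodes its ASTs.

```python
import ast


def ast_node_types(source):
    """Return the node type names of a Python snippet's AST.

    Hypothetical helper: ast.walk visits nodes breadth-first, which is
    only one of several possible ways to linearize a tree.
    """
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


snippet = "total = sum(xs)"

# Surface view: a naive whitespace tokenization of the code (an assumption).
code_tokens = snippet.split()

# Structural view: node types such as Module, Assign, Call, Name.
ast_tokens = ast_node_types(snippet)

print(code_tokens)
print(ast_tokens)
```

The two sequences describe the same snippet from different angles: the surface tokens keep identifiers such as `total` and `sum`, while the node types expose that the snippet is an assignment whose value is a function call.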
Abstract
The ability to match pieces of code to their corresponding natural language descriptions and vice versa is fundamental for natural language search interfaces to software repositories. In this paper, we propose a novel multi-perspective cross-lingual neural framework for code-text matching, inspired in part by a previous model for monolingual text-to-text matching, to capture both global and local similarities. Our experiments on the CoNaLa dataset show that our proposed model yields better performance on this cross-lingual text-to-code matching task than previous approaches that map code and text to a single joint embedding space.
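For orientation, the single joint embedding space used by the previous approaches can be sketched as follows: a query and each candidate snippet are mapped to vectors in the same space, and candidates are ranked by cosine similarity to the query. The vectors, the snippet names, and the helper `rank_snippets` are invented for illustration; in a real system the vectors would come from trained text and code encoders. This sketch covers only the global, single-vector comparison and not the local matching that the proposed model adds.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_snippets(query_vec, code_vecs):
    """Return snippet names sorted by similarity to the query, best first."""
    scores = {name: cosine(query_vec, vec) for name, vec in code_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings (assumed values, not outputs of any trained model).
query_vec = [0.9, 0.1, 0.0]
code_vecs = {
    "sort_list": [0.8, 0.2, 0.1],
    "read_file": [0.0, 0.1, 0.9],
    "open_socket": [0.1, 0.9, 0.2],
}

ranking = rank_snippets(query_vec, code_vecs)
print(ranking)
```

Because each snippet is reduced to one vector before comparison, this setup can only express a global similarity; token-level correspondences between the query and the code are not visible to the ranking step.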
