TFHE-Coder: Evaluating LLM-agentic Fully Homomorphic Encryption Code Generation
Mayank Kumar, Jiaqi Xue, Mengxin Zheng, Qian Lou
TL;DR
This work addresses the difficulty of generating correct TFHE code from natural language by introducing a compiler-in-the-loop framework that iteratively refines LLM outputs using compiler feedback. It compares baseline and agentic workflows, the latter incorporating Retrieval-Augmented Generation (RAG) and few-shot prompting to improve API usage and structural fidelity for gate-level TFHE operations and ReLU activation. The study finds that GPT-4o consistently outperforms open-source models, that few-shot prompting substantially improves correctness, and that combining RAG with few-shot prompting yields the strongest results for capable models; ReLU remains the most challenging task. Overall, the paper provides the first benchmark for TFHE code generation and demonstrates that domain-specific feedback can bridge much of the expertise gap in secure computation code synthesis, with implications for broader adoption of privacy-preserving technologies.
Abstract
Fully Homomorphic Encryption over the torus (TFHE) enables computation on encrypted data without decryption, making it a cornerstone of secure and confidential computing. Despite its potential in privacy-preserving machine learning, secure multi-party computation, private blockchain transactions, and secure medical diagnostics, its adoption remains limited due to cryptographic complexity and usability challenges. While various TFHE libraries and compilers exist, practical code generation remains a hurdle. We propose a compiler-integrated framework to evaluate LLM inference and agentic optimization for TFHE code generation, focusing on logic gates and ReLU activation. Our methodology assesses error rates, compilability, and structural similarity across open- and closed-source LLMs. Results highlight significant limitations in off-the-shelf models, while agentic optimizations such as retrieval-augmented generation (RAG) and few-shot prompting reduce errors and enhance code fidelity. This work establishes the first benchmark for TFHE code generation, demonstrating how LLMs, when augmented with domain-specific feedback, can bridge the expertise gap in FHE code generation.
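
As an illustration of the compiler-integrated idea, the following is a minimal Python sketch of a generate-compile-refine loop. It is not the authors' implementation: the names `refine_with_compiler`, `generate`, `compile_check`, and `max_attempts` are hypothetical, the retry prompt format is an assumption, and the stub model and stub compiler below only stand in for a real LLM call and a real TFHE toolchain.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class RefinementResult:
    code: str                 # last code candidate produced
    compiled: bool            # whether that candidate passed the compile check
    attempts: int             # number of generation attempts made
    errors: List[str] = field(default_factory=list)  # diagnostics per failed attempt


def refine_with_compiler(
    task: str,
    generate: Callable[[str], str],
    compile_check: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> RefinementResult:
    """Ask the model for code, compile it, and feed diagnostics back on failure.

    `generate` and `compile_check` are injected so that any LLM client or
    compiler wrapper can be used; both are assumptions of this sketch.
    """
    prompt = task
    code = ""
    errors: List[str] = []
    for attempt in range(1, max_attempts + 1):
        code = generate(prompt)
        ok, diagnostics = compile_check(code)
        if ok:
            return RefinementResult(code, True, attempt, errors)
        errors.append(diagnostics)
        # Hypothetical retry prompt: original task plus the failed code and errors.
        prompt = (
            f"{task}\n\nPrevious attempt:\n{code}\n\n"
            f"Compiler output:\n{diagnostics}\n\nFix the errors."
        )
    return RefinementResult(code, False, max_attempts, errors)


# --- Stubs for demonstration only; they do not call a model or a compiler. ---

def stub_generate(prompt: str) -> str:
    # Pretend the model only gets the gate name right after seeing diagnostics.
    if "Compiler output:" in prompt:
        return "ENC_AND(out, a, b, key);"   # placeholder name, not a real API
    return "ENC_ADN(out, a, b, key);"       # deliberate typo


def stub_compile_check(code: str) -> Tuple[bool, str]:
    # Pretend the compiler accepts only the correctly spelled placeholder.
    if "ENC_AND(" in code:
        return True, ""
    return False, "error: unknown identifier 'ENC_ADN'"


result = refine_with_compiler(
    "Write an encrypted AND gate.", stub_generate, stub_compile_check
)
```

In a real setting, `compile_check` would wrap an invocation of the target TFHE compiler and return its diagnostics, while few-shot examples or retrieved documentation would be prepended to `task` before the first call.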
