Compilation of Commit Changes within Java Source Code Repositories

Stefan Schott, Wolfram Fischer, Serena Elisa Ponta, Jonas Klauke, Eric Bodden

TL;DR

The paper presents JESS, a method that compiles only the code changed in a commit by slicing the surrounding project, inferring missing types, and generating stubs for unresolved references, with the aim of producing bytecode identical or near-identical to that of a full project build. Through mark-and-sweep slicing and stub generation, JESS enables targeted compilation of vulnerability-fix commits, which supports the identification of vulnerable code in bytecode dependencies. An evaluation on 347 GitHub Java projects shows that JESS compiles 72% of methods and constructors in isolation, 89% of which yield bytecode equal to the original. On Project KB, JESS compiles 73% of the files modified in fix-commits, compared with 8% when relying on the provided build scripts. The results indicate practical potential for security analyses in Java ecosystems, since fixes can be mapped to bytecode without recompiling the full project.

Abstract

Java applications include third-party dependencies as bytecode. To keep these applications secure, researchers have proposed tools to re-identify dependencies that contain known vulnerabilities. Yet, to allow such re-identification, one must first obtain, for each vulnerability patch, the bytecode that fixes the respective vulnerability. Such patches for dependencies are curated in databases in the form of fix-commits. But fix-commits are in source code, and automatically compiling whole Java projects to bytecode is notoriously hard, particularly for non-current versions of the code. In this paper, we thus propose JESS, an approach that largely avoids this problem by compiling solely the relevant code that was modified within a given commit. JESS reduces the code, retaining only those parts that the committed change references. To avoid name-resolution errors, JESS automatically infers stubs for references to entities that are unavailable to the compiler. A challenge here is that, to facilitate the above-mentioned re-identification, JESS must seek to produce bytecode that is almost identical to the bytecode that one would obtain from a successful compilation of the full project. An evaluation on 347 GitHub projects shows that JESS is able to compile, in isolation, 72% of methods and constructors, of which 89% have bytecode equal to the original one. Furthermore, on the Project KB database of fix-commits, in which only 8% of files modified within the commits can be compiled with the provided build scripts, JESS is able to compile 73% of all files that these commits modify.
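
The reduction step described in the abstract (keep only what the committed change references, and stub what the compiler cannot see) can be illustrated with a small mark-and-sweep sketch. This is not the authors' implementation: the class `SliceSketch`, its method names, the string-based member identifiers, and the example members are all hypothetical. The abstract does not say whether references are followed transitively or how method bodies are treated; the sketch follows references transitively purely as a simplifying assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; names and data model are assumptions, not JESS's API.
final class SliceSketch {

    /** Mark phase: collect every member reachable from the changed members. */
    static Set<String> mark(Map<String, List<String>> references, Set<String> changed) {
        Set<String> marked = new LinkedHashSet<>(changed);
        Deque<String> worklist = new ArrayDeque<>(changed);
        while (!worklist.isEmpty()) {
            String member = worklist.pop();
            for (String target : references.getOrDefault(member, List.of())) {
                // add() returns true only for members not seen before
                if (marked.add(target)) {
                    worklist.push(target);
                }
            }
        }
        return marked;
    }

    /** Sweep phase: of the members declared in the project, keep only the marked ones. */
    static Set<String> sweep(Set<String> declared, Set<String> marked) {
        Set<String> kept = new LinkedHashSet<>(declared);
        kept.retainAll(marked);
        return kept;
    }

    /** Marked members with no declaration in the project source would need a stub. */
    static Set<String> needStubs(Set<String> declared, Set<String> marked) {
        Set<String> missing = new LinkedHashSet<>(marked);
        missing.removeAll(declared);
        return missing;
    }

    public static void main(String[] args) {
        // Made-up reference graph: member -> members it references.
        Map<String, List<String>> references = Map.of(
            "Parser.parse", List.of("Parser.readToken", "Util.escape", "org.lib.Json.decode"),
            "Parser.readToken", List.of("Parser.buffer"),
            "Parser.debugDump", List.of("Util.format"));
        Set<String> declared = Set.of(
            "Parser.parse", "Parser.readToken", "Parser.buffer",
            "Parser.debugDump", "Util.escape", "Util.format");

        Set<String> marked = mark(references, Set.of("Parser.parse"));
        System.out.println("kept:  " + sweep(declared, marked));
        System.out.println("stubs: " + needStubs(declared, marked));
    }
}
```

In this made-up example the commit changes `Parser.parse`. `Parser.debugDump` and `Util.format` are swept because nothing reachable from the change references them, while `org.lib.Json.decode` is referenced but not declared in the project source, so it is the only member that would need a stub.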

Paper Structure (13 sections, 8 figures, 4 tables)