Implementation Design

This is an unordered list of implementation design decisions. Each topic tries to follow this structure:

Coverage Analysis Mechanism

Coverage information has to be collected at runtime. For this purpose JaCoCo creates instrumented versions of the original class definitions. The instrumentation process happens on-the-fly during class loading using so-called Java agents.

There are several different approaches to collect coverage information. For each approach different implementation techniques are known. The following diagram gives an overview with the techniques used by JaCoCo highlighted:

Byte code instrumentation is very fast, can be implemented in pure Java and works with every Java VM. On-the-fly instrumentation with the Java agent hook can be added to the JVM without any modification of the target application.
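The Java agent hook mentioned above can be sketched with the standard java.lang.instrument API. This is a minimal illustration, not JaCoCo's agent: the class names SketchAgent and PassThroughTransformer and the SEEN list are hypothetical, and the transformer leaves every class unchanged where a coverage tool would return instrumented byte code.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.ArrayList;
import java.util.List;

// Hypothetical agent class; it would be named in the Premain-Class
// entry of the agent JAR manifest.
class SketchAgent {

    // Names of the classes seen by the transformer (for illustration only).
    static final List<String> SEEN = new ArrayList<>();

    // Called by the JVM before the application's main method when the
    // JVM is started with -javaagent:<path to agent jar>.
    public static void premain(String options, Instrumentation inst) {
        inst.addTransformer(new PassThroughTransformer());
    }

    static class PassThroughTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                Class<?> classBeingRedefined,
                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            synchronized (SEEN) {
                SEEN.add(className);
            }
            // A coverage tool would return an instrumented copy of
            // classfileBuffer here. Returning null tells the JVM to keep
            // the original class definition.
            return null;
        }
    }
}
```

Because the transformer is registered from premain, the target application itself needs no modification; only the JVM command line changes.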

The Java agent hook requires a JVM of version 1.5 or later. Class files compiled with debug information (line numbers) allow for source code highlighting. Unfortunately, some Java language constructs get compiled to byte code that produces unexpected highlighting results, especially in the case of implicitly generated code like default constructors or control structures for finally statements.

Instrumentation Approach

Instrumentation means inserting probes at certain check points in the Java byte code. A probe is a generated piece of byte code that records the fact that it has been executed. JaCoCo inserts probes at the end of every basic block.

A basic block is a piece of byte code that has a single entry point (the first byte code instruction) and a single exit point (like jump, throw or return). A basic block must not contain jump targets except the entry point. One can think of basic blocks as the nodes in a control flow graph of a method. Using basic block boundaries to insert code coverage probes has been very successfully utilized by EMMA.
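To make the notion of a probe more concrete, the following sketch shows a source-level view of what an instrumented method could look like. Real probes are generated byte code, not Java source; the class ProbeSketch and the PROBES array are hypothetical, and the entry block that evaluates the condition is left without a probe to keep the example short.

```java
// Hypothetical source-level equivalent of an instrumented method.
class ProbeSketch {

    // One flag per probe. A probe sets its flag to record that the
    // basic block it terminates has been executed.
    static final boolean[] PROBES = new boolean[3];

    static int abs(int x) {
        int result;
        if (x < 0) {
            result = -x;
            PROBES[0] = true; // end of the "then" block
        } else {
            result = x;
            PROBES[1] = true; // end of the "else" block
        }
        PROBES[2] = true;     // end of the block that leads to the return
        return result;
    }
}
```

After calling abs(5), the flags at index 1 and 2 are set while index 0 stays false, which is the information a coverage report needs to mark the "then" branch as not executed.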

Basic block instrumentation works regardless of whether the class files have been compiled with debug information for source lines. Source code highlighting will of course not be possible without this debug information, but percentages on method level can still be calculated. Basic block probes result in reasonable overhead regarding class file size and performance. Partial line coverage can occur if a line contains more than one statement or a statement gets compiled into byte code forming more than one basic block (e.g. boolean assignments). Calculating basic blocks relies on the Java byte code only, therefore JaCoCo is independent of the source language and should also work with other Java VM based languages like Scala.

The huge drawback of this approach is the fact that basic blocks are actually much smaller in the Java VM: Nearly every byte code instruction (especially method invocations) can result in an exception. In this case the block is left somewhere in the middle without hitting the probe, which leads to unexpected results for example in case of negative tests. A possible solution would be to add exception handlers that trigger special probes.
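The drawback can be reproduced with a source-level sketch. The class ExceptionProbeSketch and its probe layout are hypothetical: the first method loses its probe when an exception leaves the block early, and the second method shows the idea of an exception handler that triggers a special probe.

```java
// Hypothetical source-level view; real probes are generated byte code.
class ExceptionProbeSketch {

    static final boolean[] PROBES = new boolean[3];

    // Probe at the end of the block only.
    static int trimmedLength(String value) {
        String trimmed = value.trim(); // throws NullPointerException for null
        int length = trimmed.length();
        PROBES[0] = true;              // never reached if trim() throws
        return length;
    }

    // Same logic with an additional handler that triggers a special probe.
    static int trimmedLengthWithHandler(String value) {
        try {
            String trimmed = value.trim();
            int length = trimmed.length();
            PROBES[1] = true;          // regular probe at the end of the block
            return length;
        } catch (RuntimeException e) {
            PROBES[2] = true;          // special probe: block left by an exception
            throw e;
        }
    }
}
```

A negative test that passes null to trimmedLength executes part of the block but leaves PROBES[0] unset, so the block would be reported as not covered. With the handler variant, the special probe at least records that the block was entered.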

Coverage Agent Isolation

The Java agent is loaded by the application class loader. Therefore the classes of the agent live in the same namespace as the application classes, which can result in clashes, especially with the third-party library ASM. The JaCoCo build therefore moves all agent classes into a unique package.

The JaCoCo build renames all classes contained in the jacocoagent.jar into classes with an org.jacoco.<randomid> prefix, including the required ASM library classes. The identifier is created from a random number. As the agent does not provide any API, no one should be affected by this renaming. This trick also allows JaCoCo tests to be verified with JaCoCo.

Minimal Java Version

JaCoCo requires Java 1.5.

The Java agent mechanism used for on-the-fly instrumentation became available with Java 1.5 VMs. Coding and testing with Java 1.5 language level is more efficient, less error-prone, and more fun than with older versions. JaCoCo can still run against Java code compiled for these older versions.

Byte Code Manipulation

Instrumentation requires mechanisms to modify and generate Java byte code. JaCoCo uses the ASM library for this purpose internally.

Implementing the Java byte code specification would be an extensive and error-prone task. Therefore an existing library should be used. The ASM library is lightweight, easy to use and very efficient in terms of memory and CPU usage. It is actively maintained and includes a huge regression test suite. Its simplified BSD license is approved by the Eclipse Foundation for usage with EPL products.

Java Class Identity

Each class loaded at runtime needs a unique identity to associate coverage data with. JaCoCo creates such identities by a CRC64 hash code of the raw class definition.

In multi-classloader environments the plain name of a class does not unambiguously identify a class. For example, OSGi allows different versions of the same class to be loaded within the same VM. In complex deployment scenarios the actual version of the test target might be different from the current development version. A code coverage report should guarantee that the presented figures are extracted from a valid test target. A hash code of the class definitions makes it possible to differentiate between classes and versions of classes. The CRC64 hash computation is simple and fast, resulting in a small 64 bit identifier.
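A table-driven CRC-64 over the raw class bytes can be sketched in a few lines. This is an illustration only: the class Crc64Sketch, the method name classId, and the chosen polynomial (the reflected form of x^64 + x^4 + x^3 + x + 1, with initial value 0 and no final XOR) are assumptions, so the resulting values are not claimed to match the identifiers JaCoCo computes.

```java
// Hypothetical CRC-64 sketch; polynomial and parameters are assumptions.
class Crc64Sketch {

    // Reflected representation of x^64 + x^4 + x^3 + x + 1.
    private static final long POLY = 0xD800000000000000L;

    // Precomputed remainders for all 256 possible byte values.
    private static final long[] TABLE = new long[256];

    static {
        for (int i = 0; i < 256; i++) {
            long v = i;
            for (int bit = 0; bit < 8; bit++) {
                v = (v & 1) == 1 ? (v >>> 1) ^ POLY : v >>> 1;
            }
            TABLE[i] = v;
        }
    }

    // Computes a 64 bit identifier from the raw class definition.
    static long classId(byte[] classDefinition) {
        long crc = 0;
        for (byte b : classDefinition) {
            crc = (crc >>> 8) ^ TABLE[((int) crc ^ b) & 0xFF];
        }
        return crc;
    }
}
```

Two byte arrays with identical content always produce the same identifier, independent of which class loader later defines the class, while changing a single byte of the definition changes the identifier.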

The same class definition might be loaded by several class loaders, which will result in different classes for the Java runtime system. For coverage analysis this distinction should be irrelevant. Class definitions might be altered by other instrumentation based technologies (e.g. AspectJ). In this case the hash code will change and the identity gets lost. On the other hand, code coverage analysis based on classes that have been somehow altered will produce unexpected results. The CRC64 code might produce so-called collisions, i.e. the same hash code for two different classes. Although CRC64 is not cryptographically strong and collision examples can be easily computed, for regular class files the collision probability is very low.

Coverage Runtime Dependency

Instrumented code typically gets a dependency on a coverage runtime which is responsible for collecting and storing execution data. JaCoCo uses only JRE types and interfaces in generated instrumentation code.

Making a runtime library available to all instrumented classes can be a painful or impossible task in frameworks that use their own class loading mechanisms. Since Java 1.6 java.lang.instrument.Instrumentation has an API to extend the bootstrap loader. As our minimum target is Java 1.5, JaCoCo decouples the instrumented classes and the coverage runtime through official JRE API types only. Different approaches have been implemented and tested so far:

The current JaCoCo Java agent implementation uses the ModifiedSystemClassRuntime adding APIs to the class java.sql.Types.
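The general idea of decoupling through JRE API types only can be illustrated with a small sketch. It is not the ModifiedSystemClassRuntime described above: the class RuntimeSketch, the ProbeStore type, the property key and the request layout are all hypothetical. The point is that the calling side touches only java.lang.Object, Object[] and java.lang.System, so it needs no compile-time or class loading dependency on the runtime classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a runtime reachable through JRE types only.
class RuntimeSketch {

    // Hypothetical key under which the runtime object is published.
    static final String KEY = "sketch.coverage.runtime";

    // Runtime side: holds one probe array per class identifier.
    static final class ProbeStore {
        private final Map<Long, boolean[]> probes = new HashMap<>();

        // equals(Object) serves as a generic entry point because it is
        // declared on java.lang.Object. The request is an Object[] of
        // { Long classId, Integer probeCount }; slot 0 receives the result.
        @Override
        public boolean equals(Object arg) {
            if (arg instanceof Object[]) {
                Object[] request = (Object[]) arg;
                Long classId = (Long) request[0];
                int count = (Integer) request[1];
                synchronized (probes) {
                    boolean[] data = probes.get(classId);
                    if (data == null) {
                        data = new boolean[count];
                        probes.put(classId, data);
                    }
                    request[0] = data;
                }
                return false;
            }
            return super.equals(arg);
        }

        @Override
        public int hashCode() {
            return super.hashCode();
        }
    }

    // Publishes the runtime object in a place every class can reach.
    static void install(ProbeStore store) {
        System.getProperties().put(KEY, store);
    }

    // What generated code in an instrumented class could do.
    static boolean[] requestProbes(long classId, int probeCount) {
        Object[] request = { Long.valueOf(classId), Integer.valueOf(probeCount) };
        Object runtime = System.getProperties().get(KEY);
        runtime.equals(request); // the result is delivered in request[0]
        return (boolean[]) request[0];
    }
}
```

Storing a non-String value in the system properties is a known source of trouble for applications that enumerate the properties, which is one reason a sketch like this should be read as an illustration of the constraint rather than as a recommendation.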

Memory Usage

Coverage analysis for huge projects with several thousand classes or hundred thousand lines of code should be possible. To allow this with reasonable memory usage the coverage analysis is based on streaming patterns and "depth first" traversals.

The complete data tree of a huge coverage report is too big to fit into a reasonable heap memory configuration. Therefore the coverage analysis and report generation are implemented as "depth first" traversals, which means that at any point in time only the following data has to be held in working memory:

Java Element Identifiers

The Java language and the Java VM use different String representation formats for Java elements. For example while a type reference in Java reads like java.lang.Object, the VM references the same type as Ljava/lang/Object;. The JaCoCo API is based on VM identifiers only.

Using VM identifiers directly does not cause any transformation overhead at runtime. There are several programming languages based on the Java VM that might use different notations. Specific transformations should therefore only happen at the user interface level, for example during report generation.
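A transformation at the user interface level could look like the following sketch, which converts a VM type descriptor into Java notation. The class VmNameSketch and the method toJavaName are hypothetical names, and only field type descriptors (object types, primitives and arrays) are handled.

```java
// Hypothetical helper for report generation; not part of the JaCoCo API.
class VmNameSketch {

    // Converts a VM type descriptor such as "Ljava/lang/Object;" or "[I"
    // into Java language notation such as "java.lang.Object" or "int[]".
    static String toJavaName(String descriptor) {
        // Count leading '[' characters, one per array dimension.
        int dims = 0;
        while (descriptor.charAt(dims) == '[') {
            dims++;
        }
        String element = descriptor.substring(dims);
        String name;
        switch (element.charAt(0)) {
        case 'L':
            // Strip the leading 'L' and trailing ';', then switch separators.
            name = element.substring(1, element.length() - 1).replace('/', '.');
            break;
        case 'Z': name = "boolean"; break;
        case 'B': name = "byte"; break;
        case 'C': name = "char"; break;
        case 'S': name = "short"; break;
        case 'I': name = "int"; break;
        case 'J': name = "long"; break;
        case 'F': name = "float"; break;
        case 'D': name = "double"; break;
        default:
            throw new IllegalArgumentException("Not a type descriptor: " + descriptor);
        }
        StringBuilder result = new StringBuilder(name);
        for (int i = 0; i < dims; i++) {
            result.append("[]");
        }
        return result.toString();
    }
}
```

Keeping such a conversion in the reporting layer means a report for another JVM language could substitute its own notation without touching the analysis code.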

Modularization of the JaCoCo implementation

JaCoCo is implemented in several modules providing different functionality. These modules are provided as OSGi bundles with proper manifest files. But there are no dependencies on OSGi itself.

Using OSGi bundles allows well defined dependencies at development time and at runtime in OSGi containers. As there are no dependencies on OSGi, the bundles can also be used like regular JAR files.