Layers of abstraction are great at hiding complexity, but they also hide side-channel vulnerabilities. By intentionally breaking such layers of abstractions, the Raccoon project demonstrates an approach to closing a broad class of side channels. Compared to related work, our solution can protect a more diverse set of applications running on modern microarchitectures.

A Primer on Side Channels

Imagine a bank that stores its customer’s account balances in an encrypted database. Every night, the bank executes a program that computes the compound interest on its clients’ account balances, and the program rewards clients with a large balance. Let’s say that their program looks like the following.

if balance > 1_000_000 {
  PING   // Placeholder.
} else {
  PONG
}

The PING and PONG statements are placeholders for actual computation. Now, if the attacker can observe whether the program resulted in PING instead of PONG, then she will be able to count the number of customers with a large account balance, thus estimating the bank’s net worth. Perhaps combined with auxiliary information inferred from other parts of the program or from offline sources, she may even be able to narrow down the list of specific people that own the accounts with large balances.

In this example, the PING and PONG observations act as side channels that leak information about the account balance. In practice, the PING and PONG observations could be digital side channels — those that leak discrete bits of information through entities such as instruction count, cache hits and misses, and branch predictor state — or they can be non-digital side channels, such as electrical power consumption, heat, and sound. The end result is that simply encrypting data at rest is insufficient to hide it from attackers, since various side channels can inadvertently leak information.

Side channels aren’t just an academic attack. They have been executed against deployed systems since the 1950s (see the U.S. Government’s TEMPEST program and Russian efforts to implant bugs in the IBM Selectric typewriters). As recently as 2017, vulnerabilities such as Spectre and Meltdown and others have shown the ability to steal sensitive information from programs executing concurrently on the same machine.

As we move toward a future where cryptographic implementations become increasingly resilient to attacks, attackers may turn to side channels to steal sensitive information.

Why Side Channels Are Difficult to Close

Side channels are caused by variations in program execution, and such variations are deeply ingrained in the nature of code execution. For instance, algorithms perform different actions based on their inputs, optimizations in compilers and microarchitectures may be applicable only in certain cases, and abnormal inputs cause exceptions, which may require special handling. Each variation enables a new trail of “breadcrumbs” that an attacker can observe to infer sensitive information.

Furthermore, modern systems (including compilers, runtime systems, operating systems, Instruction Set Architectures (ISAs), microarchitectures, and physical hardware) are complex, which makes it difficult to enumerate all sources of variations, let alone eliminate the variations. Modern systems also contain several abstractions, which, although useful for hiding complexity, hide the existence of potential variations, to the extent that adding abstractions has become a liability.

Because of the wide-spread nature of side channels, they impact the security of not just cryptographic implementations, but several other important applications including databases, machine-learning algorithms, browsers, and parsers, all of which may operate on sensitive information. At the same time, techniques that we have come to rely on, such as encryption, isolation (using Virtual Machines, enclaves, or containers), or access control (authentication and authorization) do not hide execution-time variations, thus being ineffective at eliminating side channels.

The Raccoon Approach

The Raccoon approach uses compilers to close a broad class of side channels for a diverse set of applications, so that the code transformed by the compiler produces the same result as the original program, but the transformed program does not leak information through specific side channels. The Raccoon approach uses ideas from program analysis, computer architecture, and, of course, computer security.

The broad philosophy behind the Raccoon family of compilers (Raccoon, Escort, and Vantage — currently under submission) is to discover and eliminate sources of variations. Since variations can occur at different levels (for example, source code, compiler backend, ISA, microarchitecture, etc.), we develop our compilers by analyzing the underlying layers (either using source-code analysis, statistical techniques, or using manual inspection), so that the compilers are made aware of potential side channels. The compilers then rewrite the program to replace vulnerable instructions with alternate instruction sequences that are functionally equivalent to the original instruction, but which do not leak information.

Compilers, especially ahead-of-time compilers, are well-positioned to tackle this problem since (1) compilers, unlike the microarchitecture or the physical hardware, can analyze the long-term program behavior without being constrained to small physical memory and (2) ahead-of-time compilers have fewer constraints on the analysis time compared to the microarchitecture or just-in-time compilers. Being housed in a retargetable compiler also enables (1) tailored defenses for different microarchitectures and hardware implementations and (2) tailored defenses for different threat models, which dictate the side channels that the program should defend from.

But pure compiler-based solutions also have drawbacks. Most importantly, abstractions such as the ISA, which only present a functional interface, prevent the compiler from reasoning about variations (and thus potential side channels) in the underlying layers such as the microarchitecture. To overcome this limitation, the Raccoon approach effectively passes (static) information from the lower levels of abstractions (physical hardware, microarchitecture, ISA, and the compiler backend) to the compiler transformations, which accordingly analyze the program, discover potential information leakage, and rewrite the instruction stream.

architecture
Figure: High-level organization of the Raccoon compilers. Yellow (upward) arrows indicate the flow of information, either through program analysis, statistical techniques, or manual inspection. Green (downward) arrow indicates control. In the future, we envision broadening the ISA to allow the compiler to exert greater control over the microarchitecture and the physical hardware.

Effectively, the information passed from the underlying levels to the compiler is a Leakage Model, which, for every side channel, records the instructions that may leak information and the associated information that is leaked. For instance, our timing leakage model for an Intel x64 Nehalem microarchitecture includes information about the division instruction leaking information about the numerator due to varying number of executed micro-operations. Similarly, our power leakage model for a 32-bit ARM processor includes information about predicated operations affecting the program’s energy consumption due to conditional accesses of the register file. By augmenting the compiler with such leakage models, the Raccoon approach enables the compiler to analyze and translate programs to new abstract domains, which are representations that measure different properties of the program execution. In effect, the new abstract domains enable the compiler to reason about the impact of the program’s execution on cache hits and misses, branch predictor state, electrical power consumption, etc.

InstructionMicroarchitectureThreat ModelLeakage
idiv (reg)Gem5 x64timingnumerator
idiv (reg)Gem5 x64terminationdenominator
idiv (mem)Gem5 x64cacheaddr(denominator) >> 6
__aeabi_idiv (ABI)Gem5 ARM v7cacheuinterp(numerator) uinterp(denominator)
Table: Example leakage model for different side channels, derived using a combination of program analysis, statistical techniques, and manual inspection. uinterp stands for uninterpreted function.

Indeed, the compiler’s ability to exert control is limited to the actions allowed by the ISA. As we describe in the Limitations section, this restriction introduces inefficiencies in closing all side channels using the Raccoon approach. However, we plan to overcome these limitations by broadening the definition of the ISA, so that it permits not just functional requests but behavioral requests as well.

Finally, compared to solutions that treat each side channel individually, the Raccoon approach makes it simpler to defend from different side channels because of the observation that different variations (and thus, different side channels) stem from differences in the runtime data and code structure. Thus, the code transformations in the Raccoon approach enable us to close a broad class of side channels using a single solution. It also enables us to reason about the composition of defenses for disparate side channels, since the impact of the individual code transformations is known to the orchestrating compiler.

Analyses and Transformations Implemented In Raccoon Compilers

Broadly, in designing the Raccoon compilers, we analyzed the LLVM target code generator, the ARM Runtime ABI library implementation, the x64, RISC-V, and ARM v8 ISAs, the microarchitectural implementations of the x64, ARM v7, and ARM v8 ISAs in the Gem5 simulator, and microarchitectural power models such as McPAT. Some of these analyses are automated (specifically, we have built tools to analyze the LLVM code generator and ISAs written in the SAIL ISA description language), while other components (the ARM ABI implementation, Gem5’s microarchitecture implementations, and the McPAT power model) are manually analyzed, but the manual analysis can be automated using better tools. Using these analyses, we have built (coarse) leakage models for different side channels including timing, cache, branch prediction, TLB, DRAM address trace, and instruction-level power consumption.

The Raccoon compilers then apply inter-procedural, flow-sensitive, and context-insensitive, implicit- and explicit-flow taint tracking to identify the instructions that may leak information. The compilers limit the subsequent transformations to only the tainted instructions.

The key idea behind the code transformations in the Raccoon compilers is to make the sensitive information independent of the leakage shown by the leakage model. Thus, when the leakage model indicates that conditional branch instructions may leak information through the instruction pointer, the code transformation removes conditional branches, while using predication to control side effects. Similarly, when the leakage model indicates that addresses in load and store instructions may leak information through cache hits and misses, the code transformation makes the load/store address independent of the secrets (illustrated below).

Currently, code transformations in the Raccoon compilers have been manually designed based on the derived leakage models, but work is underway to automate the generation of these transformations. Since the precise collection of chosen transformations depends on the combination of the microarchitecture and the threat model, listing every possible transformation is exhausting and somewhat redundant. Here, we briefly illustrate three broad classes of transformations that may be combined in various ways by the Raccoon compilers.

Eliminating Variations Due to Control Flow

The Raccoon compilers transform branches whose predicate depends on a sensitive value, through a process known as if-conversion, where the code is rewritten so that the branch operation is converted into predicated operations. The following example illustrates the transformation.

Before Transformation:

if balance > 1_000_000 {
  PING   // Placeholder.
} else {
  PONG
}

After Transformation:

balance > 1_000_000:  PING
balance <= 1_000_000: PONG
Figure: Removing control flow variations using predicates. The example on the top is transformed into the example on the bottom, where the predicate controls the side effects of the Ping or Pong statements. Predication is implemented using native assembly instructions.

The transformed example above uses the branch predicate (either account balance larger than $1,000,000 or account balance at most $1,000,000) to control the side effects of the computation. Indeed, statements can produce different kinds of side effects. The Raccoon compilers can hide side effects arising from register writes, memory accesses, exceptions, and function calls, but not I/O side effects (hiding I/O side effects could be possible with the cooperation of the Operating System).

Our compilers compute predicates for arbitrarily-nested (but structured) control flow by propagating branch conditions along the topological order of the control flow graph.

Transforming Loops. Loops may affect control flow as well. When the loop iteration count leaks sensitive information, there is no known perfect solution to guarantee both correctness and security in all cases. In such cases, the Raccoon compilers unroll the loop by a fixed amount (as specified using programmer annotations), before predicating the loop body, so that if the loop is unrolled more times than necessary, then side effects produced by extra iterations will be hidden, thus ensuring correctness. Indeed, if the loop iterations are underspecified through programmer annotations, then the transformed program will produce incorrect results. The following sequence of figures illustrates the transformation.

Loop Transformation
Figure: Loop transformations in the Raccoon compilers using predication. Tap or click image to see different stages of the transformation.

Transforming Function Calls. Our compilers instrument the program so that the predicate of the caller function is passed to the callee function. The callee function computes the logical AND of the incoming predicate and the callee’s locally-computed predicates, before using the resulting predicate for hiding side effects of dummy statements.

Eliminating Variations in Memory Accesses

To hide sensitive pointer dereferences, the Raccoon compilers offer two solutions: (1) a software implementation of the Path ORAM algorithm and (2) simply streaming through the entire array (using non-temporal vector memory access instructions to improve speed). The software implementation of Path ORAM differs from the original Path ORAM algorithm because the x64 and ARM ISAs do not offer an on-chip memory for the Stash and the Position Map.

Eliminating Variations Caused By Individual Instructions

Various architecture-specific instructions may create side channels; we illustrate our defense for one such instruction: the integer division instruction running on Gem5’s implementation of the x64 ISA. This instruction consumes variable latency depending on the value of its numerator. Our compiler hence replaces this instruction with a software implementation of integer division (shown below), which uses only those instructions that are known to consume a fixed number of micro-operations.

When our compiler is invoked, depending on the requested threat model and the microarchitecture, the compiler chooses the exact set of transformations to apply.

// return unsigned greater than or equal to comparison.
uint8_t cmp_uge_32(uint32_t x, uint32_t y);

// If `condition' is true, return `t_val' else `f_val'.
uint32_t cmov_32(uint8_t condition, uint32_t t_val, uint32_t f_val);


static void u32_div_rem(uint32_t numerator, uint32_t denominator,
uint32_t* quotient, uint32_t* remainder) {
  int32_t i;
  uint32_t __quo = 0, __rem = 0;

  for (i = sizeof(uint32_t) * 8 - 1; i >= 0; i--) {
    __rem <<= 1;

    uint8_t num_i = (numerator >> i) & 1;
    __rem |= num_i;

    uint8_t predicate = cmp_uge_32(__rem, denominator);
    __rem = cmov_32(predicate, __rem - denominator, __rem);

    uint8_t q_bit = cmov_32(predicate, 1, (__quo >> i) & 1);
    __quo |= (q_bit << i);
  }

  *quotient = __quo;
  *remainder = __rem;
}
Figure: Software implementation of unsigned integer division, which is used in place of the native division operation on relevant microarchitectures.

How Raccoon is Different

Compared to other research on side-channel defenses (too many to cite), the Raccoon approach differs in two main ways.

Closing a Broader Class of Side Channels

The vast majority of existing side-channel defenses are point solutions — they close an isolated few side channels. The key problem with point solutions is that individual solutions may not compose well with each other to provide a comprehensive defense. For instance, Memory Trace Obliviousness (a property exhibited by the GhostRider solution) guarantees the address-trace side-channel security when the stream of memory accesses is deterministic, but cache defenses that randomly replace cache lines break the expectation of deterministic memory addresses. In contrast, the Raccoon approach presents a solution that can be aware of the impact of code transformations for individual defenses, thus being able to reason about the composition of individual defenses.

Protecting a Broader Class of Applications

Several existing side channel defenses protect highly-restrictive programs, such as cryptographic kernels like AES implementations. Such programs are dramatically different from commonly-used sensitive applications such as machine-learning algorithms and graph kernels, due to wide discrepancies in their execution behavior. The Raccoon approach is able to protect such applications due to the compiler’s ability to analyze and transform the long-term program behavior. While there do exist applications that cannot be transformed using the Raccoon compilers (details in the Limitations section), compared to related research, our compilers are able to substantially increase the set of applications that can be protected.

Performance Cost

The Raccoon compilers can target different microarchitectures and threat models. Here, we illustrate the performance overhead of our Vantage compiler on Gem5’s implementation of the ARM v7 architecture for three applications of different sizes. Briefly, the experimental setup is comprised of a 1 GHz, out of order ARM v7 processor with 32 KB L1 cache, 256 KB L2 cache, and an 8 MB L3 cache. We use three applications, each of which possesses certain key characteristics.

Find Max: Iterates over a linear list to find the maximum element. Contains control flow whose predicate depends on sensitive information.

Top-K Search: Builds a heap from a linear list, then iterates ‘k’ times to remove the minimum element and recompute the heap. Dereferences pointers whose value is based on sensitive information.

K-Means Clustering: Organizes elements into ‘k’ clusters. Contains floating-point arithmetic operations, secret pointer dereferences, and secret control flow.

BenchmarkTiming Defense OverheadInstruction-Level Power Defense OverheadDigital Channel Defense Overhead
Find Max1.5×2.1×1.5×
Top-K4.1×6.5×20.0×
K-Means1.2×2.9×1.3×
Table: Performance overhead of one of the Raccoon compilers for three applications protected against different side channels.

It is perhaps evident that the performance overhead varies significantly across threat models and applications. Most importantly, seemingly small changes in threat models can cause dramatic differences in the performance overhead

It may appear as though the performance overhead is too high compared to related research, but when comparing, it is important to note the difference, if any, between threat models. We often find the performance overheads of our solutions being compared with other solutions that protect against a weaker threat model or solutions that protect only small cryptographic applications.

Lastly, for many attacks (including non-timing channel attacks), timing is implicit in the attacker’s measurements. In other words, the attacker often has the ability to associate a timestamp (or a time window) with each measurement. Consequently, if a side-channel defense does not ensure that the time at which all observable events occur is independent of sensitive information, then that defense is insufficient. [Side note: It may seem like defenses that perturb the timing of observable events by adding noise or through predictive mitigation are good trade-offs between security and performance. However, such techniques are insufficient (demonstrating this is a topic of a future article).] As a result, the performance overhead of the Raccoon compilers is a direct result of the nature of the problem for applications whose behavior varies widely.

Does this mean that side-channel defenses will always have a large performance impact? No, it is possible to ensure fixed execution time that is lower than the current execution time, while also ensuring that the execution time does not leak information. Doing so requires work on both the compiler (e.g. Constant-Time Floating-Point Operations) as well as the microarchitecture (e.g. Stash Memory): aggressive static analyses can prune the instructions that need to be transformed and modest microarchitectural optimizations can offer primitives that improve speed.

Limitations of the Raccoon Approach

The Raccoon approach does not completely solve the problem of side channels. In particular, the Raccoon approach faces three main limitations listed below. These limitations are artifactual — that is, not fundamental — and we touch on ways to remove these limitations.

Closing Side Channels at the Lowest Levels of the Computing Stack

As we mentioned earlier, side channels that are at the lowest levels of the computing stack are difficult to close using a compiler-based approach, due to both the undecidability of static analyses and also due to the limited interface exposed by the ISA. For instance, a multiplier may leak information about its operands through the power consumption, and the ISA does not permit exerting control on the power consumption of the multiplier. However, in the future, the compiler’s interface with the lower levels of the computing stack (i.e. the ISA) can be broadened to close such side channels. For instance, the compiler may be able to selectively enable techniques like Computational Blinking, thus enabling effective and efficient compiler-based side-channel defenses. Other approaches, such as Compiler-Assisted Masking or Virtual Secure Circuits, that allow the compiler to more accurately control the program’s power consumption, could be useful as well.

Manual Design of Compiler Transformations

We have manually designed the code transformations in the Raccoon compilers. However, given the variety of threat models and microarchitectures, it is improbable to expect a human designing each code transformation. We are currently working towards synthesizing these transformations automatically based on leakage models derived for different microarchitectures.

Accuracy of Leakage Models

The accuracy of Raccoon’s defenses depends on the precision of the leakage model, so if the leakage model is imprecise, the resulting defense could be vulnerable. However, the development of increasingly accurate models (using sound program analyses, “soundy” analyses, or statistical techniques) is an active area of research, and the results can be plugged into the compiler as the models become available.

System Calls, I/O Operations, and Library Calls

Finally, compilers cannot transform code that is not within the compiler’s purview. Specifically, I/O operations, system calls, and library calls are outside of the scope of the compiler. While I/O operations are perhaps impossible to hide, library operating systems can alleviate the problem of some system calls. Similarly, we can make the source code of libraries available to the compiler. For instance, we have previously transformed the math routines (sine, cosine, etc.) of the Musl C library using the Escort compiler.

Acknowledgments

This work was shaped in different ways by many people, including Professor Calvin Lin, Professor Mohit Tiwari, Jia Chen, Joshua Eversmann, Raymond Chee, Varun Adiga, Gregory McDonald, and Kasra Sadeghi. This research was funded in part by NSF Grants DRL-1441009, CNS-1314709, and CCF-1453806, C-FAR (one of the six SRC STARnet Centers sponsoredby MARCO and DARPA), and a gift from Qualcomm.

This article was prepared by Ashay Rane in his personal capacity. The opinions expressed in this article are the author’s own and do not reflect the view of the sponsors of this research.