My research has been made possible by grants from Intel, Qualcomm, the National Science Foundation (#0622780, DRL-1441009, and CNS-1314709), and CFAR (one of the six SRC STARnet Centers sponsored by MARCO and DARPA). I have also benefited from a great deal of technical guidance from teams at Intel.


Verification of high-performance cryptographic assembly code: High-performance cryptographic code is often generated using Perl scripts, C preprocessor macros, or custom interpreters, resulting in programs that are difficult to understand and verify for correctness and security. We present Vale, a framework that supports automated verification of high-performance crypto code. In addition to functional verification, Vale also supports verification of the absence of information leakage through digital side channels.

(Alphabetical order) Barry Bond, Chris Hawblitzel, Manos Kapritsos, K. Leino, Jacob Lorch, Bryan Parno, Ashay Rane, Srinath Setty, Laure Thompson, “Vale: Verifying High-Performance Cryptographic Assembly Code”, 26th USENIX Security Symposium, August 2017.


Compiler transformations that close digital side channels: Various point solutions exist, each of which closes a specific and limited set of side channels. When composed together, individual point solutions not only cause high overheads but may also negate each other's defenses. Our solution obfuscates programs at the source level, thereby closing many side channels (those we define as digital). It runs on standard modern x86 processors and is backward compatible.
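To illustrate the kind of source-level rewriting involved, here is a minimal sketch in C of a branchless select and a table lookup that touches every element regardless of the secret index. This is not the transformation used in the papers below; the function names `ct_select` and `oblivious_lookup` are hypothetical, and whether the compiled code is truly branch-free depends on the compiler and flags, so the generated assembly would need to be checked.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns a if cond is nonzero, b otherwise, without an
 * if/else on cond in the source. (Hypothetical helper.) */
uint32_t ct_select(uint32_t cond, uint32_t a, uint32_t b)
{
    /* mask is 0xFFFFFFFF when cond != 0, and 0 when cond == 0. */
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (a & mask) | (b & ~mask);
}

/* Returns table[secret_index] while reading every element, so the
 * sequence of addresses accessed does not depend on secret_index.
 * (Hypothetical helper; assumes secret_index < n.) */
uint32_t oblivious_lookup(const uint32_t *table, size_t n, size_t secret_index)
{
    uint32_t result = 0;
    for (size_t i = 0; i < n; i++) {
        /* Keep table[i] only when i matches the secret index. */
        result = ct_select(i == secret_index, table[i], result);
    }
    return result;
}
```

The lookup costs a full pass over the table for every access, which hints at why overhead is a central concern for defenses of this kind.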

Ashay Rane, Calvin Lin, Mohit Tiwari, “Secure, Precise, and Fast Floating-Point Operations on x86 Processors”, 25th USENIX Security Symposium, August 2016 (pdf).

Ashay Rane, Calvin Lin, Mohit Tiwari, “Raccoon: Closing Digital Side-Channels through Obfuscated Execution”, 24th USENIX Security Symposium, August 2015 (pdf).


Leveraging dynamic profiling information to improve code vectorization: Even with extremely sophisticated static analyses, compilers often fail to discover opportunities for code optimization. This work gathers relevant dynamic profiling metrics using a low-overhead profiling system and feeds them back to the compiler in the form of specific pragmas and options. The study demonstrates that opportunities for additional vectorization are frequent and that the resulting performance gains are substantial.
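As a rough sketch of what pragma-based feedback can look like, consider a loop that a compiler may decline to vectorize because it cannot prove statically that the two pointers never overlap. If profiling showed that they never do, a pragma could assert that to the compiler. The example assumes the OpenMP `simd` directive as a stand-in; the paper's actual pragmas and options are not listed here, and the function name `scaled_add` is hypothetical.

```c
#include <stddef.h>

/* Computes dst[i] += scale * src[i] for i in [0, n).
 * Assumption: dst and src never overlap. The compiler cannot verify
 * this statically; the pragma tells it the loop is safe to vectorize. */
void scaled_add(float *dst, const float *src, float scale, size_t n)
{
    /* Honored when OpenMP SIMD support is enabled
     * (for example, -fopenmp-simd in GCC or Clang); ignored otherwise. */
    #pragma omp simd
    for (size_t i = 0; i < n; i++) {
        dst[i] += scale * src[i];
    }
}
```

The pragma is only as trustworthy as the evidence behind it: if the pointers do overlap for some input that the profile never exercised, the vectorized loop can produce wrong results.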

Ashay Rane, Rakesh Krishnaiyer, Chris Newburn, James Browne, Leo Fialho, Zakhar Matveev, “Unification of Static and Dynamic Analyses to Enable Vectorization”, 27th International Workshop on Languages and Compilers for Parallel Computing, September 2014 (pdf).


Data-structure-centric profiling for memory performance optimization: Diagnosing performance problems and selecting optimizations based only on measurements of the execution behavior of code segments is incomplete, because such measurements do not incorporate knowledge of memory access patterns and behaviors. This work presents a low-overhead tool (MACPO) that captures memory traces and computes metrics for the memory access behavior of source-level (C, C++, Fortran) data structures.
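To make "metrics computed from a memory trace" concrete, the sketch below computes reuse distance, a common locality metric: the number of distinct addresses touched between two consecutive accesses to the same address. Reuse distance is used here as an assumed example of such a metric, and the function `reuse_distance` is hypothetical rather than part of MACPO's interface. The quadratic scan is chosen for clarity, not speed.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the reuse distance of the access at trace[pos]: the number of
 * distinct addresses accessed since the previous access to the same
 * address, or -1 if trace[pos] is the first access to that address. */
long reuse_distance(const uintptr_t *trace, size_t pos)
{
    /* Find the most recent earlier access to the same address. */
    size_t prev = pos;
    int found = 0;
    while (prev > 0) {
        prev--;
        if (trace[prev] == trace[pos]) {
            found = 1;
            break;
        }
    }
    if (!found) {
        return -1;
    }

    /* Count distinct addresses strictly between prev and pos. */
    long distinct = 0;
    for (size_t i = prev + 1; i < pos; i++) {
        int seen_before = 0;
        for (size_t j = prev + 1; j < i; j++) {
            if (trace[j] == trace[i]) {
                seen_before = 1;
                break;
            }
        }
        if (!seen_before) {
            distinct++;
        }
    }
    return distinct;
}
```

A data-structure-centric tool would additionally attribute each traced address to the array or structure that owns it, so that a metric like this can be reported per data structure rather than per code segment.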

Ashay Rane, James Browne, “Enhancing performance optimization of multicore/multichip nodes with data structure metrics”, ACM Transactions on Parallel Computing, May 2014 (pdf).

Ashay Rane, James Browne, “Enhancing performance optimization of multicore chips and multichip nodes with data structure metrics”, Parallel Architectures and Compilation Techniques (PACT) 2012 (pdf). Nominated as one of the three best papers of the conference.

Ashay Rane, James Browne, “Performance Optimization of Data Structures using Memory Access Characterization”, IEEE Cluster Conference 2011 (pdf).


Finding code segments suited for execution on accelerators: Manycore chips such as Graphics Processing Units (GPUs) and the Intel Xeon Phi coprocessor favor specific execution characteristics, such as use of wide vector instructions and minimal inter-thread conflicts. As a result, not all parts of an application execute efficiently on such chips. This work presents a process that predicts which code segments will execute efficiently on these special-purpose chips.

Ashay Rane, James Browne, Lars Koesterke, “PerfExpert and MACPO: Which code segments should (not) be ported to MIC?”, TACC-Intel High Performance Computing Symposium 2012 (pdf).

Ashay Rane, Saurabh Sardeshpande, James Browne, “Determining Code Segments that can Benefit from Execution on GPUs”, Supercomputing Conference (SC) 2011 (pdf).


Making performance diagnosis easier: PerfExpert is a tool that combines a simple user interface with a sophisticated analysis engine to detect probable core-, socket-, and node-level performance bottlenecks. AutoSCOPE (which was later merged into PerfExpert) is a tool that generates recommendations in the form of suggested compiler flags and code changes.

Olalekan Sopeju, Martin Burtscher, Ashay Rane, James Browne, “AutoSCOPE: Automatic Suggestions for Code Optimizations Using PerfExpert”, 2011 International Conference on Parallel and Distributed Processing Techniques and Applications, July 2011 (pdf).


Tuning hybrid MPI/OpenMP applications on multicore processors: Most data centers and high-performance computing clusters are built from multicore (typically 4- or 8-core) processors connected over a network using InfiniBand, Myrinet, or a similar fabric. This presents a choice of programming paradigm: whether to use MPI for both intra-node and inter-node communication, or to use OpenMP for intra-node communication and MPI for inter-node communication. This work studies the performance characteristics of various applications and comments on the suitability of the two programming paradigms.

Ashay Rane, “A Study of the Hybrid Programming Paradigm on Multicore Architectures”, Master's Thesis, Arizona State University, November 2009 (pdf).

Ashay Rane, Dan Stanzione, “Experiences in tuning performance of hybrid MPI/OpenMP applications on quad-core systems”, Linux Clusters Institute, 2009 (pdf).