If you are interested in working with me on a research project this summer, see my Summer MAP Opportunities page. I will accept applications for summer research positions until February 24th.

DoubleTake: Evidence-Based Dynamic Analysis ICSE 2016
Tongping Liu, Charlie Curtsinger, and Emery D. Berger

Dynamic analysis can be helpful for debugging, but is often too expensive to use in deployed applications. We introduce evidence-based dynamic analysis, an approach that enables extremely lightweight analyses for an important class of errors: those that can be forced to leave evidence of their existence. Evidence-based dynamic analysis lets execution proceed at full speed until the end of an epoch. It then examines program state to find evidence that an error occurred at some time during that epoch. If so, execution is rolled back and re-execution proceeds with instrumentation activated to pinpoint the error. We present DoubleTake, a prototype evidence-based dynamic analysis framework. We demonstrate its generality by building analyses to find buffer overflows, memory use-after-free errors, and memory leaks. DoubleTake is precise and efficient: its buffer overflow analysis runs with just 2% overhead on average, making it the fastest such system to date.

Coz: Finding Code that Counts with Causal Profiling SOSP 2015 (Best Paper)
Charlie Curtsinger and Emery D. Berger

Improving performance is a central concern for software developers. To locate optimization opportunities, developers rely on software profilers. However, these profilers only report where programs spent their time: optimizing that code may have no impact on performance. Past profilers thus both waste developer time and make it difficult for them to uncover significant optimization opportunities.

This paper introduces causal profiling. Unlike past profiling approaches, causal profiling indicates exactly where programmers should focus their optimization efforts, and quantifies their potential impact. Causal profiling works by running performance experiments during program execution. Each experiment calculates the impact of any potential optimization by virtually speeding up code: inserting pauses that slow down all other code running concurrently. The key insight is that this slowdown has the same relative effect as running that line faster, thus “virtually” speeding it up.

We present Coz, a causal profiler, which we evaluate on a range of highly-tuned applications: Memcached, SQLite, and the PARSEC benchmark suite. Coz identifies previously unknown optimization opportunities that are both significant and targeted. Guided by Coz, we improve the performance of Memcached by 9%, SQLite by 25%, and accelerate six PARSEC applications by as much as 68%; in most cases, these optimizations involve modifying under 10 lines of code.

Stabilizer: Statistically Sound Performance Evaluation ASPLOS 2013
Charlie Curtsinger and Emery D. Berger

Researchers and software developers require effective performance evaluation. Researchers must evaluate optimizations or measure overhead. Software developers use automatic performance regression tests to discover when changes improve or degrade performance. The standard methodology is to compare execution times before and after applying changes.

Unfortunately, modern architectural features make this approach unsound. Statistically sound evaluation requires multiple samples to test whether one can or cannot (with high confidence) reject the null hypothesis that results are the same before and after. However, caches and branch predictors make performance dependent on machine-specific parameters and the exact layout of code, stack frames, and heap objects. A single binary constitutes just one sample from the space of program layouts, regardless of the number of runs. Since compiler optimizations and code changes also alter layout, it is currently impossible to distinguish the impact of an optimization from that of its layout effects.

This paper presents Stabilizer, a system that enables the use of the powerful statistical techniques required for sound performance evaluation on modern architectures. Stabilizer forces executions to sample the space of memory configurations by repeatedly re-randomizing layouts of code, stack, and heap objects at runtime. Stabilizer thus makes it possible to control for layout effects. Re-randomization also ensures that layout effects follow a Gaussian distribution, enabling the use of statistical tests like ANOVA. We demonstrate Stabilizer's efficiency (<7% median overhead) and its effectiveness by evaluating the impact of LLVM's optimizations on the SPEC CPU2006 benchmark suite. We find that, while -O2 has a significant impact relative to -O1, the performance impact of -O3 over -O2 optimizations is indistinguishable from random noise.

Probabilistic Timing Analysis on Conventional Cache Designs DATE 2013
Leonidas Kosmidis, Charlie Curtsinger, Eduardo Quiñones, Jaume Abella, Emery D. Berger, and Francisco J. Cazorla

Probabilistic timing analysis (PTA), a promising alternative to traditional worst-case execution time (WCET) analyses, enables pairing time bounds (named probabilistic WCET or pWCET) with an exceedance probability (e.g., 10-16), resulting in far tighter bounds than conventional analyses. However, the applicability of PTA has been limited because of its dependence on relatively exotic hardware: fully-associative caches using random replacement. This paper extends the applicability of PTA to conventional cache designs via a software-only approach. We show that, by using a combination of compiler techniques and runtime system support to randomise the memory layout of both code and data, conventional caches behave as fully-associative ones with random replacement.

AutoMan: A Platform for Integrating Human-Based and Digital Computation OOPSLA 2012, CACM Research Highlight
Daniel W. Barowy, Charlie Curtsinger, Emery D. Berger, and Andrew McGregor

Humans can perform many tasks with ease that remain difficult or impossible for computers. Crowdsourcing platforms like Amazon's Mechanical Turk make it possible to harness human-based computational power at an unprecedented scale. However, their utility as a general-purpose computational platform remains limited. The lack of complete automation makes it difficult to orchestrate complex or interrelated tasks. Scheduling more human workers to reduce latency costs real money, and jobs must be monitored and rescheduled when workers fail to complete their tasks. Furthermore, it is often difficult to predict the length of time and payment that should be budgeted for a given task. Finally, the results of human-based computations are not necessarily reliable, both because human skills and accuracy vary widely, and because workers have a financial incentive to minimize their effort.

This paper introduces AutoMan, the first fully automatic crowdprogramming system. AutoMan integrates human-based computations into a standard programming language as ordinary function calls, which can be intermixed freely with traditional functions. This abstraction lets AutoMan programmers focus on their programming logic. An AutoMan program specifies a confidence level for the overall computation and a budget. The AutoMan runtime system then transparently manages all details necessary for scheduling, pricing, and quality control. AutoMan automatically schedules human tasks for each computation until it achieves the desired confidence level; monitors, reprices, and restarts human tasks as necessary; and maximizes parallelism across human workers while staying under budget.

DThreads: Efficient and Deterministic Multithreading SOSP 2011
Tongping Liu, Charlie Curtsinger, and Emery D. Berger

Multithreaded programming is notoriously difficult to get right. A key problem is non-determinism, which complicates debugging, testing, and reproducing errors. One way to simplify multithreaded programming is to enforce deterministic execution, but current deterministic systems for C/C++ are incomplete or impractical. These systems require program modification, do not ensure determinism in the presence of data races, do not work with general-purpose multithreaded programs, or run up to 8.4× slower than pthreads.

This paper presents DThreads, an efficient deterministic multithreading system for unmodified C/C++ applications that replaces the pthreads library. DThreads enforces determinism in the face of data races and deadlocks. DThreads works by exploding multithreaded applications into multiple processes, with private, copy-on-write mappings to shared memory. It uses standard virtual memory protection to track writes, and deterministically orders updates by each thread. By separating updates from different threads, DThreads has the additional benefit of eliminating false sharing. Experimental results show that DThreads substantially outperforms a state-of-the-art deterministic runtime system, and for a majority of the benchmarks evaluated here, matches and occasionally exceeds the performance of pthreads.

Nofus: "Automatically Detecting" + String.fromCharCode(32) + "ObFuSCateD ".toLowerCase() + "JavaScript Code" MSR Tech Report
Scott F. Kaplan, Benjamin Livshits, Benjamin G. Zorn, and Charlie Curtsinger

Obfuscation is applied to large quantities of benign and malicious JavaScript throughout the web. In situations where JavaScript source code is being submitted for widespread use, such as in a gallery of browser extensions (e.g., Firefox), it is valuable to require that the code submitted is not obfuscated and to check for that property. In this paper, we describe Nofus, a static, automatic classifier that distinguishes obfuscated and non-obfuscated JavaScript with high precision. Using a collection of examples of both obfuscated and non-obfuscated JavaScript, we train Nofus to distinguish between the two and show that the classifier has both a low false positive rate (about 1%) and low false negative rate (about 5%).

Applying Nofus to collections of deployed JavaScript, we show it correctly identifies obfuscated JavaScript files from Alexa top 50 websites. While prior work conflates obfuscation with maliciousness (assuming that detecting obfuscation implies maliciousness), we show that the correlation is weak. Yes, much malware is hidden using obfuscation, but so is benign JavaScript. Further, applying Nofus to known JavaScript malware, we show our classifier finds 15% of the files are un-obfuscated, showing that not all malware is obfuscated.

Zozzle: Low-Overhead Mostly Static JavaScript Malware Detection USENIX Security 2011
Charlie Curtsinger, Benjamin Livshits, Benjamin G. Zorn, and Christian Seifert

JavaScript malware-based attacks account for a large fraction of successful mass-scale exploitation happening today. Attackers like JavaScript-based attacks because they can be mounted against an unsuspecting user visiting a seemingly innocent web page. While several techniques for addressing these types of exploits have been proposed, in-browser adoption has been slow, in part because of the performance overhead these methods incur.

In this paper, we propose Zozzle, a low-overhead solution for detecting and preventing JavaScript malware that is fast enough to be deployed in the browser. Our approach uses Bayesian classification of hierarchical features of the JavaScript abstract syntax tree to identify syntax elements that are highly predictive of malware. Our experimental evaluation shows that Zozzle is able to detect JavaScript malware through mostly static code analysis effectively. Zozzle has an extremely low false positive rate of 0.0003%, which is less than one in a quarter million. Despite this high accuracy, the Zozzle classifier is fast, with a throughput of over one megabyte of JavaScript code per second.