Philip Guo (Phil Guo, Philip J. Guo, Philip Jia Guo, pgbovine)

Selected Publications

For a full list of publications, view my Curriculum Vitae or Google Scholar profile.


Older Adults Learning Computer Programming: Motivations, Frustrations, and Design Opportunities. Philip J. Guo. ACM Conference on Human Factors in Computing Systems (CHI), May 2017. (to appear)

CodePilot: Scaffolding End-to-End Collaborative Software Development for Novice Programmers. Jeremy Warner and Philip J. Guo. ACM Conference on Human Factors in Computing Systems (CHI), May 2017. (to appear)


Paradise Unplugged: Identifying Barriers for Female Participation on Stack Overflow. Denae Ford, Justin Smith, Philip J. Guo, Chris Parnin. ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE), Nov 2016.
[Abstract | Blog post]

It is no secret that females engage less in programming fields than males. However, in online communities, such as Stack Overflow, this gender gap is even more extreme: only 5.8% of contributors are female. In this paper, we use a mixed-methods approach to identify contribution barriers females face in online communities. Through 22 semi-structured interviews with a spectrum of female users ranging from non-contributors to a top 100 ranked user of all time, we identified 14 barriers preventing them from contributing to Stack Overflow. We then conducted a survey with 1470 female and male developers to confirm which barriers are gender related or general problems for everyone. Females ranked five barriers significantly higher than males. A few of these include doubts in the level of expertise needed to contribute, feeling overwhelmed when competing with a large number of users, and limited awareness of site features. Still, there were other barriers that equally impacted all Stack Overflow users or affected particular groups, such as industry programmers. Finally, we describe several implications that may encourage increased participation in the Stack Overflow community across genders and other demographics.

Understanding Conversational Programmers: A Perspective from the Software Industry. Parmit K. Chilana, Rishabh Singh, Philip J. Guo. ACM Conference on Human Factors in Computing Systems (CHI), May 2016.
[Abstract | BibTeX]

Recent research suggests that some students learn to program with the goal of becoming conversational programmers: they want to develop programming literacy skills not to write code in the future but mainly to develop conversational skills and communicate better with developers and to improve their marketability. To investigate the existence of such a population of conversational programmers in practice, we surveyed professionals at a large multinational technology company who were not in software development roles. Based on 3151 survey responses from professionals who never or rarely wrote code, we found that a significant number of them (42.6%) had invested in learning programming on the job. While many of these respondents wanted to perform traditional end-user programming tasks (e.g., data analysis), we discovered that two top motivations for learning programming were to improve the efficacy of technical conversations and to acquire marketable skillsets. The main contribution of this work is in empirically establishing the existence and characteristics of conversational programmers in a large software development context.


Codeopticon: Real-Time, One-To-Many Human Tutoring for Computer Programming. Philip J. Guo. ACM Symposium on User Interface Software and Technology (UIST), Nov 2015.
[Abstract | BibTeX | Blog post]

One-on-one tutoring from a human expert is an effective way for novices to overcome learning barriers in complex domains such as computer programming. But there are usually far fewer experts than learners. To enable a single expert to help more learners at once, we built Codeopticon, an interface that enables a programming tutor to monitor and chat with dozens of learners in real time. Each learner codes in a workspace that consists of an editor, compiler, and visual debugger. The tutor sees a real-time view of each learner's actions on a dashboard, with each learner's workspace summarized in a tile. At a glance, the tutor can see how learners are editing and debugging their code, and what errors they are encountering. The dashboard automatically reshuffles tiles so that the most active learners are always in the tutor's main field of view. When the tutor sees that a particular learner needs help, they can open an embedded chat window to start a one-on-one conversation. A user study showed that 8 first-time Codeopticon users successfully tutored anonymous learners from 54 countries in a naturalistic online setting. On average, in a 30-minute session, each tutor monitored 226 learners, started 12 conversations, exchanged 47 chats, and helped 2.4 learners.

Codechella: Multi-User Program Visualizations for Real-Time Tutoring and Collaborative Learning. Philip J. Guo, Jeffery White, Renan Zanelatto. IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), Oct 2015.
[Abstract | BibTeX | Blog post]

An effective way to learn computer programming is to sit side-by-side in front of the same computer with a tutor or peer, write code together, and then discuss what happens as the code executes. To bring this kind of in-person interaction to an online setting, we have developed Codechella, a multi-user Web-based program visualization system that enables multiple people to collaboratively write code together, explore an automatically-generated visualization of its execution state using multiple mouse cursors, and chat via an embedded text box. In the past nine months of live deployment on an educational website, people from 296 cities across 40 countries have started 299 Codechella sessions for both tutoring and collaborative learning. 57% of sessions connected participants from different cities. 69% of actions were visualization interactions, which indicates high engagement with program visualizations. Finally, participants showed signs of learning at the lower three levels of Bloom's taxonomy: remembering, understanding, and applying knowledge.

Codepourri: Creating Visual Coding Tutorials Using A Volunteer Crowd Of Learners. Mitchell Gordon and Philip J. Guo. IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), Oct 2015.
[Abstract | BibTeX | Blog post]

A popular way to learn is by studying written tutorials. However, tutorials for computer programming can be tedious to create, since a static text-based format cannot visualize what happens as code executes. We created a system called Codepourri that enables people to easily create visual coding tutorials by annotating steps in an automatically-generated program visualization. Using Codepourri, we developed a crowdsourcing workflow where learners who are visiting an educational website collectively create a tutorial by annotating individual steps and then voting on the best annotations. Since there are far more learners than experts, using learners as a crowd is a potentially more scalable way of creating tutorials. Our experiments with 4 expert judges and 101 learners adding 145 raw annotations to Python code show the learner crowd's annotations to be accurate, informative, and containing some insights that even experts missed.

Perceptions of Non-CS Majors in Intro Programming: The Rise of the Conversational Programmer. Parmit K. Chilana, Celena Alcock, Shruti Dembla, Anson Ho, Ada Hurst, Brett Armstrong, Philip J. Guo. IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), Oct 2015.
[Abstract | BibTeX]

Despite the enthusiasm and initiatives for making programming accessible to students outside Computer Science (CS), there are still many unanswered questions about how we should be teaching programming to engineers, scientists, artists, or other non-CS majors. We present an in-depth case study of first-year management engineering students enrolled in a required introductory programming course at a large North American university. Based on an inductive analysis of one-on-one interviews, surveys, and weekly observations, we provide insights into students' motivations, career goals, perceptions of programming, and reactions to the Java and Processing languages. One of our key findings is that between the traditional classification of non-programmers vs. programmers, there exists a category of conversational programmers who do not necessarily want to be professional programmers or even end-user programmers, but want to learn programming so that they can speak in the "programmer's language" and improve their perceived job marketability in the software industry.

Toward a Domain-Specific Visual Discussion Forum for Learning Computer Programming: An Empirical Study of a Popular MOOC Forum. Joyce Zhu, Jeremy Warner, Mitchell Gordon, Jeffery White, Renan Zanelatto, Philip J. Guo. IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), Oct 2015.
[Abstract | BibTeX]

Online discussion forums are one of the most ubiquitous kinds of resources for people who are learning computer programming. However, their user interface -- a hierarchy of textual threads -- has not changed much in the past four decades. We argue that generic forum interfaces are cumbersome for learning programming and that there is a need for a domain-specific visual discussion forum for programming. We support this argument with an empirical study of all 5,377 forum threads in Introduction to Computer Science and Programming Using Python, a popular edX MOOC. Specifically, we investigated how forum participants were hampered by its text-based format. Most notably, people often wanted to discuss questions about dynamic execution state -- what happens "under the hood" as the computer runs code. We propose that a better forum for learning programming should be visual and domain-specific, integrating automatically-generated visualizations of execution state and enabling inline annotations of source code and output.

How High School, College, and Online Students Differentially Engage with an Interactive Digital Textbook. Jeremy Warner, John Doorenbos, Bradley N. Miller, Philip J. Guo. International Conference on Educational Data Mining (EDM), short paper, June 2015.
[Abstract | BibTeX]

Digital textbooks have been growing popular as a lower-cost and more interactive alternative to paper books. Despite the recent rise in adoption, little is known about how people use these resources. Prior studies have investigated student perceptions of digital textbooks in the classroom via interviews and surveys but have not quantified actual usage patterns. We present, to our knowledge, the first large-scale quantitative study of digital textbook usage. We mined 6.8 million log events from over 43,000 people interacting with How To Think Like a Computer Scientist, one of the most widely-used Web-based textbooks for learning computer programming. We compared engagement patterns among three populations: high school students, college students, and online website viewers. We discovered that people made extensive use of interactive components such as executing code and answering multiple-choice questions, engaged for longer when taking high school or college courses, and frequently viewed textbook sections out of order.

Wait-Learning: Leveraging Wait Time for Second Language Education. Carrie J. Cai, Philip J. Guo, James Glass, Robert C. Miller. ACM Conference on Human Factors in Computing Systems (CHI), April 2015.
[Abstract | BibTeX]

Competing priorities in daily life make it difficult for those with a casual interest in learning to set aside time for regular practice. In this paper, we explore wait-learning: leveraging brief moments of waiting during a person's existing conversations for second language vocabulary practice, even if the conversation happens in the native language. We present an augmented version of instant messaging, WaitChatter, that supports the notion of wait-learning by displaying contextually relevant foreign language vocabulary and micro-quizzes just-in-time while the user awaits a response from her conversant. Through a two week field study of WaitChatter with 20 people, we found that users were able to learn 57 new words on average during casual instant messaging. Furthermore, we found that users were most receptive to learning opportunities immediately after sending a chat message, and that this timing may be critical given user tendency to multi-task during waiting periods.

OverCode: Visualizing Variation in Student Solutions to Programming Problems at Scale. Elena L. Glassman, Jeremy Scott, Rishabh Singh, Philip J. Guo, Robert C. Miller. ACM Transactions on Computer-Human Interaction (TOCHI), 2015 (presented at CHI 2015).
[Abstract | BibTeX | Webpage]

In MOOCs, a single programming exercise may produce thousands of solutions from learners. Understanding solution variation is important for providing appropriate feedback to students at scale. The wide variation among these solutions can be a source of pedagogically valuable examples, and can be used to refine the autograder for the exercise by exposing corner cases. We present OverCode, a system for visualizing and exploring thousands of programming solutions. OverCode uses both static and dynamic analysis to cluster similar solutions, and lets teachers further filter and cluster solutions based on different criteria. We evaluated OverCode against a non-clustering baseline in a within-subjects study with 24 teaching assistants, and found that the OverCode interface allows teachers to more quickly develop a high-level view of students' understanding and misconceptions, and to provide feedback that is relevant to more students' solutions.


Data-Driven Interaction Techniques for Improving Navigation of Educational Videos. Juho Kim, Philip J. Guo, Carrie J. Cai, Shang-Wen (Daniel) Li, Krzysztof Z. Gajos, Robert C. Miller. ACM Symposium on User Interface Software and Technology (UIST), Oct 2014.
[Abstract | BibTeX | Webpage]

With an unprecedented scale of learners watching educational videos on online platforms such as MOOCs and YouTube, there is an opportunity to incorporate data generated from their interactions into the design of novel video interaction techniques. Interaction data has the potential to help not only instructors to improve their videos, but also to enrich the learning experience of educational video watchers. This paper explores the design space of data-driven interaction techniques for educational video navigation. We introduce a set of techniques that augment existing video interface widgets, including: a 2D video timeline with an embedded visualization of collective navigation traces; dynamic and non-linear timeline scrubbing; data-enhanced transcript search and keyword summary; automatic display of relevant still frames next to the video; and a visual summary representing points with high learner activity. To evaluate the feasibility of the techniques, we ran a laboratory user study with simulated learning tasks. Participants rated watching lecture videos with interaction data to be efficient and useful in completing the tasks. However, no significant differences were found in task performance, suggesting that interaction data may not always align with moment-by-moment information needs during the tasks.

A Direct Manipulation Language for Explaining Algorithms. Jeremy Scott, Philip J. Guo, Randall Davis. IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), short paper, Jul 2014.
[Abstract | BibTeX]

Instructors typically explain algorithms in computer science by tracing their behavior, often on blackboards, sometimes with algorithm visualizations. Using blackboards can be tedious because they do not facilitate manipulation of the drawing, while visualizations often operate at the wrong level of abstraction or must be laboriously hand-coded for each algorithm. In response, we present a direct manipulation (DM) language for explaining algorithms by manipulating visualized data structures. The language maps DM gestures onto primitive program behaviors that occur in commonly taught algorithms. We performed an initial evaluation of the DM language on teaching assistants of an undergraduate algorithms class, who found the language easier to use and more helpful for explaining algorithms than a standard drawing application (GIMP).

Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos. Juho Kim, Phu Nguyen, Sarah Weir, Philip J. Guo, Robert C. Miller, Krzysztof Z. Gajos. ACM Conference on Human Factors in Computing Systems (CHI), April 2014.
(Honorable Mention Paper)
[Abstract | BibTeX]

Millions of learners today use how-to videos to master new skills in a variety of domains. But browsing such videos is often tedious and inefficient because video player interfaces are not optimized for the unique step-by-step structure of such videos. This research aims to improve the learning experience of existing how-to videos with step-by-step annotations.

We first performed a formative study to verify that annotations are actually useful to learners. We created ToolScape, an interactive video player that displays step descriptions and intermediate result thumbnails in the video timeline. Learners in our study performed better and gained more self-efficacy using ToolScape versus a traditional video player.

To add the needed step annotations to existing how-to videos at scale, we introduce a novel crowdsourcing workflow. It extracts step-by-step structure from an existing video, including step times, descriptions, and before and after images. We introduce the Find-Verify-Expand design pattern for temporal and visual annotation, which applies clustering, text processing, and visual analysis algorithms to merge crowd output. The workflow does not rely on domain-specific customization, works on top of existing videos, and recruits untrained crowd workers. We evaluated the workflow with Mechanical Turk, using 75 cooking, makeup, and Photoshop videos on YouTube. Results show that our workflow can extract steps with a quality comparable to that of trained annotators across all three domains with 77% precision and 81% recall.

Demographic Differences in How Students Navigate Through MOOCs. Philip J. Guo and Katharina Reinecke. ACM Conference on Learning at Scale, March 2014.
[Abstract | BibTeX | Blog post]

The current generation of Massive Open Online Courses (MOOCs) attract a diverse student audience from all age groups and over 196 countries around the world. Researchers, educators, and the general public have recently become interested in how the learning experience in MOOCs differs from that in traditional courses. A major component of the learning experience is how students navigate through course content.

This paper presents an empirical study of how students navigate through MOOCs, and is, to our knowledge, the first to investigate how navigation strategies differ by demographics such as age and country of origin. We performed data analysis on the activities of 140,546 students in four edX MOOCs and found that certificate earners skip on average 22% of the course content, that they frequently employ non-linear navigation by jumping backward to earlier lecture sequences, and that older students and those from countries with lower student-teacher ratios are more comprehensive and non-linear when navigating through the course.

From these findings, we suggest design recommendations such as for MOOC platforms to develop more detailed forms of certification that incentivize students to deeply engage with the content rather than just doing the minimum necessary to earn a passing grade.

How Video Production Affects Student Engagement: An Empirical Study of MOOC Videos. Philip J. Guo, Juho Kim, Rob Rubin. ACM Conference on Learning at Scale, March 2014.
[Abstract | BibTeX | Blog post]

Videos are a widely-used kind of resource for online learning. This paper presents an empirical study of how video production decisions affect student engagement in online educational videos. To our knowledge, ours is the largest-scale study of video engagement to date, using data from 6.9 million video watching sessions across four courses on the edX MOOC platform. We measure engagement by how long students are watching each video, and whether they attempt to answer post-video assessment problems.

Our main findings are that shorter videos are much more engaging, that informal talking-head videos are more engaging, that Khan-style tablet drawings are more engaging, that even high-quality pre-recorded classroom lectures might not make for engaging online videos, and that students engage differently with lecture and tutorial videos.

Based upon these quantitative findings and qualitative insights from interviews with edX staff, we developed a set of recommendations to help instructors and video producers take better advantage of the online video format.

Understanding In-Video Dropouts and Interaction Peaks in Online Lecture Videos. Juho Kim, Philip J. Guo, Daniel T. Seaton, Piotr Mitros, Krzysztof Z. Gajos, Robert C. Miller. ACM Conference on Learning at Scale, March 2014.
[Abstract | BibTeX]

With thousands of learners watching the same online lecture videos, analyzing video watching patterns provides a unique opportunity to understand how students learn with videos. This paper reports a large-scale analysis of in-video dropout and peaks in viewership and student activity, using second-by-second user interaction data from 862 videos in four Massive Open Online Courses (MOOCs) on edX. We find higher dropout rates in longer videos, re-watching sessions (vs first-time), and tutorials (vs lectures). Peaks in re-watching sessions and play events indicate points of interest and confusion. Results show that tutorials (vs lectures) and re-watching sessions (vs first-time) lead to more frequent and sharper peaks. In attempting to reason why peaks occur by sampling 80 videos, we observe that 61% of the peaks accompany visual transitions in the video, e.g., a slide view to a classroom view. Based on this observation, we identify five student activity patterns that can explain peaks: starting from the beginning of new material, returning to missed content, following a tutorial step, replaying a brief segment, and repeating a non-visual explanation. Our analysis has design implications for video authoring, editing, and interface design, providing a richer understanding of video learning on MOOCs.


Online Python Tutor: Embeddable Web-Based Program Visualization for CS Education. Philip J. Guo. ACM Technical Symposium on Computer Science Education (SIGCSE), March 2013.
[Abstract | BibTeX | Webpage]

This paper presents Online Python Tutor, a web-based program visualization tool for Python, which is becoming a popular language for teaching introductory CS courses. Using this tool, teachers and students can write Python programs directly in the web browser (without installing any plugins), step forwards and backwards through execution to view the run-time state of data structures, and share their program visualizations on the web.

In the past three years, over 200,000 people have used Online Python Tutor to visualize their programs. In addition, instructors at a dozen universities such as UC Berkeley, MIT, the University of Washington, and the University of Waterloo have used it in their CS1 courses. Finally, Online Python Tutor visualizations have been embedded within three web-based digital Python textbook projects, which collectively attract around 16,000 viewers per month and are being used in at least 25 universities. Online Python Tutor is free and open source software, available at


Characterizing and Predicting Which Bugs Get Reopened. Thomas Zimmermann, Nachiappan Nagappan, Philip J. Guo, Brendan Murphy. ACM/IEEE International Conference on Software Engineering (ICSE), Software Engineering In Practice (SEIP) track, June 2012.
(Best Paper Award)
[Abstract | BibTeX]

Fixing bugs is an important part of the software development process. An underlying aspect is the effectiveness of fixes: if a fair number of fixed bugs are reopened, it could indicate instability in the software system. To the best of our knowledge there has been little prior work on understanding the dynamics of bug reopens. Towards that end, in this paper, we characterize when bug reports are reopened by using the Microsoft Windows operating system project as an empirical case study. Our analysis is based on a mixed-methods approach. First, we categorize the primary reasons for reopens based on a survey of 358 Microsoft employees. We then reinforce these results with a large-scale quantitative study of Windows bug reports, focusing on factors related to bug report edits and relationships between people involved in handling the bug. Finally, we build statistical models to describe the impact of various metrics on reopening bugs ranging from the reputation of the opener to how the bug was found.

Burrito: Wrapping Your Lab Notebook in Computational Infrastructure. Philip J. Guo and Margo Seltzer. USENIX Workshop on the Theory and Practice of Provenance (TaPP), June 2012.
[Abstract | BibTeX | Webpage]

Researchers in fields such as bioinformatics, CS, finance, and applied math have trouble managing the numerous code and data files generated by their computational experiments, comparing the results of trials executed with different parameters, and keeping up-to-date notes on what they learned from past successes and failures.

We created a Linux-based system called Burrito that automates aspects of this tedious experiment organization and notetaking process, thus freeing researchers to focus on more substantive work. Burrito automatically captures a researcher's computational activities and provides user interfaces to annotate the captured provenance with notes and then make queries such as, "Which script versions and command-line parameters generated the output graph that this note refers to?"

Software Tools to Facilitate Research Programming. Philip J. Guo. Ph.D. dissertation, Department of Computer Science, Stanford University, May 2012.
[Abstract | BibTeX]

Research programming is a type of programming activity where the goal is to write computer programs to obtain insights from data. Millions of professionals in fields such as science, engineering, business, finance, public policy, and journalism, as well as numerous students and computer hobbyists, all perform research programming on a daily basis.

My thesis is that by understanding the unique challenges faced during research programming, it becomes possible to apply techniques from dynamic program analysis, mixed-initiative recommendation systems, and OS-level tracing to make research programmers more productive.

This dissertation characterizes the research programming process, describes typical challenges faced by research programmers, and presents five software tools that I have developed to address some key challenges. 1.) Proactive Wrangler is an interactive graphical tool that helps research programmers reformat and clean data prior to analysis. 2.) IncPy is a Python interpreter that speeds up the data analysis scripting cycle and helps programmers manage code and data dependencies. 3.) SlopPy is a Python interpreter that automatically makes existing scripts error-tolerant, thereby also speeding up the data analysis scripting cycle. 4.) Burrito is a Linux-based system that helps programmers organize, annotate, and recall past insights about their experiments. 5.) CDE is a software packaging tool that makes it easy to deploy, archive, and share research code.

Taken together, these five tools enable research programmers to iterate and potentially discover insights faster by offloading the burdens of data management and provenance to the computer.

HAMPI: A Solver for Word Equations over Strings, Regular Expressions and Context-free Grammars. Adam Kiezun, Vijay Ganesh, Shay Artzi, Philip J. Guo, Pieter Hooimeijer, Michael D. Ernst. ACM Transactions on Software Engineering and Methodology (TOSEM), 2012.
[Abstract | BibTeX]

Many automatic testing, analysis, and verification techniques for programs can be effectively reduced to a constraint-generation phase followed by a constraint-solving phase. This separation of concerns often leads to more effective and maintainable software reliability tools. The increasing efficiency of off-the-shelf constraint solvers makes this approach even more compelling. However, there are few effective and sufficiently expressive off-the-shelf solvers for string constraints generated by analysis of string-manipulating programs, so researchers end up implementing their own ad-hoc solvers.

To fulfill this need, we designed and implemented HAMPI, a solver for string constraints over bounded string variables. Users of HAMPI specify constraints using regular expressions, context-free grammars, equality between string terms, and typical string operations such as concatenation and substring extraction. HAMPI then finds a string that satisfies all the constraints or reports that the constraints are unsatisfiable.

We demonstrate HAMPI's expressiveness and efficiency by applying it to program analysis and automated testing. We used HAMPI in static and dynamic analyses for finding SQL injection vulnerabilities in Web applications with hundreds of thousands of lines of code. We also used HAMPI in the context of automated bug finding in C programs using dynamic systematic testing (also known as concolic testing). We then compared HAMPI with another string solver, CFGAnalyzer, and show that HAMPI is several times faster. HAMPI's source code, documentation, and experimental data are available at
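HAMPI takes constraints in its own input language and solves them by encoding to a bit-vector solver, but the flavor of the bounded string-constraint problem can be sketched in plain Python as a brute-force search (the particular regex and substring constraints below are invented for illustration):

```python
import itertools
import re

def solve(length, alphabet, constraints):
    """Brute-force search for a string of the given length, over the given
    alphabet, that satisfies every constraint predicate."""
    for chars in itertools.product(alphabet, repeat=length):
        s = "".join(chars)
        if all(check(s) for check in constraints):
            return s
    return None  # unsatisfiable at this bound

# Example constraints: match a regular expression AND contain a substring.
constraints = [
    lambda s: re.fullmatch(r"a(a|b)*b", s) is not None,
    lambda s: "ba" in s,
]
solution = solve(4, "ab", constraints)  # → "abab"
```

A real solver like HAMPI avoids this exponential enumeration entirely; the sketch only shows what "finding a string that satisfies all the constraints or reporting unsatisfiability" means concretely.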


CDE: Run Any Linux Application On-Demand Without Installation. Philip J. Guo. USENIX Large Installation System Administration Conference (LISA), December 2011.
[Abstract | BibTeX | Webpage]

There is a huge ecosystem of free software for Linux, but since each Linux distribution (distro) contains a different set of pre-installed shared libraries, filesystem layout conventions, and other environmental state, it is difficult to create and distribute software that works without hassle across all distros. Online forums and mailing lists are filled with discussions of users' troubles with compiling, installing, and configuring Linux software and their myriad of dependencies. To address this ubiquitous problem, we have created an open-source tool called CDE that automatically packages up the Code, Data, and Environment required to run a set of x86-Linux programs on other x86-Linux machines. Creating a CDE package is as simple as running the target application under CDE's monitoring, and executing a CDE package requires no installation, configuration, or root permissions. CDE enables Linux users to instantly run any application on-demand without encountering "dependency hell".

Proactive Wrangling: Mixed-Initiative End-User Programming of Data Transformation Scripts. Philip J. Guo, Sean Kandel, Joseph M. Hellerstein, Jeffrey Heer. ACM Symposium on User Interface Software and Technology (UIST), October 2011.
[Abstract | BibTeX]

Analysts regularly wrangle data into a form suitable for computational tools through a tedious process that delays more substantive analysis. While interactive tools can assist data transformation, analysts must still conceptualize the desired output state, formulate a transformation strategy, and specify complex transforms. We present a model to proactively suggest data transforms which map input data to a relational format expected by analysis tools. To guide search through the space of transforms, we propose a metric that scores tables according to type homogeneity, sparsity and the presence of delimiters. When compared to "ideal" hand-crafted transformations, our model suggests over half of the needed steps; in these cases the top-ranked suggestion is preferred 77% of the time. User study results indicate that suggestions produced by our model can assist analysts' transformation tasks, but that users do not always value proactive assistance, instead preferring to maintain the initiative. We discuss some implications of these results for mixed-initiative interfaces.
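The paper's actual metric is more sophisticated, but the idea of scoring candidate tables by type homogeneity, sparsity, and stray delimiters can be illustrated with a toy function (all names and weights here are invented for illustration):

```python
def infer_type(cell):
    """Crude per-cell type inference: empty, number, or text."""
    if cell == "":
        return "empty"
    try:
        float(cell)
        return "number"
    except ValueError:
        return "text"

def table_score(rows):
    """Toy relational-quality score: rewards columns whose non-empty cells
    share one type, penalizes empty cells and leftover delimiter characters."""
    cols = list(zip(*rows))
    homogeneity = sum(
        len({infer_type(c) for c in col if c != ""}) <= 1 for col in cols
    ) / len(cols)
    cells = [c for row in rows for c in row]
    sparsity = sum(c == "" for c in cells) / len(cells)
    delimiters = sum("," in c or "|" in c for c in cells) / len(cells)
    return homogeneity - sparsity - delimiters

messy = [["1,2", "x"], ["", "y"]]   # un-split delimiter, missing value
clean = [["1", "x"], ["2", "y"]]    # tidy relational layout
# clean scores higher than messy, so a search over candidate
# transforms would rank the tidier output first
```

A transform search guided by such a score would prefer suggestions that move the table toward the clean layout.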

Using Automatic Persistent Memoization to Facilitate Data Analysis Scripting. Philip J. Guo and Dawson Engler. International Symposium on Software Testing and Analysis (ISSTA), July 2011.
[Abstract | BibTeX | Webpage]

Programmers across a wide range of disciplines (e.g., bioinformatics, neuroscience, econometrics, finance, data mining, information retrieval, machine learning) write scripts to parse, transform, process, and extract insights from data. To speed up iteration times, they split their analyses into stages and write extra code to save the intermediate results of each stage to files so that those results do not have to be re-computed in every subsequent run. As they explore and refine hypotheses, their scripts often create and process lots of intermediate data files. They need to properly manage the myriad of dependencies between their code and data files, or else their analyses will produce incorrect results.

To enable programmers to iterate quickly without needing to manage intermediate data files, we added a set of dynamic analyses to the programming language interpreter so that it automatically memoizes (caches) the results of long-running pure function calls to disk, manages dependencies between code and on-disk data, and later re-uses memoized results, rather than re-executing those functions, when guaranteed safe to do so. We created an implementation for Python and show how it enables programmers to iterate faster on their data analysis scripts while writing less code and not having to manage dependencies between their code and datasets.
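The core caching idea can be sketched at the library level, though the paper's system works inside the interpreter and additionally tracks dependencies between code and data files and checks that calls are safe to memoize. In this minimal sketch (with a hypothetical cache directory name), results of a call are pickled to disk, keyed by a hash of the function name and arguments, and reused across runs:

```python
# Minimal sketch of persistent (on-disk) memoization. Unlike the paper's
# system, this has no purity analysis or code/data dependency tracking:
# results are pickled to disk keyed by a hash of the function name and
# arguments, and reused on later calls -- even in a fresh process.
import functools
import hashlib
import os
import pickle

CACHE_DIR = ".memo_cache"   # hypothetical cache location

def persistent_memoize(func):
    os.makedirs(CACHE_DIR, exist_ok=True)
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = hashlib.sha256(
            pickle.dumps((func.__name__, args, sorted(kwargs.items())))
        ).hexdigest()
        path = os.path.join(CACHE_DIR, key + ".pkl")
        if os.path.exists(path):             # reuse the memoized result
            with open(path, "rb") as f:
                return pickle.load(f)
        result = func(*args, **kwargs)       # compute once, then persist
        with open(path, "wb") as f:
            pickle.dump(result, f)
        return result
    return wrapper

@persistent_memoize
def expensive_stage(n):
    return sum(i * i for i in range(n))
```

The hard part that the decorator omits, and that the paper addresses, is knowing when a cached result is stale: if the function's code or the data files it reads have changed, the memoized result must be invalidated.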

Sloppy Python: Using Dynamic Analysis to Automatically Add Error Tolerance to Ad-Hoc Data Processing Scripts. Philip J. Guo. International Workshop on Dynamic Analysis (WODA), July 2011.
[Abstract | BibTeX | Webpage]

Programmers and data analysts get frustrated when their long-running data processing scripts crash without producing results, due to either bugs in their code or inconsistencies in data sources. To alleviate this frustration, we developed a dynamic analysis technique that guarantees scripts will never crash: It converts all uncaught exceptions into special NA (Not Available) objects and continues executing rather than crashing. Thus, imperfect scripts will run to completion and produce partial results and an error log, which is more informative than simply crashing with no results. We implemented our technique as a "Sloppy" Python interpreter that automatically adds error tolerance to existing scripts without any programmer effort or run-time slowdown.
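The NA-conversion idea can be approximated at the library level, although the paper implements it inside the interpreter so that existing scripts need no changes. In this toy sketch, each record-processing step is wrapped so that an uncaught exception yields an NA sentinel plus a log entry instead of aborting the run:

```python
# Toy sketch of the "Sloppy" idea as a decorator (the paper modifies the
# Python interpreter itself): any uncaught exception becomes an NA sentinel
# plus an error-log entry, so an imperfect script still runs to completion
# and produces partial results.
class NA:
    """Sentinel standing in for a value that could not be computed."""
    def __repr__(self):
        return "NA"
    def __bool__(self):
        return False

error_log = []

def sloppy(func):
    def wrapper(*args):
        try:
            return func(*args)
        except Exception as e:          # convert the crash into NA + a log entry
            error_log.append((func.__name__, args, repr(e)))
            return NA()
    return wrapper

@sloppy
def parse_price(field):
    return float(field.strip("$"))

# Imperfect input yields partial results and an error log, not a crash.
results = [parse_price(x) for x in ["$3.50", "N/A", "$1.25"]]
```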

CDE: Using System Call Interposition to Automatically Create Portable Software Packages. Philip J. Guo and Dawson Engler. USENIX Annual Technical Conference, short paper, June 2011.
[Abstract | BibTeX]

It can be painfully hard to take software that runs on one person's machine and get it to run on another machine. Online forums and mailing lists are filled with discussions of users' troubles with compiling, installing, and configuring software and their myriad of dependencies. To eliminate this dependency problem, we created a system called CDE that uses system call interposition to monitor the execution of x86-Linux programs and package up the Code, Data, and Environment required to run them on other x86-Linux machines. Creating a CDE package is completely automatic, and running programs within a package requires no installation, configuration, or root permissions. Hundreds of people in both academia and industry have used CDE to distribute software, demo prototypes, make their scientific experiments reproducible, run software natively on older Linux distributions, and deploy experiments to compute clusters.
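The packaging half of the approach can be sketched in a few lines. This sketch assumes the list of files the program accessed has already been collected (CDE gathers it automatically via ptrace-based system call interposition); each accessed file is copied into a package directory that mirrors the original filesystem layout, so the program can later run against the package root instead of the real root:

```python
# Sketch of CDE's packaging step, assuming the list of files accessed by
# the target program has already been collected (CDE discovers them by
# interposing on system calls via ptrace). Each file is copied into a
# package directory mirroring the original filesystem layout.
import os
import shutil

def package_files(accessed_paths, package_root="cde-package"):
    """Copy each absolute path into package_root, preserving its layout."""
    for src in accessed_paths:
        dest = os.path.join(package_root, src.lstrip("/"))
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy2(src, dest)
    return package_root
```

At execution time, CDE again interposes on system calls to redirect the program's file accesses into the package, which is why no installation or root permissions are needed on the target machine.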

"Not My Bug!" and Other Reasons for Software Bug Report Reassignments. Philip J. Guo, Thomas Zimmermann, Nachiappan Nagappan, Brendan Murphy. ACM Conference on Computer Supported Cooperative Work (CSCW), March 2011.
[Abstract | BibTeX]

Bug reporting/fixing is an important social part of the software development process. The bug-fixing process inherently has strong inter-personal dynamics at play, especially in finding the optimal person to handle a bug report. Bug report reassignments, which are a common part of the bug-fixing process, have rarely been studied.

In this paper, we present a large-scale quantitative and qualitative analysis of the bug reassignment process in the Microsoft Windows Vista operating system project. We quantify social interactions in terms of both useful and harmful reassignments. For instance, we found that reassignments can be useful for determining the best person to fix a bug, contrary to the popular opinion that reassignments are always harmful. We identified five primary reasons for reassignments: finding the root cause, determining ownership, poor bug report quality, hard to determine proper fix, and workload balancing. We then use these findings to make recommendations for the design of more socially-aware bug tracking systems that can overcome some of the inefficiencies we observed in our study.


Characterizing and Predicting Which Bugs Get Fixed: An Empirical Study of Microsoft Windows. Philip J. Guo, Thomas Zimmermann, Nachiappan Nagappan, Brendan Murphy. IEEE International Conference on Software Engineering (ICSE), May 2010.
[Abstract | BibTeX]

We performed an empirical study to characterize factors that affect which bugs get fixed in Windows Vista and Windows 7, focusing on factors related to bug report edits and relationships between people involved in handling the bug. We found that bugs reported by people with better reputations were more likely to get fixed, as were bugs handled by people on the same team and working in geographical proximity. We reinforce these quantitative results with survey feedback from 358 Microsoft employees who were involved in Windows bugs. Survey respondents also mentioned additional qualitative influences on bug fixing, such as the importance of seniority and interpersonal skills of the bug reporter.

Informed by these findings, we built a statistical model to predict the probability that a new bug will be fixed (to our knowledge, the first such model). We trained it on Windows Vista bugs and achieved a precision of 68% and recall of 64% when predicting Windows 7 bug fixes. Engineers could use such a model to prioritize bugs during triage, to estimate developer workloads, and to decide which bugs should be closed or migrated to future product versions.


HAMPI: A Solver for String Constraints. Adam Kiezun, Vijay Ganesh, Philip J. Guo, Pieter Hooimeijer, Michael D. Ernst. International Symposium on Software Testing and Analysis (ISSTA), July 2009.
(ACM SIGSOFT Distinguished Paper Award)
[Abstract | BibTeX]

Many automatic testing, analysis, and verification techniques for programs can be effectively reduced to a constraint-generation phase followed by a constraint-solving phase. This separation of concerns often leads to more effective and maintainable tools. The increasing efficiency of off-the-shelf constraint solvers makes this approach even more compelling. However, there are few, if any, effective and sufficiently expressive off-the-shelf solvers for string constraints generated by analysis techniques for string-manipulating programs.

We designed and implemented HAMPI, a solver for string constraints over bounded string variables. HAMPI constraints express membership in regular languages and bounded context-free languages. HAMPI constraints may contain context-free-language definitions, regular-language definitions and operations, and the membership predicate. Given a set of constraints, HAMPI outputs a string that satisfies all the constraints, or reports that the constraints are unsatisfiable.

HAMPI is expressive and efficient, and can be successfully applied to testing and analysis of real programs. Our experiments use HAMPI in static and dynamic analyses for finding SQL injection vulnerabilities in Web applications and in automated bug finding in C programs using systematic testing; we also compare HAMPI with another string solver. HAMPI's source code, documentation, and the experimental data are available at
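To make the problem concrete, here is a toy illustration of bounded string-constraint solving. This is not HAMPI's input language or algorithm (HAMPI encodes constraints into bit-vector formulas for an off-the-shelf solver, which scales far better): the sketch simply enumerates candidate strings over a small alphabet up to a length bound and returns one satisfying every regex-membership constraint, or None if none exists within the bound.

```python
# Toy bounded string-constraint solver: brute-force enumeration of strings
# up to max_len, checking each against every regular-expression membership
# constraint. Illustrates the *problem* HAMPI solves, not its technique.
import itertools
import re

def solve(constraints, alphabet="ab()", max_len=4):
    """Return a string over `alphabet` of length <= max_len satisfying all
    regex constraints (via re.fullmatch), or None if unsatisfiable."""
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            if all(re.fullmatch(rx, s) for rx in constraints):
                return s
    return None
```

A bug-finding tool would generate such constraints from a program path (e.g., "the input must match this grammar fragment and also trigger this branch") and ask the solver for a witness string.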

Linux Kernel Developer Responses to Static Analysis Bug Reports. Philip J. Guo and Dawson Engler. USENIX Annual Technical Conference, short paper, June 2009.
[Abstract | BibTeX]

We present a study of how Linux kernel developers respond to bug reports issued by a static analysis tool. We found that developers prefer to triage reports in younger, smaller, and more actively-maintained files, first address easy-to-fix bugs and defer difficult (but possibly critical) bugs, and triage bugs in batches rather than individually. Also, although automated tools cannot find many types of bugs, they can be effective at directing developers' attention toward parts of the codebase that contain up to 3X more user-reported bugs.

Our insights into developer attitudes towards static analysis tools allow us to make suggestions for improving their usability and effectiveness. We feel that it could be effective to run static analysis tools continuously while programming and before committing code, to rank reports so that those most likely to be triaged are shown to developers first, to show the easiest reports to new developers, to perform deeper analysis on more actively-maintained code, and to use reports as indirect indicators of code quality and importance.

Automatic Creation of SQL Injection and Cross-site Scripting Attacks. Adam Kiezun, Philip J. Guo, Karthick Jayaraman, Michael D. Ernst. IEEE International Conference on Software Engineering (ICSE), May 2009.
[Abstract | BibTeX]

We present a technique for finding security vulnerabilities in Web applications. SQL Injection (SQLI) and cross-site scripting (XSS) attacks are widespread forms of attack in which the attacker crafts the input to the application to access or modify user data and execute malicious code. In the most serious attacks (called second-order, or persistent, XSS), an attacker can corrupt a database so as to cause subsequent users to execute malicious code.

This paper presents an automatic technique for creating inputs that expose SQLI and XSS vulnerabilities. The technique generates sample inputs, symbolically tracks taints through execution (including through database accesses), and mutates the inputs to produce concrete exploits. Ours is the first analysis of which we are aware that precisely addresses second-order XSS attacks.

Our technique creates real attack vectors, has few false positives, incurs no runtime overhead for the deployed application, works without requiring modification of application code, and handles dynamic programming-language constructs. We implemented the technique for PHP, in a tool called Ardilla. We evaluated Ardilla on five PHP applications and found 68 previously unknown vulnerabilities (23 SQLI, 33 first-order XSS, and 12 second-order XSS).
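The taint-tracking idea underlying the technique can be sketched in miniature. This is a hypothetical illustration, not Ardilla itself (which tracks taint symbolically inside a modified PHP interpreter): untrusted input carries a taint mark, the mark survives ordinary string operations, sanitization removes it, and a SQL sink refuses any string that is still tainted.

```python
# Hypothetical sketch of dynamic taint tracking for injection detection:
# untrusted input is wrapped in a Tainted string type, taint propagates
# through string operations, sanitization drops it, and the SQL sink
# rejects data whose taint mark survives to the query.
class Tainted(str):
    """A string carrying a taint mark from untrusted input."""

def concat(a, b):
    result = str(a) + str(b)
    if isinstance(a, Tainted) or isinstance(b, Tainted):
        return Tainted(result)            # taint propagates through operations
    return result

def sanitize(s):
    return str(s).replace("'", "''")      # escaping removes the taint mark

def run_query(q):
    if isinstance(q, Tainted):            # the sink check
        raise ValueError("tainted data reached SQL sink")
    return "executed: " + q

user_input = Tainted("x' OR '1'='1")      # classic SQLI payload shape
safe_q = concat("SELECT * FROM t WHERE name='",
                concat(sanitize(user_input), "'"))
```

An attack generator like Ardilla goes further: rather than merely flagging the tainted flow, it mutates the input along the tainted path to produce a concrete exploit string that demonstrates the vulnerability.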

Two Studies of Opportunistic Programming: Interleaving Web Foraging, Learning, and Writing Code. Joel Brandt, Philip J. Guo, Joel Lewenstein, Mira Dontcheva, Scott R. Klemmer. ACM Conference on Human Factors in Computing Systems (CHI), April 2009.
(Honorable Mention Paper)
[Abstract | BibTeX]

This paper investigates the role of online resources in problem solving. We look specifically at how programmers—an exemplar form of knowledge workers—opportunistically interleave Web foraging, learning, and writing code. We describe two studies of how programmers use online resources. The first, conducted in the lab, observed participants' Web use while building an online chat room. We found that programmers leverage online resources with a range of intentions: They engage in just-in-time learning of new skills and approaches, clarify and extend their existing knowledge, and remind themselves of details deemed not worth remembering. The results also suggest that queries for different purposes have different styles and durations. Do programmers' queries "in the wild" have the same range of intentions, or is this result an artifact of the particular lab setting? We analyzed a month of queries to an online programming portal, examining the lexical structure, refinements made, and result pages visited. Here we also saw traits that suggest the Web is being used for learning and reminding. These results contribute to a theory of online resource usage in programming, and suggest opportunities for tools to facilitate online knowledge work.


The Daikon system for dynamic detection of likely invariants. Michael D. Ernst, Jeff H. Perkins, Philip J. Guo, Stephen McCamant, Carlos Pacheco, Matthew S. Tschantz, Chen Xiao. Science of Computer Programming, 2007.
[Abstract | BibTeX]

Daikon is an implementation of dynamic detection of likely invariants; that is, the Daikon invariant detector reports likely program invariants. An invariant is a property that holds at a certain point or points in a program; these are often used in assert statements, documentation, and formal specifications. Examples include being constant (x = a), non-zero (x != 0), being in a range (a <= x <= b), linear relationships (y = ax+b), ordering (x <= y), functions from a library (x = fn(y)), containment (x ∈ y), sortedness (x is sorted), and many more. Users can extend Daikon to check for additional invariants.

Dynamic invariant detection runs a program, observes the values that the program computes, and then reports properties that were true over the observed executions. Dynamic invariant detection is a machine learning technique that can be applied to arbitrary data. Daikon can detect invariants in C, C++, Java, and Perl programs, and in record-structured data sources; it is easy to extend Daikon to other applications.
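The run-observe-report loop can be sketched in miniature. This toy detector (a sketch in Daikon's spirit, not its implementation) records the variable values observed at a program point across runs and reports a few simple candidate invariants — constant, non-zero, bounded range, pairwise ordering — that held on every observation; Daikon checks a far richer grammar of properties and suppresses statistically unjustified ones.

```python
# Toy dynamic invariant detector: given variable values observed at one
# program point across many executions, report simple properties that held
# on every observation (constant, non-zero, range, pairwise ordering).
def detect_invariants(observations):
    """observations: non-empty list of dicts mapping variable name -> number."""
    invariants = []
    names = list(observations[0].keys())
    for name in names:
        vals = [obs[name] for obs in observations]
        if all(v == vals[0] for v in vals):
            invariants.append(f"{name} == {vals[0]}")               # constant
        else:
            if all(v != 0 for v in vals):
                invariants.append(f"{name} != 0")                   # non-zero
            invariants.append(f"{min(vals)} <= {name} <= {max(vals)}")  # range
    for a in names:                                                 # ordering
        for b in names:
            if a < b and all(obs[a] <= obs[b] for obs in observations):
                invariants.append(f"{a} <= {b}")
    return invariants
```

As with Daikon, these are *likely* invariants: they are guaranteed only over the observed executions, so better test coverage yields more trustworthy output.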

Invariants can be useful in program understanding and a host of other applications. Daikon's output has been used for generating test cases, predicting incompatibilities in component integration, automating theorem proving, repairing inconsistent data structures, and checking the validity of data streams, among other tasks.

Daikon is freely available in source and binary form, along with extensive documentation, at


Dynamic Inference of Abstract Types. Philip J. Guo, Jeff H. Perkins, Stephen McCamant, Michael D. Ernst. International Symposium on Software Testing and Analysis (ISSTA), July 2006.
[Abstract | BibTeX]

An abstract type groups variables that are used for related purposes in a program. We describe a dynamic unification-based analysis for inferring abstract types. Initially, each run-time value gets a unique abstract type. A run-time interaction among values indicates that they have the same abstract type, so their abstract types are unified. Also at run time, abstract types for variables are accumulated from abstract types for values. The notion of interaction may be customized, permitting the analysis to compute finer or coarser abstract types; these different notions of abstract type are useful for different tasks. We have implemented the analysis for compiled x86 binaries and for Java bytecodes. Our experiments indicate that the inferred abstract types are useful for program comprehension, improve both the results and the run time of a follow-on program analysis, and are more precise than the output of a comparable static analysis, without suffering from overfitting.
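The unification step is naturally expressed with union-find. The sketch below assumes a simplified setting (named values standing in for run-time values, with interactions recorded explicitly): each value starts in its own abstract type, and an interaction, say an arithmetic operation or comparison between two values, unifies their types.

```python
# Sketch of unification-based abstract type inference: every value starts
# in its own singleton abstract type; a run-time interaction between two
# values unifies their types via union-find.
class AbstractTypes:
    def __init__(self):
        self.parent = {}

    def find(self, v):
        """Return the representative of v's abstract type (path halving)."""
        self.parent.setdefault(v, v)
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def interact(self, a, b):
        """Record that values a and b interacted: unify their types."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

    def same_type(self, a, b):
        return self.find(a) == self.find(b)

# e.g. in `total = price + tax`, price and tax interact, so they end up in
# one abstract type; `year` never interacts with them and stays separate.
types = AbstractTypes()
types.interact("price", "tax")
types.interact("tax", "total")
```

Tightening or loosening what counts as an "interaction" changes how eagerly types are unified, which is the paper's mechanism for computing finer or coarser abstract types.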

Automatic Inference and Enforcement of Data Structure Consistency Specifications. Brian Demsky, Michael D. Ernst, Philip J. Guo, Stephen McCamant, Jeff H. Perkins, Martin Rinard. International Symposium on Software Testing and Analysis (ISSTA), July 2006.
[Abstract | BibTeX]

Corrupt data structures are an important cause of unacceptable program execution. Data structure repair (which eliminates inconsistencies by updating corrupt data structures to conform to consistency constraints) promises to enable many programs to continue to execute acceptably in the face of otherwise fatal data structure corruption errors. A key issue is obtaining an accurate and comprehensive data structure consistency specification.

We present a new technique for obtaining data structure consistency specifications for data structure repair. Instead of requiring the developer to manually generate such specifications, our approach automatically generates candidate data structure consistency properties using the Daikon invariant detection tool. The developer then reviews these properties, potentially rejecting or generalizing overly specific properties to obtain a specification suitable for automatic enforcement via data structure repair.

We have implemented this approach and applied it to three sizable benchmark programs: CTAS (an air-traffic control system), BIND (a widely-used Internet name server) and Freeciv (an interactive game). Our results indicate that (1) automatic constraint generation produces constraints that enable programs to execute successfully through data structure consistency errors, (2) compared to manual specification, automatic generation can produce more comprehensive sets of constraints that cover a larger range of data structure consistency properties, and (3) reviewing the properties is relatively straightforward and requires substantially less programmer effort than manual generation, primarily because it reduces the need to examine the program text to understand its operation and extract the relevant consistency constraints. Moreover, when evaluated by a hostile third party "Red Team" contracted to evaluate the effectiveness of the technique, our data structure inference and enforcement tools successfully prevented several otherwise fatal attacks.
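The enforcement half of the pipeline can be sketched with a hypothetical example. Here, developer-approved consistency properties are represented as predicate/repair pairs over a record (standing in for the Daikon-generated, human-reviewed specifications): violated properties trigger a repair action that mutates the structure back into conformance instead of letting execution fail.

```python
# Hypothetical sketch of data structure consistency enforcement: each spec
# pairs a predicate (the consistency property) with a repair action run
# when the predicate fails, so execution can continue on repaired data.
def enforce(record, specs):
    """specs: list of (description, predicate, repair) triples.
    Returns the descriptions of the properties that were violated."""
    violations = []
    for desc, pred, repair in specs:
        if not pred(record):
            violations.append(desc)
            repair(record)          # mutate the structure back into spec
    return violations

# Toy flight record with two corrupted fields, in the spirit of the CTAS
# benchmark (names and properties here are illustrative only).
flight = {"altitude": -200, "id": ""}
specs = [
    ("altitude >= 0",
     lambda r: r["altitude"] >= 0,
     lambda r: r.update(altitude=0)),
    ("id is non-empty",
     lambda r: r["id"] != "",
     lambda r: r.update(id="UNKNOWN")),
]
```

The paper's contribution is upstream of this step: the predicates are not written by hand but distilled from Daikon's automatically generated candidate invariants, with the developer only reviewing and generalizing them.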

A Scalable Mixed-Level Approach to Dynamic Analysis of C and C++ Programs. Philip J. Guo. Master of Engineering (M.Eng.) thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, May 2006.
(MIT Charles and Jennifer Johnson Award for Outstanding Computer Science M.Eng. Thesis)
[Abstract | BibTeX | Webpage]

This thesis addresses the difficult task of constructing robust and scalable dynamic program analysis tools for programs written in memory-unsafe languages such as C and C++, especially those that are interested in observing the contents of data structures at run time. In this thesis, I first introduce my novel mixed-level approach to dynamic analysis, which combines the advantages of both source- and binary-based approaches. Second, I present a tool framework that embodies the mixed-level approach. This framework provides memory safety guarantees, allows tools built upon it to access rich source- and binary-level information simultaneously at run time, and enables tools to scale to large, real-world C and C++ programs on the order of millions of lines of code. Third, I present two dynamic analysis tools built upon my framework—one for performing value profiling and the other for performing dynamic inference of abstract types—and describe how they far surpass previous analyses in terms of scalability, robustness, and applicability. Lastly, I present several case studies demonstrating how these tools aid both humans and automated tools in several program analysis tasks: improving human understanding of unfamiliar code, invariant detection, and data structure repair.