Computational researchers have trouble managing the numerous code and
data files generated by their experiments, comparing the results of
trials executed with different parameters, and keeping up-to-date notes
on what they learned from past successes and failures.
Burrito is a Linux-based system that automates as much of this experiment
organization and notetaking process as possible, thus freeing
researchers to focus on their actual work.
Burrito automatically captures a researcher's computational activities
with no perceptible run-time slowdown. It provides user interfaces to
annotate the captured activities with notes and then make queries such
as, "Which script versions and command-line parameters generated the
output graph that this note refers to?"
The process of hacking on experimental code is messy:
You constantly adjust your code and run it with different
parameters, which generates a ton of output data files.
Version control systems are too cumbersome for your
rapidly-changing workflow. Instead, to keep track of what each file
represents, you create weird filenames to encode metadata such as
command-line parameters or version numbers (see the screenshot).
It's hard to assess how changes to your source code and execution
parameters led to corresponding changes in output data files.
You read documentation web pages, PDF files, sample code, and other
resources while you hack, so it's hard to remember which resources
influenced you to make specific edits.
You try to be disciplined about keeping notes, but you often forget
what exact code or data your notes refer to, since they change so often.
Researchers' directories are filled with
weirdly-named files representing variants of experiment outputs.
Burrito System Overview
Burrito solves the above problems by wrapping a layer of
computational infrastructure around your normal Linux work
environment. It consists of eight main components:
A versioning filesystem that
automatically tracks all edits to all of your files and allows
you to view old file versions. This eliminates the need to use version
control systems or weird file naming conventions.
A tracer that
records the origin (provenance) of files, telling you which
program invocations created or read from each file, and what their
A tracer that records your GUI interactions, such as which
application windows you were viewing at specific times.
A set of plugins that record your activities within specific
applications, such as which MATLAB commands you were running and which
web pages you were visiting.
A real-time Activity Feed that allows
you to view and take notes on your recent activities.
An Activity Context Viewer that displays what
else you were reading and writing when hacking on some part of your
A Computational Context Viewer that shows how
changes to your source code and execution parameters affected your
experiment's output files.
A Lab Notebook Generator that creates an HTML
summary of your activities over a given time period.
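To make the provenance idea concrete, here is a minimal sketch of the kind of record a tracer could keep and the query it enables. The `Invocation` class, the `producers_of` function, and all file names and parameters are hypothetical and are not taken from Burrito's source code; Burrito gathers this information automatically rather than from a hand-written log.

```python
from dataclasses import dataclass, field


@dataclass
class Invocation:
    """One program run plus the files it read and wrote (hypothetical schema)."""
    argv: list
    files_read: set = field(default_factory=set)
    files_written: set = field(default_factory=set)


def producers_of(path, log):
    """Return every recorded invocation that wrote to `path`, oldest first."""
    return [inv for inv in log if path in inv.files_written]


# Illustrative log; a real tracer would append these records as programs run.
log = [
    Invocation(["python", "plot.py", "--bins", "10"],
               {"plot.py", "data.csv"}, {"graph.png"}),
    Invocation(["python", "clean.py"], {"raw.csv"}, {"data.csv"}),
    Invocation(["python", "plot.py", "--bins", "20"],
               {"plot.py", "data.csv"}, {"graph.png"}),
]

# "Which commands and parameters generated graph.png?"
for inv in producers_of("graph.png", log):
    print(" ".join(inv.argv))
```

Combined with a versioning filesystem, each record could also name the exact version of every file it touched, which is what would let a query recover the script version behind a given graph.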
Activity Feed
This application is a sidebar on the left portion of your Linux
desktop background. It displays a real-time stream of your actions and
allows you to annotate any action with notes. Since all notes are
linked with their original context, you no longer need to worry about
creating, organizing, and locating a mess of notes files.
As the screenshot on the right shows, the Activity Feed's UI is
inspired by the Facebook News Feed: New events appear at the top of the
feed and push down older ones. It currently displays six types of events:
A Bash command event shows a group of Bash shell commands
executed in the same directory. You can click on any command to copy it
to the clipboard and paste it into a terminal to re-execute.
A website visit event shows a set of web pages that you
just viewed. You can click on any page title to open its link in a web browser.
A file modification event shows a group of files modified
by a particular process. For example, saving a source code file in a
text editor will create a new file modification event, as will executing
a script to generate an output data file.
A digital sketch event shows a thumbnail view of a sketch
that you've just drawn using, say, a Wacom pen tablet.
You can create a status update event by entering text in
the status text box and pressing the "Post" button. This is the main
way for you to describe what you're working on at a given moment, which
helps place other events in context.
You can create a checkpoint event by clicking on either the
"Happy Face" or "Sad Face" button and then entering a note in the pop-up
text box describing why you're happy or sad about the current state of
your experiment. A "happy checkpoint" is like making a commit in
a version control system, and a "sad checkpoint" is like filing a bug
report in a bug tracking system.
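As an illustration of how raw actions might be folded into feed events, the sketch below groups shell commands into Bash command events. It assumes that only consecutive commands run from the same directory are merged, which is one plausible reading of the description above and not a confirmed detail of Burrito. The function name and the sample history are hypothetical.

```python
from itertools import groupby


def group_by_directory(commands):
    """Merge consecutive (cwd, command) pairs that share a cwd into one event."""
    return [
        (cwd, [cmd for _, cmd in run])
        for cwd, run in groupby(commands, key=lambda pair: pair[0])
    ]


# Made-up shell history in chronological order.
history = [
    ("/home/me/exp", "python run.py --alpha 0.1"),
    ("/home/me/exp", "ls"),
    ("/home/me/notes", "cat todo.txt"),
    ("/home/me/exp", "python run.py --alpha 0.2"),
]

# Newest events first, as in the feed.
for cwd, cmds in reversed(group_by_directory(history)):
    print(cwd, cmds)
```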
Screenshot showing the Activity Feed on the left,
and a visual diff of two file versions on the right.
You can click on a file modification event in the feed and select
the following actions:
Open the version of the chosen file either before or after the modification.
Diff multiple old versions of a chosen file by launching the
Meld visual diff tool (see
screenshot on the left).
Revert the file to the chosen version.
Watch the file for changes. The Activity Feed will report a
warning if a future modification causes that file to differ from the
chosen version. This action creates a simple regression test.
View the context surrounding edits to the chosen file (see the Activity Context Viewer section).
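The "watch" action in the list above amounts to comparing a file against a remembered version. A minimal sketch of that idea follows, assuming a content hash is used for the comparison. The `Watch` class and the file contents are hypothetical; Burrito itself could compare against the stored version in its versioning filesystem instead.

```python
import hashlib
import os
import tempfile


def digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


class Watch:
    """Remember a file's contents at watch time and report later differences."""

    def __init__(self, path):
        self.path = path
        self.expected = digest(path)

    def changed(self):
        return digest(self.path) != self.expected


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "result.txt")
    with open(out, "w") as f:
        f.write("accuracy=0.91\n")
    watch = Watch(out)              # pin the known-good output
    unchanged = watch.changed()     # nothing has been modified yet

    with open(out, "w") as f:
        f.write("accuracy=0.87\n")  # a later run overwrites the output
    regressed = watch.changed()     # the feed would warn at this point

print(unchanged, regressed)
```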
Activity Context Viewer
This application enables you to answer questions such as, "When I
left work last week, I was editing this part of my script and had a
collection of reference materials open ... what were they?"
You launch this application with a target text file (e.g., source
code) as its argument. The GUI is a table view where each row
represents a "version" of the chosen file (determined based on
heuristics), displaying these four columns of data:
Diffs of this file between the previous and current versions.
Resources read while working on this version, including
which web pages, documents, and source code you viewed.
Resources written while working on this version, including
other code that you edited, and checkpoints, status updates, and
digital sketches that you created.
Annotations that you can add to this file version.
Screenshot of the Computational Context Viewer GUI
showing three versions of an output graph file and what led to their creation.
Computational Context Viewer
This application allows you to answer questions such as, "What
effects did changes in my source code files and execution parameters
have on my experiment's output files?"
You launch this application with a target output file (e.g., a graph
generated by a script) as its argument. The GUI displays all versions
of the chosen file in reverse chronological order, along with what led
to the creation of each version.
For example, the screenshot on the left shows three versions of an
output graph file in reverse chronological order (right column). It
also shows the command-line parameters of the executions that produced
each graph version (middle column). Finally, it shows diffs in the
source code files that, when executed, caused the changes between each
graph version (left column). The first row's diff is the code
responsible for highlighting the three center bars in yellow, and the
second row's diff is the code responsible for turning the output file
from a line graph into a bar graph.
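The rows of such a view can be approximated with a few lines of code. The sketch below pairs each run's command line with a diff of the script against the previous run, newest first. It uses Python's standard `difflib` module; the `context_rows` function and the sample script contents are hypothetical and only mimic the bar-graph example described above.

```python
import difflib

# (command line, script source at that run), oldest first. All values are made up.
versions = [
    (["python", "plot.py"], "plt.plot(xs, ys)\n"),
    (["python", "plot.py"], "plt.bar(xs, ys)\n"),
    (["python", "plot.py", "--highlight", "center"],
     "bars = plt.bar(xs, ys)\nfor b in bars[3:6]: b.set_color('yellow')\n"),
]


def context_rows(versions):
    """Return (command line, source diff vs. previous run) rows, newest first."""
    rows = []
    for (_, old_src), (argv, new_src) in zip(versions, versions[1:]):
        diff = "".join(difflib.unified_diff(
            old_src.splitlines(keepends=True),
            new_src.splitlines(keepends=True),
            fromfile="before", tofile="after"))
        rows.append((" ".join(argv), diff))
    return rows[::-1]


for command, diff in context_rows(versions):
    print(command)
    print(diff)
```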
Lab Notebook Generator
This application generates a customizable HTML report that summarizes
your activities over a given time period. You can use these reports as
the basis for writing papers, tutorials, and theses.
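A report of this kind is essentially a filter over timestamped events followed by HTML rendering. The following sketch shows that shape under simple assumptions: events are `(timestamp, kind, text)` tuples and the output is a single HTML list. The `notebook_html` function and the sample events are hypothetical and do not reflect the layout of Burrito's actual reports.

```python
from datetime import datetime
from html import escape


def notebook_html(events, start, end):
    """Render events with start <= timestamp < end as an HTML list, oldest first."""
    items = [
        f"<li><b>{escape(kind)}</b> {when:%Y-%m-%d %H:%M}: {escape(text)}</li>"
        for when, kind, text in sorted(events)
        if start <= when < end
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Made-up events; a real generator would read them from the captured activity log.
events = [
    (datetime(2012, 6, 2, 9, 30), "status", "Trying a smaller learning rate"),
    (datetime(2012, 6, 2, 11, 5), "happy checkpoint", "loss < 0.1 at last"),
    (datetime(2012, 6, 9, 14, 0), "sad checkpoint", "Results do not reproduce"),
]

page = notebook_html(events, datetime(2012, 6, 1), datetime(2012, 6, 8))
print(page)
```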
Burrito is an early research prototype that is no longer being
developed. You can download its source code from GitHub.
However, I haven't had time to polish up the installation instructions,
so this code is mostly useful as a reference. In particular, it is a
good example use case for SystemTap.
Learn more by reading our workshop paper:
Burrito: Wrapping Your Lab Notebook in Computational Infrastructure.
Philip J. Guo and Margo Seltzer.
USENIX Workshop on the Theory and Practice of Provenance (TaPP), June 2012.
Also, Chapter 7 of my Ph.D.
dissertation contains a more detailed description.