Burrito: Rethinking the Electronic Lab Notebook

Computational researchers have trouble managing the numerous code and data files generated by their experiments, comparing the results of trials executed with different parameters, and keeping up-to-date notes on what they learned from past successes and failures.

Burrito is a Linux-based system that automates as much of this experiment organization and notetaking process as possible, thus freeing researchers to focus on their actual work.

Burrito automatically captures a researcher's computational activities with no perceptible run-time slowdown. It provides user interfaces to annotate the captured activities with notes and then make queries such as, "Which script versions and command-line parameters generated the output graph that this note refers to?"

Problem

The process of hacking on experimental code is messy:

  • You constantly adjust your code and run it with different parameters, which generates a ton of output data files.
  • Version control systems are too cumbersome for your rapidly changing workflow. Instead, to keep track of what each file represents, you create weird filenames to encode metadata such as command-line parameters or version numbers (see screenshot on the right).
  • It's hard to assess how changes to your source code and execution parameters led to corresponding changes in output data files.
  • You read documentation web pages, PDF files, sample code, and other resources while you hack, so it's hard to remember which resources influenced you to make specific edits.
  • You try to be disciplined about keeping notes, but you often forget what exact code or data your notes refer to, since they change so rapidly.

Researchers' directories are filled with weirdly-named files representing variants of experiment outputs.

Burrito System Overview

Burrito solves the above problems by wrapping a layer of computational infrastructure around your normal Linux work environment. It consists of eight main components:

  1. A versioning filesystem that automatically tracks all edits to all of your files and allows you to view old file versions. This eliminates the need to use version control systems or weird file naming conventions.
  2. A tracer that records the origin (provenance) of files, telling you which program invocations created or read from each file, and what their parameters were.
  3. A tracer that records your GUI interactions, such as which application windows you were viewing at specific times.
  4. A set of plugins that record your activities within specific applications, such as which MATLAB commands you were running and which web pages you were visiting.
  5. A real-time Activity Feed that allows you to view and take notes on your recent activities.
  6. An Activity Context Viewer that displays what else you were reading and writing when hacking on some part of your code.
  7. A Computational Context Viewer that shows how changes to your source code and execution parameters affected your experiment's output files.
  8. A Lab Notebook Generator that creates an HTML summary of your activities over a given time period.
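
To make the provenance idea in component 2 concrete, here is a minimal Python sketch of an index that remembers which program invocations read or wrote each file. It is an illustration only, not Burrito's actual implementation; the names Invocation, ProvenanceLog, record_read, record_write, created_by, and read_by are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Tuple


@dataclass(frozen=True)
class Invocation:
    """One program execution (hypothetical record layout)."""
    pid: int
    argv: Tuple[str, ...]   # command line, including parameters
    start_time: float       # seconds since the epoch


class ProvenanceLog:
    """Maps each file path to the invocations that read or wrote it."""

    def __init__(self) -> None:
        self._readers: DefaultDict[str, List[Invocation]] = defaultdict(list)
        self._writers: DefaultDict[str, List[Invocation]] = defaultdict(list)

    def record_read(self, inv: Invocation, path: str) -> None:
        self._readers[path].append(inv)

    def record_write(self, inv: Invocation, path: str) -> None:
        self._writers[path].append(inv)

    def created_by(self, path: str) -> List[Invocation]:
        # .get() avoids creating an empty entry for unknown paths.
        return list(self._writers.get(path, []))

    def read_by(self, path: str) -> List[Invocation]:
        return list(self._readers.get(path, []))


# Example: one script run that reads a data file and writes a graph.
log = ProvenanceLog()
run = Invocation(pid=4242, argv=("python", "plot.py", "--bins=10"), start_time=0.0)
log.record_read(run, "data.csv")
log.record_write(run, "graph.png")
```

With such an index, "which parameters generated this graph?" becomes a lookup such as log.created_by("graph.png")[0].argv.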

Activity Feed

This application is a sidebar on the left portion of your Linux desktop background. It displays a real-time stream of your actions and allows you to annotate any action with notes. Since all notes are linked with their original context, you no longer need to worry about creating, organizing, and locating a mess of notes files.

As the screenshot on the right shows, the Activity Feed's UI is inspired by the Facebook News Feed: New events appear at the top of the feed and push down older ones. It currently displays six types of events:

  1. A Bash command event shows a group of Bash shell commands executed in the same directory. You can click on any command to copy it to the clipboard and paste it into a terminal to re-execute.
  2. A website visit event shows a set of web pages that you just viewed. You can click on any page title to open its link in a web browser.
  3. A file modification event shows a group of files modified by a particular process. For example, saving a source code file in a text editor creates a new file modification event, as does executing a script to generate an output data file.
  4. A digital sketch event shows a thumbnail view of a sketch that you've just drawn using, say, a Wacom pen tablet.
  5. You can create a status update event by entering text in the status text box and pressing the "Post" button. This is the main way for you to describe what you're working on at a given moment, which helps place other events in context.
  6. You can create a checkpoint event by clicking on either the "Happy Face" or "Sad Face" button and then entering a note in the pop-up text box describing why you're happy or sad about the current state of your experiment. A "happy checkpoint" is like making a commit in a version control system, and a "sad checkpoint" is like filing a bug report in a bug tracking system.
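
One way to picture the six event types and the newest-first ordering described above is the following Python sketch. It is a simplified data model under my own assumptions, not the feed's real code; EventKind, FeedEvent, ActivityFeed, post, and annotate are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class EventKind(Enum):
    BASH_COMMAND = "bash_command"
    WEBSITE_VISIT = "website_visit"
    FILE_MODIFICATION = "file_modification"
    DIGITAL_SKETCH = "digital_sketch"
    STATUS_UPDATE = "status_update"
    CHECKPOINT = "checkpoint"


@dataclass
class FeedEvent:
    kind: EventKind
    timestamp: float
    summary: str
    note: Optional[str] = None    # annotation added by the user
    happy: Optional[bool] = None  # only meaningful for checkpoints


class ActivityFeed:
    """Keeps events newest-first, like a news feed."""

    def __init__(self) -> None:
        self.events: List[FeedEvent] = []

    def post(self, event: FeedEvent) -> None:
        # New events go to the top and push older ones down.
        self.events.insert(0, event)

    def annotate(self, index: int, note: str) -> None:
        # The note stays attached to the event it describes.
        self.events[index].note = note


feed = ActivityFeed()
feed.post(FeedEvent(EventKind.BASH_COMMAND, 1.0, "python plot.py --bins=10"))
feed.post(FeedEvent(EventKind.CHECKPOINT, 2.0, "graph looks right", happy=True))
feed.annotate(1, "first run with 10 bins")
```

Because each note lives on the event it annotates, there is no separate notes file to keep in sync.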

Screenshot showing the Activity Feed on the left, and a visual diff of two file versions on the right.

You can click on a file modification event in the feed and select the following actions:

  • Open the version of the chosen file either before or after that modification.
  • Diff multiple old versions of a chosen file by launching the Meld visual diff tool (see screenshot on the left).
  • Revert the file to the chosen version.
  • Watch the file for changes. The Activity Feed will report a warning if a future modification causes that file to differ from the chosen version. This action creates a simple regression test.
  • View the context surrounding edits to the chosen file (see below).
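
The "watch" action above behaves like a tiny regression test. A rough Python sketch of the idea, assuming the chosen version is remembered by a content hash (the real system has stored file versions to compare against; FileWatch and file_digest are hypothetical names):

```python
import hashlib
from pathlib import Path
from typing import Optional, Union


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


class FileWatch:
    """Remembers a file's contents at watch time and flags later differences."""

    def __init__(self, path: Union[str, Path]) -> None:
        self.path = Path(path)
        self.expected = file_digest(self.path)  # the "chosen version"

    def check(self) -> Optional[str]:
        """Return a warning string if the file changed, else None."""
        if file_digest(self.path) != self.expected:
            return f"WARNING: {self.path} differs from the watched version"
        return None
```

Calling check() after every file modification event would give the warning behavior described in the list.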

Activity Context Viewer

This application enables you to answer questions such as, "When I left work last week, I was editing this part of my script and had a collection of reference materials open ... what were they?"

You launch this application with a target text file (e.g., source code) as its argument. The GUI is a table view where each row represents a "version" of the chosen file (determined based on heuristics), displaying these four columns of data:

  • Diffs of this file between the previous and current versions.
  • Resources read while working on this version, including which web pages, documents, and source code you viewed.
  • Resources written while working on this version, including other code that you edited, and checkpoints, status updates, and digital sketches that you created.
  • Annotations that you can add to this file version.
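
The table rows can be thought of as time windows into which other activity is bucketed. The Python sketch below assumes, purely for illustration, that a "version" is delimited by save timestamps; the actual heuristics are not described here, and VersionRow and build_rows are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (timestamp, "read" or "write", resource name)
ResourceEvent = Tuple[float, str, str]


@dataclass
class VersionRow:
    saved_at: float
    resources_read: List[str] = field(default_factory=list)
    resources_written: List[str] = field(default_factory=list)


def build_rows(save_times: List[float], events: List[ResourceEvent]) -> List[VersionRow]:
    """Assign each resource event to the file version it led up to.

    A version's window runs from the previous save (exclusive) to its own
    save (inclusive). Events after the last save belong to no version yet.
    """
    rows: List[VersionRow] = []
    previous = float("-inf")
    for saved_at in sorted(save_times):
        row = VersionRow(saved_at=saved_at)
        for timestamp, access, name in events:
            if previous < timestamp <= saved_at:
                target = row.resources_read if access == "read" else row.resources_written
                target.append(name)
        rows.append(row)
        previous = saved_at
    return rows


rows = build_rows(
    save_times=[10.0, 20.0],
    events=[
        (5.0, "read", "numpy docs (web page)"),
        (15.0, "write", "status update: trying bar graph"),
        (25.0, "read", "paper.pdf"),
    ],
)
```

Here the web page lands in the first version's "read" column, the status update in the second version's "written" column, and the PDF is left out because no version has been saved after it.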

Screenshot of the Computational Context Viewer GUI showing three versions of an output graph file and what led to their creation.

Computational Context Viewer

This application allows you to answer questions such as, "What effects did changes in my source code files and execution parameters have on my experiment's output files?"

You launch this application with a target output file (e.g., a graph generated by a script) as its argument. The GUI displays all versions of the chosen file in reverse chronological order, along with what led to the creation of each version.

For example, the screenshot on the left shows three versions of an output graph file in reverse chronological order (right column). It also shows the command-line parameters of the executions that produced each graph version (middle column). Finally, it shows diffs in the source code files that, when executed, caused the changes between each graph version (left column). The first row's diff is the code responsible for highlighting the three center bars in yellow, and the second row's diff is the code responsible for turning the output file from a line graph into a bar graph.
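
The three-column layout described above can be approximated with Python's standard difflib module. This is a sketch under stated assumptions, not the viewer's real logic: it assumes each output version already carries the command line and the source text that produced it, and OutputVersion and context_rows are hypothetical names.

```python
import difflib
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class OutputVersion:
    timestamp: float
    command: str       # command line that produced this output version
    source_text: str   # contents of the script at execution time


def context_rows(versions: List[OutputVersion]) -> List[Tuple[List[str], str, float]]:
    """Return (source diff, command, timestamp) rows, newest first.

    Each diff compares the script against the one that produced the
    previous output version. The oldest version has no predecessor, so
    its diff shows the whole script as added.
    """
    ordered = sorted(versions, key=lambda v: v.timestamp)
    rows: List[Tuple[List[str], str, float]] = []
    older: Optional[OutputVersion] = None
    for newer in ordered:
        before = older.source_text.splitlines() if older else []
        after = newer.source_text.splitlines()
        diff = list(difflib.unified_diff(before, after, lineterm=""))
        rows.append((diff, newer.command, newer.timestamp))
        older = newer
    rows.reverse()  # reverse chronological order, as in the GUI
    return rows


table = context_rows([
    OutputVersion(1.0, "python plot.py --bins=10", "plot(kind='line')"),
    OutputVersion(2.0, "python plot.py --bins=10", "plot(kind='bar')"),
])
```

In this toy example the newest row's diff contains the one-line change from a line graph to a bar graph, mirroring the second row of the screenshot.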

Lab Notebook Generator

This application generates a customizable HTML report that summarizes your activities over a given time period. You can use these reports as the basis for writing papers, tutorials, and theses.
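
As a rough illustration of the idea, the following Python sketch filters timestamped events to a time period and renders them as a small HTML page. The event tuple layout and the function name render_notebook are assumptions made for this example, not the generator's actual interface.

```python
import html
from typing import List, Tuple

# (timestamp, event kind, text)
NotebookEvent = Tuple[float, str, str]


def render_notebook(events: List[NotebookEvent], start: float, end: float) -> str:
    """Render events with start <= timestamp <= end as an HTML list, oldest first."""
    selected = sorted(e for e in events if start <= e[0] <= end)
    items = "\n".join(
        # Escape user text so notes containing < or & do not break the page.
        f"  <li><b>{html.escape(kind)}</b>: {html.escape(text)}</li>"
        for _, kind, text in selected
    )
    return f"<html><body>\n<h1>Lab notebook</h1>\n<ul>\n{items}\n</ul>\n</body></html>"


report = render_notebook(
    [
        (1.0, "status update", "trying bins < 20"),
        (2.0, "checkpoint", "bar graph looks right"),
        (99.0, "status update", "outside the requested period"),
    ],
    start=0.0,
    end=10.0,
)
```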

Burrito is an early research prototype that is no longer being developed. You can download its source code from GitHub. However, I haven't had time to polish up the installation instructions, so this code is mostly useful as a reference. In particular, it is a good example use case for SystemTap.

Learn more by reading our workshop paper:

Burrito: Wrapping Your Lab Notebook in Computational Infrastructure. Philip J. Guo and Margo Seltzer. USENIX Workshop on the Theory and Practice of Provenance (TaPP), June 2012.

Also, Chapter 7 of my Ph.D. dissertation contains a more detailed description.

If you want to play around with Burrito without installing pesky dependencies, you can download the Burrito 32-bit Fedora demo VM image (1.9 GB) and run it with VirtualBox. The VM username is researcher and the password is passwd. Once you log in, read README.txt on the desktop for further instructions. However, this VM is not meant to be used as a production system; again, the main contributions of this project are its novel ideas, not the actual implementation.