Here is a small but representative example of using CDE to migrate a Python computational experiment between two Linux machines.
Let’s say Alice is a climate scientist who is running weather simulations for her research. Her experiment consists of a Python script (weather_sim.py) and a data file representing Tokyo weather data (tokyo.dat) located in /home/alice/cool-experiment/. She normally runs the experiment by typing the following shell command:
python weather_sim.py tokyo.dat
The shell finds the python executable within /usr/bin/ and invokes it with weather_sim.py and tokyo.dat as its arguments.
The above diagram shows all the files involved in running this command. First, the python executable (underlined in pink) loads the standard C library (libc-2.10.so) and the weather_sim.py script file. Then weather_sim.py loads the tokyo.dat data file and the py-weather.so library, which contains optimized weather simulation subroutines.
Note that py-weather.so is an example of a third-party Python extension library that does not come pre-installed on her computer. Before running her experiments, Alice had to install this library and configure her version of Python to find and use it.
Now let’s say that Alice’s colleague Bob wants to reproduce her weather simulation experiment and modify it to test some related hypotheses. Bob simply asks Alice to email her entire cool-experiment/ directory to him. He unpacks the directory on his computer, changes into it, and then tries to run her script in the same way that she originally did:
python weather_sim.py tokyo.dat
Bob thinks he should have no problems running Alice’s script, since Python came pre-installed on his Linux computer. However, when he tries to run her script, it crashes with an error because the py-weather.so library cannot be found (see the diagram below). He must now go through the trouble of installing py-weather.so and configuring his computer’s Python to be able to find and use it.
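Bob's failure can be anticipated before running anything. The following is a minimal sketch, not part of CDE, of how a script could check whether its dependencies are importable on a given machine. The import name py_weather is an assumption made for illustration; the text only names the library file py-weather.so.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this Python cannot locate."""
    missing = []
    for name in names:
        # find_spec returns None when no importable module of that name is found.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "py_weather" is a hypothetical import name for the py-weather.so extension.
    required = ["os", "py_weather"]
    for name in missing_modules(required):
        print(f"missing dependency: {name}")
```

On a machine like Bob's, where the extension was never installed, the hypothetical py_weather name would be reported as missing, while standard-library modules such as os would not.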
This example is actually oversimplified. In real life, Bob might have to install and configure several libraries, which themselves might depend on more libraries or conflict with those already installed on his computer. It could take him hours or days of frustration before he sets up the proper dependencies to run Alice’s script, and he could inadvertently break other programs on his computer in the process (e.g., due to conflicting library versions). Let’s now see how CDE can eliminate these frustrations.
After Alice downloads CDE to her computer, she can create a self-contained package for her experiment by simply prepending cde to her original command:
cde python weather_sim.py tokyo.dat
CDE executes her script and monitors all the files that it accesses. It creates a cde-package/ sub-directory and copies all of those files into it, mirroring the original directory structure.
Note that the cde-package/ sub-directory (dotted pink box) contains all the files required to run her script on another computer (e.g., her versions of the standard C library, the Python interpreter, and the custom py-weather.so extension library).
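The copying step described above can be sketched in a few lines of Python. This is only an illustration of the "mirror the original directory structure" idea, assuming a list of accessed file paths is already available. It does not show how CDE monitors file accesses, and the function names package_path and copy_into_package are hypothetical.

```python
import os
import shutil


def package_path(original_path, package_root):
    """Map an absolute path to its mirrored location inside the package."""
    absolute = os.path.abspath(original_path)
    # Drop the leading "/" so the path nests under package_root.
    return os.path.join(package_root, absolute.lstrip(os.sep))


def copy_into_package(accessed_paths, package_root):
    """Copy each accessed file into package_root, preserving its directory layout."""
    copied = []
    for path in accessed_paths:
        if not os.path.isfile(path):
            continue  # skip directories and paths that do not exist
        destination = package_path(path, package_root)
        os.makedirs(os.path.dirname(destination), exist_ok=True)
        shutil.copy2(path, destination)  # copy2 also copies file metadata where possible
        copied.append(destination)
    return copied
```

With this mapping, a file such as /usr/bin/python would land at usr/bin/python beneath the package root, so every file keeps the same relative position it had on Alice's computer.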
Now Alice can transfer her entire cde-package/ directory to Bob (e.g., via email). Bob can then run Alice’s script by changing into the cool-experiment/ sub-directory (within the package) and running the special python.cde program with the same arguments as Alice’s original command:
./python.cde weather_sim.py tokyo.dat
The python.cde wrapper program first creates a “sandbox” within the package (the dotted pink box in the diagram below) and then invokes Alice’s version of Python (underlined in pink). Alice’s Python knows how to find the py-weather.so library, so her script runs properly, just as it did on her own computer.
Note that all arrows in this diagram remain within the dotted pink box (sandbox). CDE ensures that commands under its supervision can only access files within the sandbox, so they cannot interfere with the rest of Bob’s computer. Even though Bob has Python and the standard C library (libc-2.6.so) installed on his computer, CDE uses the versions from Alice’s package. In essence, CDE allows Bob to transfer a “slice” of Alice’s computer into his, so that he can safely run and modify her scripts.
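One way to picture the sandbox is as a rewriting of every path the program asks for, so that it resolves inside the package instead of on Bob's own filesystem. The sketch below shows only that path arithmetic. It is not CDE's implementation, and the function name redirect_into_sandbox and the example sandbox locations are assumptions for illustration.

```python
import posixpath


def redirect_into_sandbox(requested_path, sandbox_root, cwd="/"):
    """Rewrite a requested path so that it resolves inside sandbox_root."""
    # Resolve relative paths against the program's working directory
    # as it appeared on the original machine.
    absolute = posixpath.normpath(posixpath.join(cwd, requested_path))
    # normpath has already collapsed any ".." components, so the result
    # cannot climb above the sandbox root.
    return posixpath.join(sandbox_root, absolute.lstrip("/"))
```

Under this scheme, a request for /lib/libc-2.10.so with a sandbox at /home/bob/cde-package would be served from /home/bob/cde-package/lib/libc-2.10.so, which is why Alice's copy of the C library is used rather than Bob's libc-2.6.so.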
In addition to reproducing Alice’s script run, Bob can also modify weather_sim.py to explore alternative hypotheses, test on other datasets, or write new scripts that build on it.
Of course, CDE is not limited to Python; it works on arbitrary Linux programs. In short, if Alice can run a command on her computer, then CDE enables her colleagues to run that command on theirs.