The main job of the cde binary is to observe target program execution and copy all accessed files into the package sub-directory (cde-package/cde-root/). This job seems straightforward at first glance; after all, what’s easier than copying a file?
However, what makes this job difficult in practice is that CDE needs to faithfully replicate the exact directory and symlink structure within the package, or else some important programs (e.g., Java) will refuse to run from within the package. On Linux filesystems, any component of any given file path can be a symlink, and each symlink can be either an absolute or a relative link to a file, directory, or even another symlink! This flexibility leads to many bizarre corner cases when packaging up real-world programs.
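To make the "any component can be a symlink" point concrete, here is a minimal Python sketch that checks every prefix of a path and reports which ones are symlinks, and whether each link target is written as an absolute or a relative path. The function name `symlink_components` is made up for this illustration and is not part of CDE or okapi.

```python
import os

def symlink_components(path):
    """Return (prefix, target, kind) for every prefix of `path` that is a symlink.

    `kind` is "absolute" or "relative", describing how the link target is written.
    """
    found = []
    prefix = os.sep if os.path.isabs(path) else ""
    for part in path.split(os.sep):
        if not part:
            continue  # skip empty pieces produced by leading or repeated slashes
        prefix = os.path.join(prefix, part)
        # islink() uses lstat(), so it inspects the last component of `prefix`
        # itself instead of following it.
        if os.path.islink(prefix):
            target = os.readlink(prefix)
            kind = "absolute" if os.path.isabs(target) else "relative"
            found.append((prefix, target, kind))
    return found

if __name__ == "__main__":
    # The result depends entirely on the machine this runs on.
    for prefix, target, kind in symlink_components("/usr/bin/java"):
        print(prefix, "->", target, "(" + kind + ")")
```

An empty result means that no prefix of the path is a symlink, so a plain copy would already preserve the structure.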
I’ve developed a small C library named okapi (pronounced “oh-copy”) to encapsulate all of the intricate deep file copying functionality that CDE requires.
I’ll now illustrate the power of okapi with a real example I encountered when debugging CDE: Let’s say that I simply want to make a copy of the Java binary (/usr/bin/java) into a java-test/ sub-directory while preserving its original directory structure. These two simple commands do the job:
mkdir -p java-test/usr/bin/
cp /usr/bin/java java-test/usr/bin/
The resulting java-test/ sub-directory now contains the following contents:
In the above diagram, each box represents a directory, each circle represents a file, and each arrow shows a containment relation. Looks simple enough, right? There’s only a single file in there: java-test/usr/bin/java. So are we done? Not quite.
Here’s where things start to get complicated. Let’s take a look at the original file, /usr/bin/java:
$ file /usr/bin/java
/usr/bin/java: symbolic link to `/etc/alternatives/java'
We discover that it’s actually a symlink to an absolute path, /etc/alternatives/java. Let’s now peek at that file:
$ file /etc/alternatives/java
/etc/alternatives/java: symbolic link to `/usr/lib/jvm/jre-1.6.0-openjdk/bin/java'
So that file is also a symlink and points to /usr/lib/jvm/jre-1.6.0-openjdk/bin/java. Ok, let’s now peek at that file:
$ file /usr/lib/jvm/jre-1.6.0-openjdk/bin/java
/usr/lib/jvm/jre-1.6.0-openjdk/bin/java: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9, stripped
Ok good, this looks like the actual Java binary. But there’s one more peculiarity that’s not apparent at first glance. Let’s inspect /usr/lib/jvm/jre-1.6.0-openjdk itself, which appears to be an ordinary parent directory of the Java binary:
$ file /usr/lib/jvm/jre-1.6.0-openjdk
/usr/lib/jvm/jre-1.6.0-openjdk: symbolic link to `java-1.6.0-openjdk-184.108.40.206/jre'
What the heck??? This is a relative symlink to another directory!
In order for Java to properly run from within a CDE package, CDE must faithfully reproduce all of the aforementioned complexities in the directory and symlink structure within the package. Simply copying /usr/bin/java by itself is not enough!
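The manual detective work above (run file, read the target, repeat) can be sketched in a few lines of Python. This is only an illustration of following a chain of links one hop at a time; `trace_symlink_chain` is a hypothetical helper, and it only looks at the final path component, so a symlinked parent directory such as jre-1.6.0-openjdk would need a separate check on that directory.

```python
import os

def trace_symlink_chain(path, max_hops=40):
    """Follow `path` one symlink hop at a time and return every path visited."""
    chain = [path]
    for _ in range(max_hops):  # the hop limit guards against symlink cycles
        if not os.path.islink(path):
            break
        target = os.readlink(path)
        # A relative target is interpreted relative to the directory that
        # holds the link; os.path.join() discards that directory when the
        # target is absolute. normpath() collapses ".." purely lexically,
        # which is a simplification.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        chain.append(path)
    return chain

if __name__ == "__main__":
    # Output depends on the machine; on a system laid out like the one in
    # this walkthrough it would list each hop starting from /usr/bin/java.
    for hop in trace_symlink_chain("/usr/bin/java"):
        print(hop)
```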
I’ve designed the okapi library to do this type of deep file copying that CDE requires. Here is how okapi handles a seemingly-simple request to copy /usr/bin/java into java-test/:
Wow, that was a doozie, all just to copy a single Java binary file! However, CDE requires this amount of attention to detail in order for packaged programs to be able to run from within the package. A typical CDE packaging run calls hundreds or thousands of these deep copying operations.
For the visually-inclined, here is what the directory structure within java-test/ looks like after all the copying has completed:
Once again, each box represents a directory, each circle represents a file, and each solid arrow shows a containment relation. However, now there are diamond shapes to represent symlinks, and dashed arrows to point to each symlink’s respective target. Notice the intricate web of files, sub-directories, and symlinks that okapi had to copy into java-test/ just to satisfy the request to copy /usr/bin/java into there. Fortunately, okapi hides all of this complexity from the user and presents an interface that’s as simple as the ordinary Linux copy (cp) program!
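For readers who want to experiment with the idea, here is a rough Python sketch of a deep copy in this spirit. It is not okapi's actual C implementation. The name `deep_copy` and its behavior are assumptions made for illustration: absolute link targets are interpreted relative to `src_root` and rewritten as relative links so that the copy stays self-contained, regular files are copied with `shutil.copy2` instead of being hard-linked, and ".." inside link targets is resolved only lexically.

```python
import os
import shutil

def deep_copy(abspath, src_root, dst_root, _depth=0):
    """Copy src_root/abspath to dst_root/abspath, recreating every directory
    and symlink met along the way (illustrative sketch only)."""
    if _depth > 40:
        raise OSError("too many levels of symbolic links: " + abspath)
    os.makedirs(dst_root, exist_ok=True)
    parts = [p for p in abspath.split("/") if p]
    cur = ""  # path relative to both roots, built one component at a time
    for i, part in enumerate(parts):
        cur = os.path.join(cur, part)
        src = os.path.join(src_root, cur)
        dst = os.path.join(dst_root, cur)
        if os.path.islink(src):
            target = os.readlink(src)
            if os.path.isabs(target):
                # Assumption: an absolute target lives under src_root too.
                resolved = target.lstrip("/")
                link_text = os.path.relpath(os.path.join(dst_root, resolved),
                                            os.path.dirname(dst))
            else:
                resolved = os.path.normpath(
                    os.path.join(os.path.dirname(cur), target))
                link_text = target
            if not os.path.lexists(dst):
                os.symlink(link_text, dst)
            # Restart from the link target plus whatever components remain.
            rest = parts[i + 1:]
            deep_copy("/" + os.path.join(resolved, *rest),
                      src_root, dst_root, _depth + 1)
            return
        if os.path.isdir(src):
            os.makedirs(dst, exist_ok=True)
        elif not os.path.lexists(dst):
            shutil.copy2(src, dst)  # raises FileNotFoundError if src is missing

if __name__ == "__main__":
    # Example request mirroring the walkthrough; requires /usr/bin/java to exist.
    deep_copy("/usr/bin/java", "/", "java-test")
```

Restarting the walk at each symlink keeps the logic short: every path prefix that reaches the directory or file branches is already free of symlinks, so relative link targets can be resolved against it directly.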
Compiling the CDE source code also builds a stand-alone executable named okapi. To build only okapi, run make okapi.
okapi takes exactly 3 command-line arguments, in this order: $abspath, $src_root, and $dst_root.
It then performs a deep copy of $src_root/$abspath into $dst_root/$abspath, creating all intermediate sub-directories and symlinks.
For example, to deep-copy a file named /home/alice/experiments/data/tokyo.dat into /home/bob/experiments/data/tokyo.dat, run:
okapi /experiments/data/tokyo.dat /home/alice /home/bob
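If you drive okapi from a script, the only fiddly part is splitting a full source path into $src_root and $abspath. The sketch below shows one way to build the argument list in Python. `okapi_command` is a hypothetical helper, and it assumes that an executable named okapi is on your PATH.

```python
import os

def okapi_command(full_src_path, src_root, dst_root):
    """Build the argv for: okapi $abspath $src_root $dst_root."""
    rel = os.path.relpath(full_src_path, src_root)
    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
        raise ValueError(full_src_path + " is not inside " + src_root)
    abspath = "/" + rel  # okapi expects a path that starts at the root
    return ["okapi", abspath, src_root, dst_root]

if __name__ == "__main__":
    cmd = okapi_command("/home/alice/experiments/data/tokyo.dat",
                        "/home/alice", "/home/bob")
    print(" ".join(cmd))
    # To actually run it (requires the okapi binary):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```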
To my knowledge, no other Linux file copying tool (e.g., cp, rsync) can do the deep copying and symlink munging acrobatics that okapi does.
[One caveat is that okapi uses hard links to make copies when possible, for improved performance. Thus, the two copies might actually refer to the same physical file.]
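The practical consequence of that caveat is easy to demonstrate. The following Python sketch tries a hard link first and falls back to a real copy, then shows that an edit made through one name is visible through the other when a hard link was used. `link_or_copy` is a hypothetical helper that only mimics the described behavior; okapi's own fallback logic may differ.

```python
import os
import shutil
import tempfile

def link_or_copy(src, dst):
    """Hard-link src to dst when possible; otherwise fall back to a real copy."""
    try:
        os.link(src, dst)
        return "hardlink"
    except OSError:
        # For example: dst is on a different filesystem, or links are not permitted.
        shutil.copy2(src, dst)
        return "copy"

if __name__ == "__main__":
    d = tempfile.mkdtemp()
    a, b = os.path.join(d, "a.txt"), os.path.join(d, "b.txt")
    with open(a, "w") as f:
        f.write("original")
    mode = link_or_copy(a, b)
    with open(a, "a") as f:
        f.write(" + edited via a.txt")
    with open(b) as f:
        # With a hard link, b.txt shows the edit; with a real copy it does not.
        print(mode, "->", f.read())
```

If you need the two copies to be independent, copy the file contents yourself after the fact (for example with cp) so that each name refers to its own data.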
If you want to copy (okapi!) an entire directory into another one, then use the CDE/scripts/okapi_dir.py script from the CDE GitHub repository. This Python script takes 2 command-line arguments, $src_dir and $dst_root.
Invoking this script will cause it to traverse inside of $src_dir and okapi all constituent files, symlinks, and sub-directories into $dst_root (making sure to also follow symlinks to sub-directories outside of $src_dir). Think of this as cp -aR on steroids.
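The traversal described here can be approximated with `os.walk`. The sketch below is not the code of okapi_dir.py; it is a guess at the general shape, with a hypothetical function `paths_to_copy` that lists everything reachable from a directory, follows symlinks to sub-directories (including ones that point outside of it), and guards against symlink cycles. Each resulting path would then be handed to okapi.

```python
import os

def paths_to_copy(src_dir):
    """Yield every file, symlink, and sub-directory reachable from src_dir."""
    seen = set()  # real paths of directories already walked (cycle guard)
    for dirpath, dirnames, filenames in os.walk(src_dir, followlinks=True):
        real = os.path.realpath(dirpath)
        if real in seen:
            dirnames[:] = []  # do not descend into a directory walked before
            continue
        seen.add(real)
        # os.walk lists symlinks to directories in dirnames and all other
        # entries (including broken symlinks) in filenames.
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)

if __name__ == "__main__":
    for path in paths_to_copy("."):
        print(path)
```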