Deep file copying (okapi)

The main job of the cde binary is to observe target program execution and copy all accessed files into the package sub-directory (cde-package/cde-root/). This job seems straightforward at first glance; after all, what’s easier than copying a file?

However, what makes this job difficult in practice is that CDE needs to faithfully replicate the exact directory and symlink structure within the package, or else some important programs (e.g., Java) will refuse to run from within the package. On Linux filesystems, any component of any given file path can be a symlink, and each symlink can be either an absolute or a relative link to a file, directory, or even another symlink! This flexibility leads to many bizarre corner cases when packaging up real-world programs.

I’ve developed a small C library named okapi (pronounced “oh-copy”) to encapsulate all of the intricate deep file copying functionality that CDE requires.

Motivating example

I’ll now illustrate the power of okapi with a real example I encountered when debugging CDE: Let’s say that I simply want to make a copy of the Java binary (/usr/bin/java) into a java-test/ sub-directory while preserving its original directory structure. These two simple commands do the job:

mkdir -p java-test/usr/bin/
cp /usr/bin/java java-test/usr/bin/

The resulting java-test/ sub-directory now contains the following contents:

[Figure: _images/dir-java-simple53.png, the contents of java-test/ after the simple cp]

In the above diagram, each box represents a directory, each circle represents a file, and each arrow shows a containment relation. Looks simple enough, right? There’s only a single file in there: java-test/usr/bin/java. So are we done? Not quite.

Here’s where things start to get complicated. Let’s take a look at the original file, /usr/bin/java:

$ file /usr/bin/java
/usr/bin/java: symbolic link to `/etc/alternatives/java'

We discover that it’s actually a symlink to an absolute path, /etc/alternatives/java. Let’s now peek at that file:

$ file /etc/alternatives/java
/etc/alternatives/java: symbolic link to `/usr/lib/jvm/jre-1.6.0-openjdk/bin/java'

So that file is also a symlink and points to /usr/lib/jvm/jre-1.6.0-openjdk/bin/java. Ok, let’s now peek at that file:

$ file /usr/lib/jvm/jre-1.6.0-openjdk/bin/java
/usr/lib/jvm/jre-1.6.0-openjdk/bin/java: ELF 32-bit LSB executable,
Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9, stripped

Ok good, this looks like the actual Java binary. But there’s one more peculiarity that’s not apparent at first glance. Let’s take a closer look at /usr/lib/jvm/jre-1.6.0-openjdk, which appears to be an ordinary directory on the path to the Java binary:

$ file /usr/lib/jvm/jre-1.6.0-openjdk
/usr/lib/jvm/jre-1.6.0-openjdk: symbolic link to `java-1.6.0-openjdk-1.6.0.0/jre'

What the heck??? This is a symlink to another directory!

In order for Java to properly run from within a CDE package, CDE must faithfully reproduce all of the aforementioned complexities in the directory and symlink structure within the package. Simply copying /usr/bin/java by itself is not enough!
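The manual detective work above (running file on each path in turn) can be automated. The following is a minimal Python sketch, not part of CDE or okapi; the function names symlink_chain and symlinked_components are made up for illustration. It uses only os.readlink and os.path.islink, and it resolves relative targets lexically, which is an approximation when ".." crosses a symlinked directory.

```python
import os


def symlink_chain(path):
    """Follow the symlink at `path` hop by hop and return every path visited."""
    chain = [path]
    seen = {path}
    while os.path.islink(path):
        target = os.readlink(path)
        # A relative target is interpreted relative to the link's own directory.
        # An absolute target replaces the path entirely (os.path.join does this).
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        if path in seen:  # guard against symlink loops
            break
        seen.add(path)
        chain.append(path)
    return chain


def symlinked_components(path):
    """Return (prefix, target) for every prefix of an absolute path that is a symlink."""
    found = []
    prefix = "/"
    for part in path.strip("/").split("/"):
        prefix = os.path.join(prefix, part)
        if os.path.islink(prefix):
            found.append((prefix, os.readlink(prefix)))
    return found
```

On the machine described above, symlink_chain("/usr/bin/java") would be expected to walk through /etc/alternatives/java to the real binary, and symlinked_components on that final path would be expected to flag jre-1.6.0-openjdk as a symlinked directory.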

I’ve designed the okapi library to do exactly the kind of deep file copying that CDE requires. Here is how okapi handles a seemingly simple request to copy /usr/bin/java into java-test/:

  1. okapi first executes the equivalent of mkdir -p java-test/usr/bin/
  2. okapi sees that /usr/bin/java is actually a symlink to /etc/alternatives/java, so it creates a similar symlink named java-test/usr/bin/java, which points to ../../etc/alternatives/java. Notice that the java symlink within java-test/ is actually a relative path that points to the version of /etc/alternatives/java inside of java-test/, as denoted by the leading ../../. If okapi had simply copied the original symlink, that would lead to incorrect behavior since it would reference the actual /etc/alternatives/java, not the version inside of java-test/.
  3. okapi executes the equivalent of mkdir -p java-test/etc/alternatives/ to pave the way to copying /etc/alternatives/java.
  4. okapi sees that /etc/alternatives/java is itself a symlink, so it creates a similar symlink named java-test/etc/alternatives/java, which points to ../../usr/lib/jvm/jre-1.6.0-openjdk/bin/java (again notice the relative path).
  5. okapi executes the equivalent of mkdir -p java-test/usr/lib/jvm/jre-1.6.0-openjdk/bin/, but this time there’s a twist. Recall that /usr/lib/jvm/jre-1.6.0-openjdk was itself a symlink to the directory java-1.6.0-openjdk-1.6.0.0/jre, so okapi must replicate that symlink/directory structure within java-test/: it creates the real sub-directory java-1.6.0-openjdk-1.6.0.0 (with jre/bin/ inside it) and makes jre-1.6.0-openjdk a symlink into it, just as in the original.
  6. Finally okapi copies /usr/lib/jvm/jre-1.6.0-openjdk/bin/java into java-test/usr/lib/jvm/jre-1.6.0-openjdk/bin/java!
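The six steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not okapi's actual C implementation: the function name deep_copy, the depth limit, and the decision to keep relative link targets unchanged while rewriting absolute ones are all assumptions made for this sketch.

```python
import os
import shutil


def deep_copy(abspath, src_root, dst_root, _depth=0):
    """Copy src_root+abspath to dst_root+abspath, recreating symlinks on the way.

    Hypothetical sketch: walks the path one component at a time.
    """
    if _depth > 40:  # crude guard against symlink loops (assumed limit)
        raise OSError("too many levels of symbolic links: " + abspath)
    parts = [p for p in abspath.split("/") if p]
    prefix = ""
    for i, part in enumerate(parts):
        prefix = prefix + "/" + part
        src = src_root + prefix
        dst = dst_root + prefix
        if os.path.islink(src):
            target = os.readlink(src)
            if os.path.isabs(target):
                # Rewrite an absolute target as a relative one so that it
                # resolves inside dst_root instead of on the real filesystem.
                target_abs = os.path.normpath(target)
                link_text = os.path.relpath(dst_root + target_abs,
                                            os.path.dirname(dst))
            else:
                # A relative target already stays inside the tree; keep it.
                target_abs = os.path.normpath(
                    os.path.join(os.path.dirname(prefix), target))
                link_text = target
            if not os.path.lexists(dst):
                os.symlink(link_text, dst)
            # Continue with the link target plus the components not yet visited.
            rest = "/".join(parts[i + 1:])
            next_path = os.path.join(target_abs, rest) if rest else target_abs
            deep_copy(next_path, src_root, dst_root, _depth + 1)
            return
        if os.path.isdir(src):
            os.makedirs(dst, exist_ok=True)   # the "mkdir -p" steps
        elif not os.path.lexists(dst):
            shutil.copy2(src, dst)            # the final file copy
```

Under these assumptions, deep_copy("/usr/bin/java", "", "java-test") on the machine described above would create java-test/usr/bin/java as a symlink whose text is ../../etc/alternatives/java, because os.path.relpath computes the target relative to the directory that holds the new link.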

Wow, that was a doozy, all just to copy a single Java binary file! However, CDE requires this amount of attention to detail in order for packaged programs to run from within the package. A typical CDE packaging run performs hundreds or thousands of these deep copying operations.

For the visually-inclined, here is what the directory structure within java-test/ looks like after all the copying has completed:

[Figure: _images/dir-java-small62.png, the contents of java-test/ after the deep copy]

Once again, each box represents a directory, each circle represents a file, and each solid arrow shows a containment relation. However, now there are diamond shapes to represent symlinks, and dashed arrows to point to each symlink’s respective target. Notice the intricate web of files, sub-directories, and symlinks that okapi had to copy into java-test/ just to satisfy the request to copy /usr/bin/java there. Fortunately, okapi hides all of this complexity from the user and presents an interface that’s as simple as the ordinary Linux copy (cp) program!

okapi stand-alone executable

When you compile the CDE source code, the build also produces a stand-alone executable named okapi. You can also run make okapi to compile just okapi.

okapi takes exactly 3 command-line arguments:

  1. The path of the file that you want to copy, written as an absolute path (with a leading /) that is interpreted relative to the source root, abspath
  2. The source root directory (can be an empty string), src_root
  3. The destination root directory, dst_root

It then performs a deep copy of $src_root/$abspath into $dst_root/$abspath, creating all intermediate sub-directories and symlinks.

For example, to deep-copy a file named /home/alice/experiments/data/tokyo.dat into /home/bob/experiments/data/tokyo.dat, run:

okapi /experiments/data/tokyo.dat /home/alice /home/bob
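To make the role of the three arguments concrete, here is a tiny Python sketch of how they combine into a source and a destination path. The helper name okapi_paths is hypothetical and only mirrors the $src_root/$abspath and $dst_root/$abspath rule stated above; it is not part of okapi.

```python
def okapi_paths(abspath, src_root, dst_root):
    """Return the (source, destination) pair implied by okapi's three arguments."""
    if not abspath.startswith("/"):
        raise ValueError("abspath must start with '/'")
    # Plain concatenation is deliberate: os.path.join would throw away the
    # root whenever the second argument is absolute.
    return src_root + abspath, dst_root + abspath


src, dst = okapi_paths("/experiments/data/tokyo.dat", "/home/alice", "/home/bob")
print(src)
print(dst)
```

With an empty source root, as in the Java example, okapi_paths("/usr/bin/java", "", "java-test") yields /usr/bin/java as the source and java-test/usr/bin/java as the destination.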

To my knowledge, no other Linux file copying tool (e.g., cp, rsync) can do the deep copying and symlink munging acrobatics that okapi does.

[One caveat is that okapi uses hard links to make copies when possible, for improved performance. Thus, the two copies might actually refer to the same physical file.]
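The practical consequence of that caveat is easy to demonstrate. The sketch below assumes a "try a hard link, fall back to a real copy" strategy; that fallback logic and the name link_or_copy are guesses for illustration, not a description of okapi's code.

```python
import os
import shutil


def link_or_copy(src, dst):
    """Hard-link src to dst if possible, otherwise make a real copy."""
    try:
        os.link(src, dst)       # fails across filesystems, for example
        return "hardlink"
    except OSError:
        shutil.copy2(src, dst)  # independent copy with its own inode
        return "copy"


def shares_storage(a, b):
    """True when both names refer to the same physical file (same inode)."""
    return os.path.samefile(a, b)
```

When link_or_copy returns "hardlink", shares_storage reports True for the two paths, and editing the file in place through one name changes what you read through the other. Keep that in mind before modifying files inside a destination tree.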

okapi-ing an entire directory

If you want to copy (okapi!) an entire directory into another one, then use the CDE/scripts/okapi_dir.py script from the CDE GitHub repository. This Python script takes 2 command-line arguments:

  1. The directory to copy, src_dir
  2. The destination root directory, dst_root

Invoking this script will cause it to traverse inside of $src_dir and okapi all constituent files, symlinks, and sub-directories into $dst_root (making sure to also follow symlinks to sub-directories outside of $src_dir). Think of this as cp -aR on steroids.
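The traversal described above might look roughly like the following Python sketch. It is not the actual okapi_dir.py; the function name entries_to_copy is hypothetical, and the sketch only enumerates the paths that would each be handed to okapi. It follows symlinked sub-directories (including ones that lead outside of src_dir) and uses os.path.realpath to avoid walking the same directory twice.

```python
import os


def entries_to_copy(src_dir):
    """Yield every directory, file, and symlink path found under src_dir."""
    seen = set()
    for dirpath, dirnames, filenames in os.walk(os.path.abspath(src_dir),
                                                followlinks=True):
        # Always report the directory itself, so a symlinked directory is
        # recreated even when its contents were already visited.
        yield dirpath
        real = os.path.realpath(dirpath)
        if real in seen:
            dirnames[:] = []  # prune: already walked via another path
            continue
        seen.add(real)
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)
```

Each yielded path still carries the symlinked directory name in it, so a deep copy of that path would recreate both the symlink and the real directory it points to.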

okapi source code

The okapi source code can be found within the CDE GitHub repository in okapi.c and okapi.h. Its only dependency is libc, so it should be easy to include in your own projects. Enjoy!

Btw, here’s a picture of a real okapi in the wild:

_images/okapi-flickr55.jpg