Philip Guo (Phil Guo, Philip J. Guo, Philip Jia Guo, pgbovine)

An example of command-line bullshittery in computer science research

Summary
I present an example of command-line bullshittery that computer science researchers face. This sort of bullshittery can be much more difficult and aggravating than the actual programming involved in a project.

I just finished my first year as an assistant professor. I spend my days sipping cappuccinos, sketching ideas in my notebook while sunbathing on the university quad, and pontificating about the beauty of computing to students while classical music inexplicably plays in the background. No, actually I'm often hunched over a laptop fighting with command-line bullshittery – setting up idiosyncratic computing environments so that my students can make progress on research projects without getting hung up on incidental complexity with no relevant intellectual value.

One obvious question is whether this is the best use of my time. Why not just throw students into the lion's den and have them figure it out on their own? Isn't my time better spent elsewhere? Well, if I had a more established research lab, then my senior students and postdocs could do this grungy setup work for new lab members. But since this is my first year on the job, all of my students are brand-new, and most are undergrads with little prior experience in the fine art of command-line bullshittery. So the buck stops with me.

In my experience so far, a day or two of me setting up the computing environment for a project pays amazing dividends in the long run as students are able to make steady progress on the actual programming parts rather than getting demoralized by irrelevant poo poo. In general, novices are much better at programming than at dealing with command-line bullshittery. If I drop them into a computing environment where they can code, execute, and debug, then a project can really get rolling. But if I start them off with a blank slate, then it's really hard for them to overcome that initial setup hurdle alone.

Today's https adventure

Here is a typical example of command-line bullshittery, which I just spent today fighting. When reading this story, keep in mind how idiosyncratic my setup is, and how much of it is a consequence of legacy code and dependencies. Each new research project likely involves a similar level of weirdness with dependencies that span multiple machines and external services, so it's impossible to cleanly “package up” a uniform computing environment for my lab in, say, a Docker container.

One of my grants is funding a research project that requires me to integrate my Online Python Tutor tool with Microsoft Office Mix to prototype new authoring interfaces for programming lessons.

I've been running Online Python Tutor (which I'll abbreviate as OPT) on WebFaction for the past four years. Six months ago, I moved the OPT backend to execute in a sandbox on a separate server, hosted on Linode, but I still wanted to keep the frontend (website) on WebFaction because I didn't want to risk any breakage (it's a high-traffic site and prominent domain name). Now the app exists on two separate servers from two different vendors, which each have different sysadmin requirements.

When a user visits the OPT site and executes code, the WebFaction-hosted frontend makes a JSONP call over http to my Linode-hosted backend. OK so far so good. But ... Microsoft Office Mix (which I'll call Mix) requires apps that integrate with it to use https, not http. Ergh!
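For readers unfamiliar with JSONP, here is a minimal Python sketch of what a backend does to answer such a call: it wraps the JSON result in a call to a function whose name the requesting page supplies. This is an illustration only, not OPT's actual code; the function name `jsonp_wrap` and the callback name `handleTrace` are hypothetical.

```python
import json
import re

# Only allow simple JavaScript identifiers as callback names, so that a
# malicious "callback" query parameter cannot inject arbitrary script.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def jsonp_wrap(callback_name, payload):
    """Return a JSONP response body of the form: callback_name(<JSON>);"""
    if not _CALLBACK_RE.fullmatch(callback_name):
        raise ValueError("unsafe JSONP callback name: %r" % callback_name)
    return "%s(%s);" % (callback_name, json.dumps(payload))


if __name__ == "__main__":
    # Hypothetical callback name and payload, for illustration only.
    print(jsonp_wrap("handleTrace", {"status": "ok"}))
```

The browser loads this response through a script tag, which is why the page's scheme (http versus https) ends up mattering for the backend too.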

When I first learned about this https requirement for Mix last year, I scrambled to get https working on WebFaction since I wanted to show a proof-of-concept to the Microsoft folks to get the grant in the first place. (I didn't use Linode yet back then.) I cobbled together a quick hack using StartSSL for a one-year free certificate and installed it on WebFaction. I ended up getting the grant, and the project could really get started.

OK, pop back to the present. Since I now have a two-server setup, I need to do two things to get https working on both:

  • Get another StartSSL certificate for WebFaction since my old one-year certificate was about to expire. Since I did this setup only once last year, I had no “muscle memory” of how to do it again. So again I cobbled together some hack to make it work. Fortunately I took notes last year as I started working with StartSSL and WebFaction, but back then I was in such a rush to get results in time for the grant submission that my notes were super spotty.

  • Get a brand-new StartSSL certificate for Linode. I need https for Linode in addition to WebFaction since, for security reasons, web browsers block an https-served page (e.g., the OPT frontend) from making requests to plain http servers (e.g., the OPT backend).
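The browser rule behind the second task can be sketched in a few lines of Python. This is a deliberately simplified model of "mixed content" blocking, not how any particular browser implements it (real browsers have extra cases, such as exemptions for localhost); the function name and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def blocked_as_mixed_content(page_url, request_url):
    """Simplified rule: an https page may not fetch active content over http."""
    page_scheme = urlsplit(page_url).scheme
    request_scheme = urlsplit(request_url).scheme
    return page_scheme == "https" and request_scheme == "http"


if __name__ == "__main__":
    # Hypothetical addresses standing in for the frontend and backend.
    frontend = "https://frontend.example.com/visualize.html"
    print(blocked_as_mixed_content(frontend, "http://203.0.113.10/exec"))
    print(blocked_as_mixed_content(frontend, "https://backend.example.com/exec"))
```

Under this model, moving only the frontend to https would break every request to an http-only backend, which is why both servers need certificates.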

The second task (Linode+StartSSL) ended up being much harder since WebFaction has a customer service department that installs the certificate for users if you put the certificate and key files in the right place, but with Linode, I was completely on my own.

To make matters worse, before I could even get a certificate for Linode, I first needed to assign a domain name to my server. Wait, why didn't it already have a domain name? Because I was just using that server as a backend for OPT, so only my code connected to it by hardcoding its IP address into the JSONP requests. Humans never directly interacted with that server, so there was no need for a user-friendly domain name. (I think you can actually request certificates for raw IP addresses, but doing it for domain names is more robust because the certificate will still be valid even if I switch servers.)

Fortunately I have a weird habit of registering domain names on a whim, so I logged into my registrar's website (Namecheap) and found that I had a spare domain that was perfect for this purpose. I followed two separate sets of directions from Linode and Namecheap to assign a domain name to my Linode server. Note that the documentation on both ends is incomplete by design: Linode has no idea that I'm using Namecheap, and Namecheap has no idea that I'm using Linode. So I had to use my prior experience in command-line-Rosetta-stone-bullshittery to “translate” between the subtly differing vocabulary used in their respective documentation.

OK, now that I finally have a domain name, I can request a StartSSL certificate for it. After over an hour distracted by a side goal, I can finally get back to my main goal of Linode+StartSSL ... which itself is a side goal for my real main goal: to get some research done!

To further add to my confusion throughout this adventure, I wasn't clear whether I could use StartSSL to request more than one free certificate, so just to be safe, I registered two different accounts using two different email addresses of mine. I had to meticulously document which email address was associated with each account; otherwise I'd get hella confused later on. (Again, I'm not an expert in StartSSL or certificates since I'm not a professional web sysadmin. I do this only once a year at most!)

To pile on the bullshittery even more, some steps require waiting for up to an hour for, say, DNS to propagate or for StartSSL employees to manually approve requests. I was super afraid of making a mistake, since it would cost me lots of time.

OK so after I got my StartSSL certificate installed on Linode, I set up a new https server using Node.js. But I couldn't access it from my web browser! I didn't see any error messages whatsoever. Erghhhhh what's wrong? Is my certificate not installed properly? Are my permission bits not set right? Is my server not visible to the outside world? Ahhhhh! I was so close, yet so far.

Then I suddenly remembered that when I first set up my Linode six months ago, I followed their “best practices” getting started guide and set really freaking strict firewall rules to maximize security. I dug through my old Linode setup notes and saw the commands I used to configure and activate the firewall. I tweaked my firewall rules to open some more ports for my new https server, and BAM I was finally able to see it from my web browser. Success!
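One way to narrow down this kind of silent failure is to probe the port directly from another machine before suspecting the certificate. The Python sketch below is a rough diagnostic, not something from the original setup; the function name `probe_port` and the hostname in the usage line are hypothetical. As a rule of thumb (an assumption that depends on how the firewall is configured), a timeout often means packets are being silently dropped, whereas a refused connection means the host is reachable but nothing is accepting connections on that port, or a firewall is actively rejecting them.

```python
import socket


def probe_port(host, port, timeout=3.0):
    """Try a plain TCP connection and classify the outcome as a string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host answered, but the port is closed or actively rejected.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout: often a firewall dropping packets.
        return "timeout"
    except OSError:
        # Anything else, e.g. DNS lookup failure or unreachable network.
        return "error"


if __name__ == "__main__":
    # Hypothetical hostname; 443 is the standard https port.
    print(probe_port("backend.example.com", 443))
```

This checks only TCP reachability. It says nothing about whether the certificate itself is valid, so it helps separate "the server is invisible" from "the server is visible but misconfigured".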

I couldn't have predicted that I would almost get destroyed by the firewall issue in the end. If I hadn't thought about the firewall, then I would've truly been stumped. Because nothing about my task at hand had anything to do with firewalls. I set up the firewall once around six months ago when I first bought my Linode and then just forgot about it. Everything ran fine ... until I tried adding https. Without me taking a deep breath to step back and think holistically about the problem, the firewall wouldn't have crossed my mind.

Was it Linode's fault for recommending such insanely strict firewall rules in their getting started guide? No, because it's in their best interest to make sure their users' servers are secure. They had no idea that I'd be trying to set up some https server with Node.js and hand-installed certificates from StartSSL. How could anyone predict such a specific scenario? Who would even ask a question about it on, say, StackOverflow? This is the perfect example of command-line bullshittery, which often involves a mash-up of components, services, and software from different vendors with no knowledge of one another.

What does this have to do with research?

OK if you've read this far, you might be thinking, “What the heck does any of this drudgery have to do with research?!?” The answer is that it has absolutely nothing to do with research. Absolutely nothing. It's just a necessary setup step so that my students and I can get down to working on real research. None of this arcane sysadmin knowledge has any bearing on our merit as researchers. But without it, no research can actually get done.

Not only did this bullshittery have nothing to do with research, it didn't even involve any programming! I didn't write a single line of code. I just clicked around on a bunch of web forms, searched for help in tons of different pieces of documentation, and ran a bunch of command-line commands on servers.

Now one could ask, “Why didn't you design the whole system from the ground up to be perfectly architected in the first place instead of cobbling pieces together on-demand?” Hmmm, the only way to do so would be to predict the future. And if I could do that, then I'd be rich! Research software inevitably evolves as projects take unexpected turns. Nothing is ever done on a blank slate anymore; every piece of software builds upon prior components, each with its own quirks and external dependencies. In this case, the fortunate opportunity to get this source of grant funding added a requirement for working with Mix, which requires https due to a prior design decision that was completely out of my hands. I have similar war stories for pretty much every published research project from my career thus far.

Without first overcoming command-line bullshittery to set up the quirky computing environments necessary to create novel pieces of software that push beyond the state of the art in a research area, even the best ideas remain just that – ideas.

Created: 2015-06-17
Last modified: 2015-06-17