Philip Guo (Phil Guo, Philip J. Guo, Philip Jia Guo, pgbovine)

Helping my students overcome command-line bullshittery

Summary
One of my highest-leverage activities when working with students on research is to help them install, set up, and configure software while overcoming the bullshittery of command-line interfaces.

To produce publications in an applied computer science field such as my own, a researcher must do at least one of the following:

  • Write a piece of prototype computer software that demonstrates the feasibility of a novel idea.

  • Write a piece of prototype computer software that collects, processes, and analyzes data to produce novel insights about some topic.

Many projects actually involve both kinds of activities. Regardless of subfield, all applied computer science projects require some form of computer programming (i.e., writing code). All other skills being equal, the researchers who are better, faster, and more adept at programming will produce more (and I would argue, better) publications.

What is wonderful about doing applied computer science research in the modern era is that there are thousands of pieces of free software and other computer-based tools that researchers can leverage to create their research software. With the right set of tools, one can be 10x or even 100x more productive than peers who don't know how to set up those tools.

But this power comes at a great cost: It takes a tremendous amount of command-line bullshittery to install, set up, and configure all of this wonderful free software. What I mean by command-line bullshittery is dealing with all of the arcane, obscure, strange bullshit of the command-line paradigm that most of these free tools are built upon (see The Two Cultures of Computing for gory details). So perhaps what is more important to a researcher than programming ability is adeptness at dealing with command-line bullshittery, since that enables one to become 10x or even 100x more productive than peers by finding, installing, configuring, customizing, and remixing the appropriate pieces of free software.

I've spent the past decade mostly leading my own research projects. This meant that I did the majority of the command-line bullshittery and programming to produce the results that led to publications, especially ones where I was the first author. In short, I've gotten very, very, very good at command-line bullshittery. However, I'm now transitioning into the role of an advisor whose job is to mentor students on their research projects. This means that my students (not me) are now doing the programming required to produce my research group's publications.

The wall of command-line bullshittery

Here is a common productivity bottleneck faced by students working on applied computer science research:

  1. Advisor and student discuss high-level research ideas by doodling on the whiteboard. Awesomeness ensues.

  2. Student leaves advisor's office feeling pumped and knowing exactly what they need to do to implement those ideas in code.

  3. Student tries to get started on programming but immediately gets stuck since they don't know how to handle all of the command-line bullshittery required to set up their coding environment with the proper libraries, tools, and frameworks.

Many students get discouraged and turned off from research when they hit the wall in step 3.

There is a huge disconnect between the elegant high-level ideas discussed on the whiteboard (while presumably sipping cappuccinos) and the grimy, grungy, terrible command-line bullshittery required to set up a computing environment suitable for implementing those ideas in code. This gulf of execution is tremendously frustrating for highly capable and motivated students who just didn't happen to spend 10,000 hours of their youth wrestling with nasty command-line interfaces.

Overcoming command-line bullshittery

As an advisor, I've found that one of the highest-leverage activities that I do with my students is guiding them through the intricacies of command-line bullshittery. There is simply no substitute for sitting down with them one-on-one on their laptop and walking them through all of the arcane commands to type, what they each mean, and how to interpret the bullshit output that's barfed out to the drab terminal. (Senior professors usually have postdocs, research scientists, or older students train the new students. But I just started this job, so it's only me right now!)

Throughout this entire ordeal where I'm uttering ridiculous epithets like “git pipe fork pipe stdout pipe stderr apt-get revert rollback pipe pipe grep pipe to less make install redirect rm rm ls ls -l ls tar -zxvf rm rm rm ssh mv ssh curl wget pip,” I keep reassuring my students that this bullshit is not intellectually interesting in any way ... it's all just a necessary upfront tax required to enable them to do the actual interesting research. I've engaged in so much command-line bullshittery over the years that I can confidently assert how uninteresting it all is. It's simply an obstacle to overcome before one can get real work done.

Fortunately, I find that once I go through the initial setup work with students and have them take notes by copying and pasting commands into text files, my students can hit the ground running with the actual programming tasks. Then we get into a nice weekly iteration cycle where they show me the progress on their software prototype, we brainstorm more ideas on the whiteboard, they go off and implement it in software, and repeat. Since we've installed and configured a good set of tools, my students can be a lot more productive than their peers who don't know about those tools, thus giving them a motivational boost as well. But the hardest part is just getting the initial coding environment properly set up so that they can get started on real work.
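For readers who have not kept notes of this kind, here is a minimal sketch of what such a file might look like once it is tidied into a runnable script. It is an illustration only, not a record of any actual session: the file name setup-notes.sh, the project directory, and the data file are all hypothetical, and the commands were chosen so that the script runs without installing anything. It shows two of the ideas named above, pipes and redirection.

```shell
#!/bin/sh
# setup-notes.sh: hypothetical session notes kept as a runnable script.
# Every path and file name below is made up for illustration.
set -eu

# Work in a throwaway directory so nothing on the real system is touched.
workdir=$(mktemp -d)
cd "$workdir"

# 1. Create a project directory and a small data file.
mkdir -p project/data
printf 'alpha\nbeta\ngamma\n' > project/data/words.txt

# 2. Pipe: feed the output of one command into the next.
#    grep keeps only the lines containing "et"; wc -l counts them.
grep et project/data/words.txt | wc -l > project/count.txt

# 3. Redirect: send normal output (stdout) to one file and
#    error messages (stderr) to another. ls succeeds for the first
#    path and fails for the second, so both files receive something.
#    "|| true" keeps set -e from stopping the script on that failure.
ls project/data project/missing > project/ls.out 2> project/ls.errs || true

echo "notes replayed in $workdir"
```

Running the file with sh setup-notes.sh replays every step in a fresh temporary directory. Keeping the explanation next to each command, as comments, means the notes double as a reference the next time the same setup is needed.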

Incidental versus intrinsic complexity

On a more general note, helping my students overcome command-line bullshittery is one specific instance of my more general philosophy on research advising: I strive to remove incidental complexity for my students, so that they can focus on the intrinsic complexity of their research.

Command-line bullshittery is a prime example of incidental complexity: It has nothing to do with the intellectual content of my students' research. It arises simply because modern research software development is a messy jumble of open-source tools tied together by the duct tape of command-line scripts.

However, I don't think I can remove the intrinsic complexity of doing research for my students – the uncertainty of whether a data analysis is producing sensible results, the details of debugging a sophisticated algorithm, the challenges of technical writing, or the sting of repeated paper rejections. Nor would I want to, since those kinds of difficulties are integral parts of each student's journey to become a creative, tenacious, and independent researcher.

But no matter what anyone tries to tell you, setting up command-line bullshittery has nothing to do with one's intellectual worth.

Postscript: Addressing Internet Commenters

This article received some attention from Internet commenters on certain websites. So far I've suppressed my urge to respond on those sites, but there's one point that I can't let slide. There's no way to respond to Internet commenters without sounding defensive, so consider yourself warned!

Two small points of clarification, then on to the main event:

  • I didn't state it clearly, but I'm actually teaching my students how to use command-line tools during those one-on-one tutoring sessions. I'm not just setting up their environment for them. Despite the ranty tone of the article, I'm pretty good at teaching command-line-fu to students. My students quickly become self-sufficient and able to learn more on their own.

  • I think that command-line-based open-source software is incredibly powerful, although it does take time and effort to set up. I extolled the 10x to 100x productivity advantages of good tooling several times throughout the article. And nowhere did I suggest that we rely only on GUI-based tools.

OK, here's what gets me super pissed. Many commenters presumed that “real programmers” should be command-line experts ... POSIX-flavored command-line experts, to be precise. (None of that cmd.exe bullshit.) I somewhat agree that good mid-career programmers are often command-line experts. However, that's not the population I work with in my job. I work with undergraduate and graduate students at a university. They are still learning programming and other technical skills, and I'm confident that prior POSIX command-line experience is not a prerequisite for being a good computer science major at either the undergraduate or graduate level.

If I worked on research only with so-called “hardcore” students who are already POSIX command-line ninjas, as some Internet commenters suggested, then I would be stupidly missing out on students who have great potential but simply did not grow up exposed to Unix culture. Even worse, I would be exhibiting biases against women and minorities, who are much less likely to have childhood exposure to POSIX command-line culture due to a lack of role models coming from that culture. This isn't just politically-correct feel-good bullshit ... I want to work with the best students I can find, so it's profoundly stupid to disproportionately filter out entire demographics based on bogus criteria such as prior familiarity with incantations like “nohup tar -jxvf giant.tar.bz2 2> cmd.errs &”.

More generally, this notion that the only “real programmers” are those who have already mastered POSIX command-line-fu before they leave the university is a dangerous one, and contributes to the continued monoculture in software-based industries. It's also deeply insulting to the students whom I teach and work with on research, especially those who did not happen to grow up with childhood exposure to Unix-style hobby computing environments.

When I teach my students the principles and details of working on the command line, they get it. It doesn't matter that they weren't speaking zsh in tongues ever since they were ten years old. It's simply another set of tools to learn in the process of working on their research project. They learn a bit at a time as they work, and after a few months, they eventually start becoming proficient. Mission accomplished!


Update on 2015-10-20: Check out this rebuttal by Eytan Adar!

Created: 2014-10-08
Last modified: 2014-10-08