Year Three: Relapse
The Ph.D. Grind
As I began the third year of my Ph.D. in the middle of 2008, I rejoined the Klee project, again as the most junior student. At that time, the only two remaining people on the Klee team were its original co-creators: Dawson and Cristi (his most senior Ph.D. student). All other Klee team members had already left the project.
I had mixed feelings about returning. On one hand, my traumatic first-year experiences with Klee made me dread both the project and also the team dynamics. On the other hand, Cristi and Dawson were both very passionate about Klee and wanted to publish additional follow-up papers. Since they were veteran insiders in the software bug-finding subfield, I felt like I had a strong chance of publishing papers together with them. The alternative would have been to continue the empirical software measurement project from my second year. Although I was much more interested in that project, I knew that it would be a continual struggle to publish in a subfield where Dawson and I were both outsiders. And since my goal was to publish and earn a Ph.D., I tucked away my ego and took the plunge into Klee again. I wrote Dawson an email announcing, “my main plan is to team up with you and cristi on klee to do something solid and hopefully make some [paper] submission in a few months. i think that leveraging klee and aligning with both of your interests and incentives will be the best way for me to both make a contribution and also to feel satisfied about making concrete forward progress every day.”
Cristi and Dawson wanted me to experiment with a new way of running Klee called cross-checking, which allowed it to find inconsistencies between two different versions of similar software. For the next four months (July to October 2008), my day-to-day grind was similar to my first-year Klee assignment with Linux device drivers, except that I now paced myself a lot better to remain healthy and avoid burnout. Just like during my first year, I was doing a lot of grungy manual labor to use Klee to discover new bugs in software rather than improving Klee in any substantive way. My daily workflow consisted of setting up dozens of Klee configuration options, launching Klee to run for approximately ten hours to cross-check a set of test software programs, coming back the next morning to collect, analyze, visualize, and interpret the results, making the appropriate adjustments to Klee's options, and then firing off another ten-hour round of experiments.
Like other sophisticated software tools, Klee had dozens of adjustable configuration options. And since it was a research prototype hacked together by students, the behaviors of most of those options were not clearly documented. As a result, I wasted a lot of time due to misunderstanding the subtle interactions between options as I was adjusting them. I filled my research lab notebook with curses such as: “OH SHIT, I think my mistake was in not realizing that there are certain options you're supposed to pass into Klee (e.g., -emit-all-errors) and others that you're supposed to pass into the target program to be used to set up the model environment (e.g., --sym-args), and if these are confused, then strange things happen because Klee is executing the target program with different argc and argv than you expect.”
Throughout those months, Cristi and Dawson sometimes talked about submitting a Klee cross-checking paper, so I was motivated by that seemingly-concrete goal. As I was working, I wrote up an outline for a paper submission and incrementally filled it in with my notes and results. However, to my surprise, neither Cristi nor Dawson showed much urgency in getting this paper polished and submitted. I was not yet capable of submitting a respectable paper on this topic without their expertise and assistance, since I was merely an assistant doing manual labor: The true research insights and high-level persuasive pitch still needed to come from them. In the end, we never submitted a paper, and my four months of work was again in vain, just like during my first-year Klee grind.
This project fizzled due to a combination of my own lack of technical expertise and insufficient mentorship from senior colleagues. Although Cristi was patient in advising me on cross-checking ideas and debugging Klee idiosyncrasies, his heart wasn't fully in our project. Since he was in the process of finishing up his Ph.D., his main priority at the time was applying for jobs as an assistant professor. The faculty job application process takes several grueling months of serious effort, and many applicants still end up with no offers. Each university department offers at most one or two tenure-track professor job positions per year, and over a hundred highly-qualified senior Ph.D. students, postdocs (temporary postdoctoral researchers), and research scientists fight for those coveted spots. The academic job hunt is a stressful process that consumes almost all of one's waking time and mental energy. Thus, Cristi had no incentive to spend hundreds of hours working on yet another paper submission, since even if it got accepted, the notification would come too late to matter for his job applications.
In hindsight, I can see why this project was likely to fail because of misaligned incentives, but back then, I lacked the wisdom to foresee such a failure. Recall that I decided to become a Klee assistant for Cristi and Dawson since I wanted to join an older Ph.D. student and professor who were experienced in publishing papers in their given subfield. I did so because this plan worked marvelously during the previous year when I helped Joel (an older Ph.D. student) and Scott (a professor) on their HCI project, which led to a top-tier award-nominated paper.
So what was different here? In short, neither Cristi nor Dawson was truly hungry to publish. They had already published several Klee papers together, and a cross-checking paper coauthored with me would have been a “nice-to-have” but not mandatory follow-up publication. Cristi was in the final year of his Ph.D. and didn't need to publish any more papers to graduate, and Dawson already had tenure, so he wasn't in a rush to publish either. In contrast, Joel was a mid-stage Ph.D. student who was itching to publish the first paper of his dissertation, and Scott was an assistant professor who needed to publish prolifically to earn tenure. These two opposing experiences taught me the importance of deeply understanding the motivations and incentives of one's potential collaborators before working with them.
Since the cross-checking project went nowhere and Cristi was busy with his faculty job applications, I decided to take the lead on my own Klee-related project rather than continue serving as an assistant. After some discussions, Dawson suggested that I try to improve a core component of Klee: its search algorithm. Klee finds bugs by searching through a “maze” of executable software code, so improving its search algorithm might enable it to find more bugs.
For the first time, I was modifying Klee in a novel way—improving its search algorithm—rather than simply doing manual labor to run Klee to find bugs in software. One way to measure how well I was doing was to compute the percent coverage that Klee achieves (i.e., how much of the code maze was “covered” by Klee's searching) on a set of test software programs. Dawson's goal was simple: to get significantly better coverage than the current search algorithm reported in the latest Klee paper. On a suite of 89 test programs, Klee already achieved an average of 91 percent coverage on each program (100 percent is perfect coverage). My job was to improve those coverage numbers as much as possible. Every day, I would modify Klee's search algorithm, run Klee on the 89 test programs (which would take approximately ten hours), come back the next morning to see the coverage numbers, and then make another round of modifications to Klee's code and rerun it on the test programs.
It was now the middle of my third year, and many of my fellow students and I fell into a state of “limbo” where it became difficult to motivate ourselves to consistently come into the office every single day. We also experienced isolation and loneliness from spending day and night grinding on obscure, ultra-specialized problems that few people around us understood or even cared about. Our advisors and senior colleagues sometimes provided high-level guidance, but they rarely sat down together with us to work out all of the grimy details.
Unlike our peers with regular nine-to-five jobs, there was no immediate pressure for grad students to produce anything tangible—no short-term deadlines to meet or middle managers to please. For most students in my department, nobody would notice or care if they took one day off, so by extension, why not take two days off, a whole week off, or even a whole month off? Therefore, it's unsurprising that many Ph.D. students who drop out do so around their third year.
To fend off procrastination, I worked tirelessly to impose self-discipline and structure on my workdays. I tried to “micromanage” myself by setting small, bite-sized goals and attacking them every day, hoping that positive results would eventually come. But it was hard to keep myself motivated when I didn't see noticeable daily progress.
Discipline alone wasn't enough; I failed to achieve any favorable results after three months of tuning Klee's search algorithms. Since Klee already achieved 91 percent average coverage on our test programs, it was excruciatingly difficult for me to improve those numbers by a few percent up to an average of, say, 94 percent. Even worse, these kinds of minor improvements simply don't look impressive in a paper submission. The one-line story of our paper would be something like: “We improved Klee's search algorithm in some ways to get its average coverage up from 91 to 94 percent.” This is hardly an exciting or even interesting result in the eyes of reviewers; it's a typical example of boring incremental improvements to an existing project. Unsurprisingly, Dawson wasn't interested in attempting to submit such a lame paper.
If I had improved Klee's search algorithm in a fascinating and effective way, then Dawson might have been more excited and worked harder to try to submit a paper. But in January 2009, after three months of futile grinding, I couldn't see how my day-to-day incremental efforts would ever result in a breakthrough that met Dawson's expectations. I hate being labeled as a quitter, but I felt like this Klee search algorithm project was a dead-end, so I quit.
Looking back now, I take perverse solace in one tragic fact: After I stopped working on Klee's search algorithm, two of Dawson's other Ph.D. students worked on this exact same problem, and neither has published a single paper in the past three years. I don't think I could have done any better than those two students, so if I had stayed the course on this particular project, then I might have also been stuck in a three-year-long limbo.
Despite repeated failures with Klee, I still wanted to keep working on it because that was the only project Dawson cared about. I was starting to hate Klee, but I had already sunk thousands of hours into wrestling with its code, so I wanted something concrete to show for my efforts. It was now the middle of my third year, and I was desperate to publish a first-author paper that could form the basis for my dissertation; I felt a bit behind since a few of my classmates had already published their first dissertation-worthy paper. I naively hoped that Klee would be the “path of least resistance” to earning my Ph.D., since it was perfectly aligned with my advisor's interests.
At this time, a first-year Ph.D. student named Peter joined Dawson's lab group and was looking for a project. I talked to Dawson about teaming up with Peter, figuring that the two of us working together might get better results than each working alone. Dawson liked the idea, so he suggested that Peter and I reimplement underconstrained execution in Klee (abbreviated “Klee-UC”). Recall that Dawson and another student implemented the first version of Klee-UC during my first year. They created a rough first draft, submitted a shoddy paper hastily written in three days (a debacle I remember all too well), and then the project halted when that student dropped out of the Ph.D. program shortly thereafter. So now, two years later, it was up to Peter and me to reimplement Dawson's initial Klee-UC vision and hopefully get it working well enough to publish a paper.
I came into this new assignment with as much optimism as I could muster, trying my best to forget my past with Klee. I convinced myself that if I had any chance of publishing a Klee-related paper, it would be with this current Klee-UC project. I wholeheartedly believed that Dawson's Klee-UC idea was innovative and interesting from a research perspective, so if Peter and I could do a good enough job of implementing it and finding important software bugs, then we would have a strong paper submission. Moreover, I could reuse most of the experimental infrastructure I had set up for the Linux device drivers work from my first year, since we wanted to show how Klee-UC improves upon regular Klee in terms of finding bugs in those drivers. Finally, I fantasized about a successful Klee-UC paper being the ultimate redemption for all of those thousands of hours of manual labor I had spent on Klee. After all, it was my struggles with using Klee on Linux device drivers during my first year that directly inspired Dawson to come up with the Klee-UC idea. Thus, it would be a fitting conclusion if I were to first-author the paper that brought this idea to fruition (professors in my field usually let their students be the first author, even if the student's project was based on their ideas). I even hoped that this project would form the beginning of my dissertation and pave the way for my eventual graduation.
Over the next two months (February and March 2009), Peter and I busted our butts to build Klee-UC. We had a lot of fun programming together in the office every day; it was a welcome change from the solitary day-to-day grind that most Ph.D. students experience. However, after a while, Dawson seemed visibly disappointed with the relatively slow pace of our progress. Peter and I thought we were doing fine, but Dawson didn't seem happy with our work, so he no longer felt like aiming for an upcoming paper submission deadline.
At the time, I couldn't understand why Dawson was so impatient with us, but I can now sympathize with his feelings of frustration. He had such a crystal-clear vision for Klee-UC in his mind, and he wanted some talented and hardworking students to carry out his vision. If Dawson were still a Ph.D. student, then he would have surely been able to get Klee-UC done in a matter of weeks and then singlehandedly write up and publish a top-tier paper. His publishing track record when he was a student was beyond prolific, which is how he got a top-tier faculty job at Stanford. However, since he was now busy with professor duties such as teaching, committee work, paper reviewing, and other errands, he could not devote the thousands of hours of focused labor necessary to turn this idea into a publishable paper. Like all professors in labor-intensive research fields, Dawson needed students to execute on his visions.
I think that Dawson expected Peter and me to have gotten publishable results much faster, so to him, we seemed either incompetent or not serious enough about our jobs. Since Dawson is a professor at a top-tier university, it's a sad reality that all of his students are probably less competent than he was as a Ph.D. student. The explanation is simple: Only about 1 out of every 75 Ph.D. students from a top-tier university has what it takes to become a professor at a school like Stanford (or maybe 1 out of every 200 Ph.D. students from a regular university). Unsurprisingly, neither Peter nor I was of that caliber. If Dawson had worked with a younger clone of himself, then progress would have been a lot faster!
Even though we put in a solid effort during those two months, Peter and I felt like we had really let Dawson down on a project he cared deeply about. Peter was so discouraged that he switched advisors and then later dropped out of the Ph.D. program altogether. With my teammate gone, I grew more disillusioned and decided to quit Klee for the final time.
Two years after Peter and I left the Klee project, Dawson finally found a new Ph.D. student who could properly bring his Klee-UC vision to fruition. In 2011, Dawson and his new student published a great paper incorporating both Klee-UC and cross-checking ideas. In the end, it took three attempts by four Ph.D. students over the course of five years before Dawson's initial Klee-UC idea turned into a published paper. Of those four students, only one “survived”: I quit the Klee project, and two others quit the Ph.D. program altogether. From an individual student's perspective, the odds of success were low.
From a professor's perspective, though, Klee-UC was a rousing success! Since Dawson had tenure, his job was never in danger. In fact, one of the purposes of tenure is to allow professors to take risks by attempting bolder project ideas. However, the dark side of this privilege is that professors will often assign students to grind on risky projects with low success rates. And the students often can't refuse, since they are funded by their advisors' grants. Thankfully, since I was funded by fellowships, it was much easier for me to quit Klee.
I don't mean to single out Dawson or Klee in particular. This mismatch of incentives between tenured professors and Ph.D. students is a common problem in most labor-intensive science and engineering research projects. What often happens is that a professor starts with a pile of grant money and some high-level vision (e.g., Klee-UC or cross-checking). The professor then hires several students and advises them on implementing that vision, possibly (but not always) as part of their dissertation work. Without thousands of hours of student labor, there would be no tangible results and thus no publications.
The professor might need to go through several rounds of student failures and dropouts before one set of students eventually succeeds. Sometimes that might take two years, sometimes five years, or sometimes even ten years to achieve. Many projects last longer than individual Ph.D. student “lifetimes.” But as long as the original vision is realized and published, then the project is considered a success. The professor is happy, the university department is happy, the grant funding agency is happy, and the final surviving set of students is happy. But what about the student casualties along the way? A tenured professor can survive several years' worth of failures, but a Ph.D. student's fledgling career—and psychological health—will likely be ruined by such a chain of disappointments.
I attended Cristi's oral defense in May 2009, the end of my third year. The oral defense is the final rite of passage before a student earns their Ph.D. degree: The student gives a one-hour presentation on their dissertation research and must answer critical questions from a panel of professors. Dawson, who is normally quiet and reserved, beamed with visible pride as he introduced Cristi to the audience and raved about what a pleasure it was to have worked together over the past few years to create Klee. His praise was well-deserved: Cristi did a wonderful job throughout his Ph.D., and the ideas embodied by Klee helped create a brand-new subfield (called mixed concrete/symbolic program execution) within the software bug-finding research world.
As I watched Cristi's oral defense presentation, it finally sank in that it would be almost impossible for me to get a substantive dissertation out of Klee, so I felt more confident in my decision to quit. Cristi's phenomenal success made it more difficult for Dawson's younger students to publish and graduate. The groundbreaking initial Klee work had already been done; all that remained were follow-up incremental enhancements such as improving the search algorithm, Klee-UC, cross-checking, and applying Klee to new types of software such as Linux device drivers. Although these projects could certainly make for publishable papers and maybe even a dissertation, Dawson wasn't nearly as hungry to publish as our newly-arrived competitors were.
Since Klee (and a few related projects from 2005 to 2008) created a new subfield, dozens of assistant professors and young research scientists quickly jumped on the bandwagon and ferociously cranked out paper after paper describing incremental improvements to try to win tenure or job promotions. It was like an academic gold rush, prompted by the insights of Cristi, Dawson, and a few other early pioneers. Since Dawson had tenure and was already famous for creating Klee and other notable projects, he was above the fray and didn't have a desire to publish for the sake of padding his resume.
In effect, Ph.D. students working with those young researchers were more easily able to publish and graduate, while Dawson's students had a much harder time by comparison. In the three years since I quit Klee, dozens of research groups around the world have published hundreds of papers based on Klee-like ideas. Amazingly, fifteen of those papers described enhancements to Klee itself, which our lab released as open-source software to encourage further research. In the meantime, five of Dawson's Ph.D. students have seriously attempted to work on Klee; so far, only one has published a single paper (on Klee-UC).
The sad irony here is that since Dawson's direct competition was now serving as conference program committee members and paper reviewers, it was much harder to get his papers published despite the fact that he co-founded this subfield in the first place. Because Dawson had not been actively publishing in recent years, he no longer knew all of the rhetorical tricks, newfangled buzzwords, and marketing-related contortions required to satisfy reviewers and get his papers accepted into top-tier conferences. Also, the more furiously his competitors published, the stricter the reviewers became about demanding that he and his students justify the originality of their ideas in relation to the piles of related work, and the more frustrating the paper rejections became. After all, without Dawson's groundbreaking insights from the past decade, these picky reviewers would not even be working in this subfield, much less criticizing and rejecting his papers!
I calculated that the only advantage of staying with Klee was that Dawson deeply loved the project. Even if I couldn't get any papers published, he could maybe appeal to let me “pity graduate” with zero publications. But given my painful past with Klee, I couldn't stomach the possibility of grinding on it for an unknown number of additional years just for the hope of a pathetic “pity graduation.”
By now, I had finished three years of my Ph.D. still without any idea of how I was going to eventually put together a dissertation. I had no concrete plan looking forward, but I knew that I wanted to get away from Klee once and for all.
Copyright © 2012 Philip Guo