Year Six: Endgame
The Ph.D. Grind
At the end of my fifth year—right before I went to Google for the summer—I met with Dawson and asked him what it would take for me to graduate within the next year. At the time, IncPy and CDE were published as second-tier conference papers, SlopPy was a workshop paper, and the ProWrangler paper submission was under review. Dawson expressed concerns that my publication record still wasn't sufficient to graduate and that I needed one more substantive contribution to round out my dissertation. His expectations seemed reasonable, so my plan was to return to Stanford in the fall and spend a few months working on new research that could complete my dissertation. My fear, though, was that I was already exhausted from my past year of super-grinding and had no new project ideas brewing. So I went into scheming mode once again, thinking of ways to get that final as-yet-unknown piece of work that would enable me to graduate.
As part of my strategy, I also wanted to find a third (and final) thesis committee member who could strongly vouch for my graduation case. Most Ph.D. students in my department don't need to do so much planning because they work on advisor-sanctioned projects. They don't stress about who their other two thesis committee members are since their advisor vouches for them and the other members usually agree. However, my situation was unique since I hadn't been working on projects that Dawson was passionate about, so I couldn't count on him to wholeheartedly endorse my work. Having Jeff on my committee helped since he was personally invested in our ProWrangler project and could vouch for its legitimacy. But I still needed one more committee member to support my graduation case.
I first emailed Tom, my former MSR manager, and pitched him on the idea of me spending a few months in the fall of 2011 interning at MSR and doing a new project that could contribute towards my dissertation. I wanted him to be on my thesis committee so that I could also include the three papers I published with him from my summer 2009 internship work in my dissertation. Unfortunately, he didn't seem enthusiastic about the idea, so I didn't push further.
I then raised the possibility of extending SlopPy from a workshop paper into a full-fledged conference paper so that it could “count” as a more substantive dissertation contribution. Back in my fifth year, I talked with Martin, an MIT professor whose influential paper directly inspired SlopPy, about working together to extend SlopPy into a conference paper. He was interested in collaborating, but the timing didn't work out since I was busy with CDE and ProWrangler during the latter part of that year. But now, I figured that if I could spend a few months at the beginning of my sixth year working with Martin and have him serve on my thesis committee, then that could be my ticket to graduation. Dawson liked this plan, since (unsurprisingly) he had thoughts about how to combine Klee-like ideas with SlopPy. I planned to email Martin in midsummer to propose this collaboration, but by then, another, better opportunity had come along, so I no longer pursued this one.
As I began my final summer internship at Google, I looked forward to spending three carefree months polishing up CDE, but I felt a bit anxious because graduation wasn't yet guaranteed upon my return to campus. I needed one more burst of inspiration, and it ended up coming from an unexpected source.
In summer 2011, I finally made up my mind to “retire” from academia after graduation: I didn't know what I was going to do for a career, but I wasn't planning to apply for tenure-track university faculty jobs in the upcoming year.
I made this decision for two main reasons: First, I sensed that my current publication record wasn't impressive enough to earn me a respectable tenure-track faculty job. My hunch was confirmed months later when I saw that, sadly, fellow students with slightly superior publication records still didn't get any job offers. Of course, I could always try to work as a postdoc (temporary postdoctoral researcher) for a few years, publish more papers, and then reapply to faculty jobs.
But the second and more important reason makes doing a postdoc meaningless for me: The kinds of research topics I'm deeply passionate about aren't very amenable to winning grant funding, because they aren't well-accepted by today's academic establishment. Without grants, it's impossible to pay for students. And without motivated students to grind through the tough manual labor, it's impossible to get respectable publications. And without a significant number of publications each year, it's impossible to get tenure. Even if I do earn tenure, I would still need new grants to pay for new students to help me implement my ideas; the funding cycle never ends. Given my research interests, I wasn't emotionally prepared to fight the uphill battles required to get my proposals taken seriously by grant funding agencies. I had a hard enough time convincing peer-reviewers to accept my papers; grant reviewers will likely be even less sympathetic since they are the gatekeepers to millions of dollars and would rather hand the money to colleagues who are doing more mainstream types of computer science research.
I had been considering leaving academia for quite a few years, but I now felt that I had justified reasons for doing so: I understood enough about how the “academic game” worked in computer science to know that I didn't want to keep playing it. I summarized my feelings in an email to a friend who had recently started her job as an assistant professor: “I discovered over the past 5 years that I love being a spectator of research, but the burden of being a continual producer of new research is just too great for me.”
Since my mother is a wildly successful professor and my father also deeply respects academia, it was hard for me to tell them my decision. I didn't think they truly understood my rationale; I was afraid that they felt I was giving up and selling myself short when in reality, becoming a professor hadn't been a real goal of mine for years. One of the claimed benefits of academia is the allure of creative freedom, but my decision to leave academia actually freed up my mind to become even more creative in pursuing my true professional passions, both during my final year of grad school and in searching for a new career.
The immediate impact of my decision to quit academia was that I didn't need to worry about “networking” at the three academic conferences I attended that summer, where I gave talks on IncPy, SlopPy, and CDE. Academic conferences are filled with senior Ph.D. students, postdocs, and pre-tenure professors schmoozing like crazy in attempts to impress their senior colleagues. For these junior researchers, professional networking at conferences is a serious full-time job, since their budding careers and academic reputations depend upon excelling at it. But since I was getting out of this academic game, I didn't care at all and enjoyed myself without being nervous or calculating.
During a break between sessions at one of the conferences, I spotted Margo sitting by herself working on her laptop. Recall that I had met Margo during my fourth year at the San Jose workshop where I presented my original IncPy paper. I debated whether to approach her and to reintroduce myself. A part of me was afraid that she wouldn't remember me and also that I wouldn't have anything interesting to say. But since I had no real schmoozing agenda due to my imminent “retirement” from academia, I had nothing to lose if the conversation ended up fizzling. So I just went up and said hello. I reminded her about how we had previously met, and she seemed to remember me. I briefly told her that I was about to give a talk on my new CDE project and then needed to catch a flight back to California. We had a quick five-minute chat about CDE, and then I had to run to give my talk. After returning home that night, I emailed her a quick follow-up message with a link to the CDE project webpage in case her students were interested in playing with it for their research. This was my standard polite message when advertising CDE to professional colleagues, so I didn't really expect her to follow up.
Two weeks later, I received a surprise email from Margo saying that she had been talking about me with her student Elaine. The two of them wanted me to come work with them at Harvard for a few years as a postdoc after completing my Ph.D. The broader research themes surrounding my IncPy and CDE projects resonated with Margo's interests in creating tools to help make computational researchers more productive. I was very flattered by her offer, but the opportunity didn't make sense since I had already decided to retire from academia. It would be useless for me to do a postdoc, since the main purpose of a postdoc is to boost one's resume to improve the chances of getting a university faculty job.
And then inspiration struck again. Since I was in dire need of one more substantive project and thesis committee member before I could graduate, I made the following counterproposal to Margo: Instead of doing a postdoc after my Ph.D., I asked whether I could visit Harvard for four months in the fall of 2011 to work on a project with her. We could submit a paper to a conference in January 2012 and then include that project as the final portion of my dissertation. I also asked whether she could serve as the final member of my thesis committee. Margo liked this idea but didn't have sufficient grant funding for me, since she needed to fund her own students. I talked to Dawson, and he was generously willing to fund me for those months using his grants even though I wasn't working on Klee (my fellowship had already expired). Margo happily agreed to this arrangement, and after my summer internship ended in September 2011, I moved to Boston, Massachusetts to begin my sixth and final year of grad school.
This final grad school adventure would not have been possible without me actively seizing opportunities that I was fortunate enough to have been given. If Robert hadn't told me about the San Jose workshop two years ago, if I hadn't submitted and presented my IncPy paper there, if Margo hadn't liked my paper and introduced me to Elaine, if I hadn't kept in touch with Elaine, if I hadn't spontaneously said hello to Margo again at last summer's conference where I presented CDE, if she hadn't sent me a gracious follow-up email, and if I hadn't taken a risk with my unusual counterproposal to her, then I would have still been back at Stanford struggling to find one last project and thesis committee member.
I had an amazingly fun and productive four months in Boston as a visiting researcher at Harvard. The change of scenery was refreshing: I could focus intensely on research without the usual errands of life back home. Elaine helped me find a wonderful studio apartment within a five-minute walk to my office, and I could easily buy food both on campus and in nearby Harvard Square. This ideal living arrangement enabled me to concentrate on my work without any distractions.
I spent my first month mostly socializing with old college friends since my alma mater, MIT, was located right near Harvard. I also met with Margo a few times to discuss potential research ideas. She was open to me working on my own project under her loose supervision, so I had nearly full intellectual freedom. However, I took a pragmatic approach to my brainstorming since I wanted her to be excited about my project and to strongly support its inclusion in my dissertation. Thus, I read some of her recent papers and grant applications to get a sense of her research philosophy so that I could cater my ideas towards her tastes. By now, I understood the importance of aligning with the subjective preferences of senior collaborators (and paper reviewers), even when doing research in supposedly objective technical fields.
After batting around a few ideas, I came up with something that Margo loved: a tool that monitors researchers' computer-based activities and helps them organize and take notes on their experiments. It was an innovative twist on the traditional electronic lab notebook. Margo jokingly suggested the temporary codename “BurritoBook” to describe our proposed tool, since it seamlessly wraps many layers of activity monitoring around the user's normal workflow. Elaine later shortened the name to “Burrito,” which grew on me and eventually became the official project name.
At the time, I thought that my Burrito idea arose spontaneously from combining my hunches with Margo's preferences, but after looking back at old notes, I realized that similar ideas had been brewing in my head for several years. I started thinking about Burrito-like ideas as early as my second year of grad school when I wanted to monitor how people performed programming, and more concretely at the beginning of my fifth year when I wanted to extend IncPy to record Python-based experiment histories. Throughout grad school, I had been keeping a research lab notebook in various ad-hoc formats to document the process of building prototypes and running experiments, so I personally felt the pain of notetaking-related inefficiencies. Finally, although I wasn't a real HCI (Human-Computer Interaction) researcher, my HCI training with Scott and Joel during my second year and with Jeff during my fifth year gave me a keen sensitivity to user needs that greatly influenced the design of Burrito.
I spent a few weeks sketching out high-level plans for Burrito and discussing preliminary design details with Margo. Many refinements to my initial idea came from observing computational researchers at work and interviewing them about the challenges they faced in managing their multitude of experiment notes, code, and data files; most of my observation subjects were Elaine's friends who worked in various MIT and Harvard science labs. I also received useful early-stage feedback from giving a talk on my Burrito proposal at a lab group meeting led by Rob, the MIT professor I met at the beginning of my second year who encouraged me to pursue my HCI interests with Scott and Joel.
And then social time was over; it was time to grind. In early November 2011, I turned into a programming beast for the final time in grad school to transform my Burrito idea into a working prototype. I did 72 days of programming with only 5 days of breaks spread intermittently throughout the 2.5-month sprint. This period was the longest I had ever sustained an almost-painful level of nonstop intensity thus far. My initial CDE burst during my fifth year was only 21 days of grinding, and this burst was over three times as long. I worked straight through Thanksgiving, Christmas, and New Year's Eve, relentlessly focused on my goal of getting Burrito working well enough to submit a conference paper by the middle of January 2012.
For those few months, I morphed into an antisocial grump who shunned all distraction and became deeply immersed in my craft. All I thought about was computer code; I could barely speak in coherent English sentences except during my weekly progress meetings with Margo. Even though I appeared and acted subhuman (i.e., an unshaven disheveled mess), my emotional state was blissful. I was programming and debugging for over ten hours per day, but my mind was quite relaxed since my technical skills were well-calibrated for the challenges I faced. By now, I had accumulated enough experience in designing, implementing, and “marketing” research prototypes that I was confident in my abilities to make this project work. I received wonderful feedback and support from Margo along the way, so I sensed that she would strongly endorse the inclusion of Burrito in my dissertation. After years of grinding on uncertain and failed projects earlier in grad school, I now felt invigorated working intensely towards a target that I knew I could feasibly achieve.
By mid-January 2012, the Burrito prototype was in fine shape, so we ran an informal evaluation, wrote up a paper, and submitted it to the conference as planned. I took a few days off to return to normal human mode, said goodbye to my Boston friends, and flew back to California for the Ph.D. endgame.
The popular view of how a Ph.D. dissertation arises is that a student comes up with some inspired intellectual idea in a brilliant flash of insight and then spends a few years writing a giant treatise while sipping hundreds of lattes and cappuccinos. In many science and engineering fields, this perception is totally inaccurate: The “writing” is simply combining one's published papers together into a single document and surrounding their contents with introductory and concluding chapters. All of the years of sweaty labor have already been done by the time a student sits down to “write” their dissertation document.
In my department, the most important milestone in a Ph.D. student's career is when their advisor gives the thumbs up to begin the dissertation writing process. This gesture signals that the student has done enough work—usually publishing two to four conference papers on one coherent theme—and deserves to graduate within a few months.
When I returned to Stanford in January 2012, my goal was to secure that vital thumbs up from Dawson as soon as possible. I wrote up a short document presenting evidence for why I felt I had done enough work to graduate. My argument was simple: I created five innovative software tools to improve the workflow of computational research programmers—IncPy, SlopPy, CDE, ProWrangler, and Burrito—and published 1 top-tier conference paper, 3 second-tier conference papers, and 3 workshop papers from my body of work (the Burrito conference submission ended up being rejected, so we resubmitted and published in a workshop). As an added bonus, my two other thesis committee members, Jeff and Margo, could also vouch for my graduation case since I had done successful projects with them (ProWrangler and Burrito, respectively). I emailed the document to Dawson and nervously awaited his response. I thought my case was pretty strong, but I had no idea whether he expected me to do more work before allowing me to graduate. To my great relief, he quickly gave me the thumbs up, and that's when I knew that I was essentially done with grad school.
I spent the next two months combining all of my papers together into a 230-page dissertation document entitled Software Tools to Facilitate Research Programming. Here is the abstract (summary) from the first page of my dissertation:
Research programming is a type of programming activity where the goal is to write computer programs to obtain insights from data. Millions of professionals in fields ranging from science, engineering, business, finance, public policy, and journalism, as well as numerous students and computer hobbyists, all perform research programming on a daily basis. My thesis is that by understanding the unique challenges faced during research programming, it becomes possible to apply techniques from dynamic program analysis, mixed-initiative recommendation systems, and OS-level tracing to make research programmers more productive. This dissertation characterizes the research programming process, describes typical challenges faced by research programmers, and presents five software tools that I have developed to address some key challenges. 1.) ProWrangler is an interactive graphical tool that helps research programmers reformat and clean data prior to analysis. 2.) IncPy is a Python interpreter that speeds up the data analysis scripting cycle and helps programmers manage code and data dependencies. 3.) SlopPy is a Python interpreter that automatically makes existing scripts error-tolerant, thereby also speeding up the data analysis scripting cycle. 4.) Burrito is a Linux-based system that helps programmers organize, annotate, and recall past insights about their experiments. 5.) CDE is a software packaging tool that makes it easy to deploy, archive, and share research code. Taken together, these five tools enable research programmers to iterate and potentially discover insights faster by offloading the burdens of data management and provenance to the computer.
I spent a lot of effort crafting new introductory and concluding chapters to turn my dissertation into more than merely a description of five separate tools that I had built over the past few years. Throughout the writing process, Jeff and Margo gave me great feedback on how to frame my research contributions in a more substantive intellectual light. Even though I know that few people will end up reading my dissertation—the constituent papers are far more accessible—it felt satisfying to collect all of my ideas, insights, tool descriptions, and evaluation results together into one cohesive document.
I scheduled my oral defense for Monday, April 23, 2012. The biggest challenge was finding a two-hour time slot where five busy professors (my three thesis committee members plus two additional oral committee members) were available. Margo was visiting California for a conference during that week, so I planned around her schedule. In my department, the format of the oral defense is that the student gives a one-hour public talk summarizing their dissertation research, and then there is a one-hour private session where the committee asks probing questions. Afterwards, the committee votes to either pass or fail the student. In reality, almost nobody fails their defense unless they act totally moronic: The committee will usually have read through and approved a student's dissertation before they let that student defend, so there should be no surprises.
I didn't have time to present all five projects during my oral defense talk, so I chose to present three projects, one that I did with each member of my thesis committee: IncPy with Dawson, ProWrangler with Jeff, and Burrito with Margo. Most Ph.D. students publish papers with only their advisor, so it was a rare honor to get to talk about research that I did with all three of my committee members. I was also happy that many of my friends and former colleagues—including Scott, Joel, Peter, Robert, Greg, and Fernando—attended my defense.
Even though I had given dozens of academic talks throughout grad school, I was more tense than usual during my defense, perhaps because I knew almost everybody in the audience. Strangely, I feel much more at ease giving talks to rooms filled with strangers rather than familiar faces. The private session wasn't as grueling as I had anticipated, but my committee did raise some questions and suggestions that ended up improving my dissertation.
After I passed, my committee and friends all gave me polite congratulations, which was a nice but expected gesture. The compliment that I will cherish the most came from a senior professor with whom I had only spoken once. I was a bit surprised to see him at my defense since I didn't think he would be interested in the topic. After my defense, he sent me the following email praising my talk: “I just wanted to say that I really enjoyed it, partly because of the creativity of the work, partly because of the well-prepared talk, and partly because I had spent the previous year doing research programming.”
Of the 26 Stanford Computer Science Department Ph.D. graduates in my year, I consider myself fairly mediocre from an academic perspective since most of my papers were second-tier and not well-received by the establishment. My dissertation work awkwardly straddled several computer science subfields—Programming Languages, Human-Computer Interaction, and Operating Systems—so it wasn't taken seriously by the top people in any one particular subfield.
Despite lack of mainstream acceptance, I still thought that my Ph.D. ended successfully because I was able to carry several of my own ideas to fruition and graduate with a dissertation that I was very proud of. I took a highly entrepreneurial approach to my Ph.D.—opportunistically seeking out projects and collaborators, walking a fine line between being unconventional and conforming enough to get my papers published. I feel extremely lucky to have been able to take charge of my Ph.D. career in creative ways; I wouldn't have had nearly as much freedom without the fellowships that funded five out of my six years at Stanford.
In the end, like most Ph.D. dissertations, mine expanded the boundaries of human knowledge by a teeny microscopic amount. The five prototype tools that I built contain some interesting ideas that can be adapted by future researchers. In fact, I will be honored if future researchers cite my papers as examples of shoddy primitive hacks and argue for why their techniques are far superior. That's how research marches forward bit by bit: Each successive generation builds upon the ideas of the previous one.
However, to me, the most significant contribution of my dissertation wasn't those specific prototype tools. Rather, it was that, to the best of my knowledge, I was one of the first computer science Ph.D. students to identify a pervasive problem—the lack of software tools catered to the needs of a large and growing population of computational research programmers—and to offer some early-stage prototype solutions that others can improve upon. I believe that these ideas will become more important in the upcoming decades, but since I'm retiring from academia, I won't be around to directly promote them.
Since my dissertation topic is far from being mainstream, any junior professor or scientist who tries to build their academic career upon its ideas will struggle to gain the respect of grant funding agencies, which are the gatekeepers to launching new projects, and their senior colleagues, who are the gatekeepers to publication and tenure. I will be more than happy to assist anybody who wants to take on this noble fight, but I'm not brave enough to stake my own career on it. Instead, I plan to now pursue a completely different professional passion, which might someday be the subject of a future book :-)
In preparation for writing this memoir, I dug through lots of my old research notes. One day, I found the following snippet about a topic that I was interested in investigating:
Research into software development tools for non-software engineers, but rather for scientists, engineers, and researchers who need to program for their jobs -- they're not gonna care about specs., model checking, etc. -- they just want pragmatic, lightweight, and conceptually-simple tools that they can pick up quickly and use all the time.
The shocking thing about this note is that I wrote it six years ago in the summer of 2006, right before I started the Ph.D. program at Stanford. It's been a long, circuitous, and unpredictable journey, but I'm incredibly grateful that I was able to turn this broad topic—one out of dozens that caught my interest over the years—into my Ph.D. dissertation. This accomplishment wouldn't have been possible without a rare combination of great luck, personal initiative, insightful nudges from generous people, and nearly ten thousand hours of grinding.
Copyright © 2012 Philip Guo