The Ph.D. Grind
Immediately after the third year of my Ph.D. program, I spent the summer of 2009 in Seattle, Washington as an intern at the headquarters of Microsoft Research. It was one of the most fun and productive summers of my life: My internship project led to the publication of three top-tier conference papers and, more importantly, helped establish the motivation for my dissertation work.
At present, Microsoft Research is the premier corporate institution for producing top-notch academic research. Research labs in most companies are usually focused on R&D efforts to directly improve their own future products. However, the primary mission of Microsoft Research (abbreviated “MSR”) is to perform fundamental science and engineering research with the intent of publishing top-tier academic papers in computer science and a few related fields.
The best way to think of MSR is as a giant research university without any students. The full-time researchers are like professors, except that they can focus nearly all of their time on research since they don't have teaching or advising duties. But perhaps their favorite job benefit is that they don't need to apply for grant funding, which is a tedious recurring activity that saps professors' time. Since Microsoft is an immensely profitable company, it allocates hundreds of millions of dollars each year to funding academic (paper-producing) research. Microsoft is betting that some of the intellectual property created by its researchers might inspire future products, and it also wants the best minds in computer science on staff for consultation. That's why the company gives its researchers access to all of the resources required to do their best work.
Getting a full-time researcher position at MSR is as difficult as getting a job as a professor at a prestigious university. Although MSR researchers don't technically have tenure, job security is fairly good, especially if they continually publish. Since lots of computer science research is labor-intensive, researchers often hire Ph.D. students as summer interns to help implement their ideas. It's a great deal for both parties: Researchers get students to assist with manual labor, and students get the chance to publish top-tier papers with famous researchers outside of their universities and possibly get letters of recommendation for future jobs. In the past decade, a significant fraction of the papers at top-tier computer science conferences were written by MSR researchers and their interns.
When I arrived at the MSR headquarters at the beginning of the summer, the campus was abuzz with the energy of hundreds of Ph.D. students meeting their managers and preparing to get to work. Since we were there for only three months, our managers planned well-defined projects that would likely result in a paper submission. Most of us were able to submit at least one paper from our summer work, and a fraction of those papers ended up getting published. Of course, research is inherently risky, so some interns were assigned projects that never panned out into publications. Nonetheless, almost everyone had a wonderful time: we were paid over four times our usual grad school stipends, treated to fun Microsoft-sponsored social outings, and attended lots of stimulating talks by top-notch researchers.
Perhaps the longest-lasting impact of an MSR internship is the friendships we all made. During that summer, I had the privilege of getting to know some of the brightest and most inspiring young computer science researchers of my generation. For instance, one of my three officemates was about to start her Ph.D. at MIT, and she had already published more top-tier papers from her undergraduate research than most Ph.D. students could ever hope to publish. Another officemate was a UC Berkeley Ph.D. student who spent his nights and weekends working on a separate research project with collaborators across the country in addition to doing his internship project during workdays. These peers will likely grow into award-winning professors, research leaders, and high-tech entrepreneurs, so I am humbled to have been in their presence for a summer.
The story of how I arrived at MSR that summer illustrates the importance of combining concrete achievements with professional connections. Many Ph.D. students get internships (and later full-time jobs) through some sort of connection, and I was no exception.
I first applied to be an intern at MSR during my second year while I was working with Scott and Joel on their HCI programming lab study project. I applied through regular channels by submitting my resume online, and my application was quickly rejected in favor of those students with more publications and usually some inside connections.
One year later, during my third year, an MSR researcher saw that my work with Scott and Joel had been published in an HCI conference, so he emailed me to ask whether I was interested in doing an internship with him on a loosely related project. He sought me out in particular because my first undergraduate research supervisor at MIT had introduced us to one another several years earlier, so he had some recollection of who I was.
I was honored by his offer but told him that I was no longer working on HCI research; by then, I had already gone back to bug-finding work with Klee. However, I expressed a strong interest in working on empirical software measurement research at MSR, since I had spent my second year doing that sort of work with Dawson. He immediately forwarded my resume to his colleague Tom, who was a rising star in the empirical software measurement subfield. After introducing myself via email, I sent Tom the short paper that I coauthored with Dawson from our Linux bug report measurement work. Tom liked my paper, so he decided to hire me as his summer intern. I had read several of Tom's research papers during my second year, so I was very excited about the possibility of working with him.
If I had blindly submitted my resume online like hundreds of other applicants, I probably would not have been able to attract Tom's attention. Most of my fellow interns also got their jobs through connections, although usually their advisors made a direct recommendation to a relevant MSR colleague. Interestingly, it wasn't Dawson, but rather one of my undergraduate research supervisors (from a project I did over six years earlier) who provided the much-needed connection for me. This same supervisor would later provide a crucial introduction that led to my first full-time job after graduation. From this experience, I learned about the importance of being endorsed by an influential person; simply doing good work isn't enough to get noticed in a hyper-competitive field.
Tom defined the high-level scope of my internship project and set a realistic yet ambitious goal of submitting a paper to a top-tier conference at the end of the summer. My project was to quantify people-related factors that affect whether software bug reports are successfully fixed, reassigned to others, or reopened after supposedly being fixed. To obtain these insights, I wrote computer programs to analyze software bug databases and employee personnel data sets within Microsoft. I was well-prepared to do this sort of data mining and analysis work, since I had spent most of my second year doing similar analyses with Dawson on Linux bug report and revision control history data.
Tom would drop by my office at 5pm each afternoon before he left work to check up on my progress. Although daily check-ups could potentially be stressful, I actually found them immensely helpful since Tom wasn't intimidating or judgmental at all. Getting immediate daily feedback made it easy for me to stay focused and motivated. The combination of a well-defined, short-term goal and continual helpful feedback made my internship workdays much more productive than those during my previous three years of grad school. The best part was that I worked only during normal office hours (9am to 6pm). There was no possible way to take my work home with me since the data was available only within Microsoft, so I just chilled every evening and had fun without worrying about whether I ought to be working more; back at school, I constantly worried about whether I was working enough since I could potentially be working during all waking moments.
Since Tom had published and reviewed dozens of empirical software measurement papers, he was definitely an “insider” who knew what sorts of results and write-ups were well-liked by reviewers in that subfield. When it came time to submit our paper at the end of the summer, Tom was able to deftly frame our contributions in the context of related work, argue for why our results were novel and significant, and get our paper as polished as possible. Three months later, I was delighted to learn that our paper on studying causes of bug fixes was accepted at a top-tier conference where only 14 percent of all papers submitted that year were accepted.
But Tom wasn't done yet! Since he was a newly hired researcher at MSR, he was eager to establish his reputation by publishing more follow-up papers. Over the next few years, we used the results from my summer 2009 internship to write two additional top-tier conference papers, one about bug report reassignments and another about bug report reopenings (which won a Best Paper Award).
My success in doing empirical software measurement research at MSR with Tom (resulting in three top-tier papers) was a satisfying redemption from the failures that I had experienced when working in this same subfield throughout my second year (resulting in two rejections followed by a shorter-length, second-tier paper). Since I had not grown much smarter between those two contrasting experiences, I give most of the credit for my internship project's success to two sources: Microsoft and Tom.
First, as an intern at MSR, I had access to a rich array of internal data sets about Microsoft's software bugs and personnel files. There was no way that I could have gotten access to those confidential data sets as an outsider. The richness of the Microsoft data sets enables MSR researchers such as Tom to more easily obtain groundbreaking publishable results than their competitors who don't have access to such data. In contrast, when I was working with Dawson, the Linux data sets I obtained were much smaller and of lower quality, since open-source software projects usually don't maintain records as meticulously as one of the world's largest software companies.
Second, Tom deserves lots of credit: Since he was a veteran insider in the empirical software measurement subfield, he knew how to advise me as a technical mentor and also how to craft the nuances of our paper submissions to maximize their chances of acceptance. In contrast, Dawson was an outsider who merely had a passing interest in these topics, so he had neither the motivation nor the abilities to advise projects in this subfield (even though he was world-famous in another subfield—software bug-finding).
During my second year, I lamented how hard it was for Dawson and me to publish our work, since we had to compete with hordes of professionals who specialized in empirical software measurement. Now, it felt amazing to finally experience what it was like to be on the winning team working alongside one of those professionals.
Even though I had a wonderful summer “intermission” from my Ph.D. program, I still didn't have any plans for my dissertation project when I returned to Stanford in the fall. All I knew was that I didn't want to keep working on Klee, but I had no idea what I could do that was both personally motivating and, more importantly, publishable.
I contemplated trying to extend my current internship work into my dissertation. However, I ultimately concluded that it would be too hard to publish more papers once I returned to Stanford and no longer had access to Microsoft's internal data sets. In an ideal world, I would have been able to do all of my dissertation work within MSR. This option didn't seem feasible, though, since I didn't know any students who had previously done so.
As a last-ditch effort, I contacted my former internship manager at Google to ask whether I could become an intern again and access Google's internal software bug data sets to do empirical software measurement research. He seemed receptive to my idea, but I didn't follow up on the proposal since it seemed unlikely to pan out: He wasn't an academic researcher himself, and I didn't know anybody else at Google who would be willing to support such a special arrangement. Thus, I decided not to pursue empirical software measurement for my dissertation, so the three papers that I eventually published from my MSR internship didn't help me graduate. However, this experience was still useful for improving my research and technical writing skills.
Desperate to generate another plausible dissertation idea, I spent my nights and weekends throughout the summer reading research papers and brainstorming at coffee shops. At one point, I even thought about creating a dissertation project based on Klee-like ideas but without using the Klee tool itself. This scheme would allow me to free myself from Klee while still capturing some of Dawson's interest. Unfortunately, I wasn't able to generate any substantive ideas along those lines that hadn't already been published.
And then, on July 24, 2009—halfway through my internship—inspiration suddenly struck. In the midst of writing computer programs in my MSR office to process and analyze data, I came up with the initial spark of an idea that would eventually turn into the first project of my dissertation. I frantically scribbled down pages upon pages of notes and then called my friend Robert to make sure my thoughts weren't totally ludicrous. At the time, I had no sense of whether this idea was going to be taken seriously by the wider academic community, but at least I now had a clear direction to pursue when I returned to Stanford to begin my fourth year.
Copyright © 2012 Philip Guo