Philip Guo (Phil Guo, Philip J. Guo, Philip Jia Guo, pgbovine)

How did I find my second batch of students for my lab?

Two years ago I wrote How did I find my first batch of students for my lab? to recap how I started my lab as a brand-new assistant professor at the University of Rochester. This past year felt like deja vu as I had to start all over again at UC San Diego and rebuild my lab (almost) from scratch. Here's what I did this time around.

From TOS to TNG

I like to think of my first batch of students as the Star Trek: TOS (The Original Series) crew and this second batch as Star Trek: TNG (The Next Generation). Both are awesome ensemble casts!

Xiong was my only student from Rochester to continue working with me in this new TNG era. (So maybe he should be O'Brien or Worf if I had made a TNG -> DS9 analogy, but never mind, I'm digressing.) He remains a Ph.D. student at Rochester with me as his advisor; I still have an unpaid research assistant professor position at Rochester and grants there to fund him.

I spent my first year at UCSD (2016–2017) ramping up my new lab as quickly as possible and starting to submit papers. This “quick-start” strategy ended up working really well since we got four papers published with this new crew just in this first year. (Aside from the inherent value of publications, these early successes got my new batch of students pumped and energized, which I feel is critical for building up morale and momentum in the coming years.) Here are our first four papers:

DS.js: Turn Any Webpage into an Example-Centric Live Programming Environment for Learning Data Science. Xiong Zhang and Philip J. Guo. ACM Symposium on User Interface Software and Technology (UIST), 2017.
DS.js transforms any existing webpage into a live programming environment for data science.
Data science courses and tutorials have grown popular in recent years, yet they are still taught using production-grade programming tools (e.g., R, MATLAB, and Python IDEs) within desktop computing environments. Although powerful, these tools present high barriers to entry for novices, forcing them to grapple with the extrinsic complexities of software installation and configuration, data file management, data parsing, and Unix-like command-line interfaces. To lower the barrier for novices to get started with learning data science, we created DS.js, a bookmarklet that embeds a data science programming environment directly into any existing webpage. By transforming any webpage into an example-centric IDE, DS.js eliminates the aforementioned complexities of desktop-based environments and turns the entire web into a rich substrate for learning data science. DS.js automatically parses HTML tables and CSV/TSV data sets on the target webpage, attaches code editors to each data set, provides a data table manipulation and visualization API designed for novices, and gives instructional scaffolding in the form of bidirectional previews of how the user's code and data relate.
@inproceedings{ZhangUIST2017,
 author = {Zhang, Xiong and Guo, Philip J.},
 title = {{DS.js}: Turn Any Webpage into an Example-Centric Live Programming Environment for Learning Data Science},
 booktitle = {Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology},
 series = {UIST '17},
 year = {2017},
 publisher = {ACM},
 address = {New York, NY, USA},
}
Omnicode: A Novice-Oriented Live Programming Environment with Always-On Run-Time Value Visualizations. Hyeonsu Kang and Philip J. Guo. ACM Symposium on User Interface Software and Technology (UIST), 2017.
Omnicode constantly visualizes the full history of all numeric values in your code as you're coding.
Visualizations of run-time program state help novices form proper mental models and debug their code. We push this technique to the extreme by posing the following question: What if a live programming environment for an imperative language always displays the entire history of all run-time values for all program variables all the time? To explore this question, we built a prototype live IDE called Omnicode ("Omniscient Code") that continually runs the user's Python code and uses a scatterplot matrix to visualize the entire history of all of its numerical values, along with meaningful numbers derived from other data types. To filter the visualizations and hone in on specific points of interest, the user can brush and link over the scatterplots or select portions of code. They can also zoom in to view detailed stack and heap visualizations at each execution step. An exploratory study on 10 novice programmers discovered that they found Omnicode to be useful for debugging, forming mental models, explaining their code to others, and discovering moments of serendipity that would not have been likely within an ordinary IDE.
@inproceedings{KangUIST2017,
 author = {Kang, Hyeonsu and Guo, Philip J.},
 title = {Omnicode: A Novice-Oriented Live Programming Environment with Always-On Run-Time Value Visualizations},
 booktitle = {Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology},
 series = {UIST '17},
 year = {2017},
 publisher = {ACM},
 address = {New York, NY, USA},
}
Torta: Generating Mixed-Media GUI and Command-Line App Tutorials Using Operating-System-Wide Activity Tracing. Alok Mysore and Philip J. Guo. ACM Symposium on User Interface Software and Technology (UIST), 2017.
Torta lets you make programming/sysadmin tutorials by simply demonstrating command-line/GUI actions.
Tutorials are vital for helping people perform complex software-based tasks in domains such as programming, data science, system administration, and computational research. However, it is tedious to create detailed step-by-step tutorials for tasks that span multiple interrelated GUI and command-line applications. To address this challenge, we created Torta, an end-to-end system that automatically generates step-by-step GUI and command-line app tutorials by demonstration, provides an editor to trim, organize, and add validation criteria to these tutorials, and provides a web-based viewer that can validate step-level progress and automatically run certain steps. The core technical insight that underpins Torta is that combining operating-system-wide activity tracing and screencast recording makes it easier to generate mixed-media (text+video) tutorials that span multiple GUI and command-line apps. An exploratory study on 10 computer science teaching assistants (TAs) found that they all preferred the experience and results of using Torta to record programming and sysadmin tutorials relevant to classes they teach rather than manually writing tutorials. A follow-up study on 6 students found that they all preferred following the Torta tutorials created by those TAs over the manually-written versions.
@inproceedings{MysoreUIST2017,
 author = {Mysore, Alok and Guo, Philip J.},
 title = {Torta: Generating Mixed-Media GUI and Command-Line App Tutorials Using Operating-System-Wide Activity Tracing},
 booktitle = {Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology},
 series = {UIST '17},
 year = {2017},
 publisher = {ACM},
 address = {New York, NY, USA},
}
HappyFace: Identifying and Predicting Frustrating Obstacles for Learning Programming at Scale. Ian Drosos, Philip J. Guo, Chris Parnin. IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), 2017.
HappyFace uses a five-level pain scale to identify causes of frustration when learning programming.
Unnecessary obstacles limit learning in cognitively-complex domains such as computer programming. With a lack of appropriate feedback mechanisms, novice programmers can experience frustration and disengage from the learning experience. In large-scale educational settings, the struggles of learners are often invisible to the learning infrastructure and learners have limited ability to seek help. In this paper, we perform a large-scale collection of code snippets from an online learn-to-code platform, Python Tutor, and collect a frustration rating through a light-weight learner feedback mechanism. We then devise a technique that can automatically identify sources of frustration based on participants labeling their frustration levels. We found 3 factors that best predicted novice programmers' frustration state: syntax errors, using niche language features, and understanding code with high complexity. Additionally, we found evidence that we could predict sources of frustration. Based on these results, we believe an embedded feedback mechanism can lead to future intervention systems.
@inproceedings{DrososVLHCC2017,
 author = {Drosos, Ian and Guo, Philip J. and Parnin, Chris},
 title = {{HappyFace}: Identifying and Predicting Frustrating Obstacles for Learning Programming at Scale},
 booktitle = {Proceedings of the IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC)},
 series = {VL/HCC '17},
 year = {2017},
 month = {Oct}
}

High-level strategy

This time around, I was much pickier about selecting students. Having already worked with almost two dozen by this point, I had a far better sense of what traits I value. In contrast, back at Rochester I was purposely much more open and broad with student recruiting since I was new to it and wanted to collect as much useful experiential data as possible. I understood full well that not everyone in my first batch would succeed in research, but I wasn't yet experienced enough to know exactly who would and wouldn't do well. Thus, I tried working with many kinds of students. Here at UCSD, though, I focused my limited time and energy on working only with students who have a strong chance of success in publishing their research (even though I don't realistically expect a perfect 100% hit rate).

That said, I knew I still couldn't rely solely on the regular annual Ph.D. admissions cycle to get new students for my lab, because they wouldn't start here until fall 2017, which would be the beginning of my second year at UCSD. There's also no guarantee that students whom we admit would actually accept their offers to come here. And even if they did come, it would take them another year or so to ramp up and start getting productive.

Thus, just like in Rochester, I tried hard to look for great students who were already at UCSD so that they could get started right away during my first year here. Why such a strong sense of urgency? Because although this was only Year One at my new job, it was Year Three of me being an assistant professor, and I ideally want to come up for tenure a few years from now, so I need to keep up my level of research productivity.

The TNG crew

Here's how I found my new group of students over this past year.

Three of my new students were TAs during my first quarter teaching at UCSD in fall 2016, when I taught the large 300-student introductory HCI course. I had 6 grad-student TAs, and 3 of them ended up being good fits for my lab (the other 3 were already in other labs). Working with these students as my TAs helped me get to know their personalities, technical skills, and work ethic. This gave me a better sense of whether they might work well with me in a research setting. Here are my first three UCSD TAs-turned-lab-students:

  • Alok was a brand-new first-year master's student in Computer Science & Engineering, and he was recommended to me by Rajan Vaish because he had worked remotely from India with Rajan and Michael Bernstein on their crowd research project. Alok first-authored a UIST 2015 poster on his part of the project.
  • Kandarp was a second-year master's student in Computer Science & Engineering. He really wanted to stay at UCSD as a Ph.D. student, so we got started on a project right away in order to strengthen his Ph.D. application. We ended up admitting him, and he will continue as a Ph.D. student advised by me.
  • Hyeonsu was a second-year master's student in Computer Science & Engineering. He came to a research overview talk I gave at the beginning of the fall quarter and asked some really keen and insightful questions, so I talked to him more about working on research. Even though he recently graduated and we got to work together for less than a year, we managed to get the Omnicode paper published!

In addition, one of my other new students who was already at UCSD was recommended to me by a colleague:

  • Charles was a senior undergrad in Computer Science & Engineering and is now continuing on as a master's student advised by me. He was recommended to me by Beth Simon since he had worked with her the prior summer on analyzing MOOC data from UCSD's data science courses, which is definitely a relevant skill set for my lab.

Finally, we admitted two new Ph.D. students in the cognitive science department who will start off with me as their advisor. They will arrive in this upcoming fall 2017 quarter:

  • Sean was on my radar because one of his advisors at Johns Hopkins, Jeff Leek, emailed me earlier about some Python Tutor stuff. Sean currently works at Johns Hopkins managing their data science MOOCs, which is again a super-relevant skill set for my lab! I actually didn't realize Sean was applying to the Ph.D. program here since he didn't cold-email me, so it was good that I saw his application in the huge pile when looking over admissions files this past year.
  • Ian was a master's student at North Carolina State University advised by my collaborator Chris Parnin. The three of us have been working together over the past year on a project that recently got published as the HappyFace paper. I already got to know Ian a bit through that experience. I encouraged him to apply to our Ph.D. program, and he was admitted, so he will also be starting here in fall 2017.

In sum, just like the first time, I relied on TA experiences, personal referrals, and prior collaborations to do student recruiting for my new lab.

I'm glad that I was able to ramp up very quickly with staffing a full lab of six graduate students in my first year here. Now I don't have to worry about recruiting (which can be time-consuming!) and can focus more of my energy on doing research with this crew.

That's all for now, folks!

Created: 2017-07-23
Last modified: 2017-07-23