Philip Guo (Phil Guo, Philip J. Guo, Philip Jia Guo, pgbovine)

DS.js: Turn Any Webpage into an Example-Centric Live Programming Environment for Learning Data Science

research paper summary
DS.js: Turn Any Webpage into an Example-Centric Live Programming Environment for Learning Data Science. Xiong Zhang and Philip J. Guo. ACM Symposium on User Interface Software and Technology (UIST), 2017.
(Honorable Mention Paper Award)
Data science courses and tutorials have grown popular in recent years, yet they are still taught using production-grade programming tools (e.g., R, MATLAB, and Python IDEs) within desktop computing environments. Although powerful, these tools present high barriers to entry for novices, forcing them to grapple with the extrinsic complexities of software installation and configuration, data file management, data parsing, and Unix-like command-line interfaces. To lower the barrier for novices to get started with learning data science, we created DS.js, a bookmarklet that embeds a data science programming environment directly into any existing webpage. By transforming any webpage into an example-centric IDE, DS.js eliminates the aforementioned complexities of desktop-based environments and turns the entire web into a rich substrate for learning data science. DS.js automatically parses HTML tables and CSV/TSV data sets on the target webpage, attaches code editors to each data set, provides a data table manipulation and visualization API designed for novices, and gives instructional scaffolding in the form of bidirectional previews of how the user's code and data relate.
@inproceedings{ZhangUIST2017,
 author = {Zhang, Xiong and Guo, Philip J.},
 title = {{DS.js}: Turn Any Webpage into an Example-Centric Live Programming Environment for Learning Data Science},
 booktitle = {Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology},
 series = {UIST '17},
 year = {2017},
 publisher = {ACM},
 address = {New York, NY, USA},
}

Data science is often taught using production-grade IDEs such as RStudio for R, MATLAB, and Jupyter notebooks for Python and other languages. These IDEs are often situated within Unix-like command-line environments to handle data file management, script execution, and version control. We talked to a few prominent data science instructors and found that the status quo has several major drawbacks:

  • Novices must install and configure complex IDEs meant for professionals. On top of that, they must also grapple with arcane Unix-like command-line concepts to manage data files and scripts locally on their machines (e.g., what are absolute vs. relative paths? where did my files go? why can't I run this script from that directory?!?).
  • Since these IDEs are designed for professionals, they don't provide any instructional scaffolding to help novices build mental models of how data science APIs operate.
  • From the instructor's perspective, code, data, and exposition are stored in separate places, which makes it harder to produce and distribute self-contained instructional materials.

These limitations all stem from the fact that people currently need to bring their data into monolithic data science environments, but what if instead they could bring a lightweight data science environment directly to their data? To explore this idea, we built a prototype browser bookmarklet called DS.js, which (as this paper's title implies) turns any webpage into a live programming environment for learning data science. The webpage contains all of the required data, and DS.js brings the user's code directly to it.

The beauty of a bookmarklet is that it works in any modern web browser and doesn't require users to install or configure anything. (To “install” DS.js, simply drag its bookmarklet into your bookmarks bar like a regular webpage bookmark.)
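As a rough illustration of the general bookmarklet technique (not DS.js's actual loader), a bookmarklet is just a `javascript:` URL whose code runs on whatever page is currently open. The minimal sketch below builds such a URL; the loader address `https://example.com/dsjs-loader.js` and the function name `makeBookmarklet` are placeholders made up for this example.

```javascript
// Hypothetical loader location; the real DS.js script URL is not given here.
const LOADER_URL = "https://example.com/dsjs-loader.js";

// Build a `javascript:` URL that, when clicked from the bookmarks bar,
// appends a <script> tag pointing at the loader to the current page.
function makeBookmarklet(scriptUrl) {
  const body =
    "(function(){" +
    "var s=document.createElement('script');" +
    "s.src=" + JSON.stringify(scriptUrl) + ";" +
    "document.body.appendChild(s);" +
    "})();";
  // Percent-encode the code so it survives being stored as a bookmark URL.
  return "javascript:" + encodeURIComponent(body);
}

const bookmarklet = makeBookmarklet(LOADER_URL);
console.log(bookmarklet);
```

Saving the printed string as a bookmark's address would give a button that injects the (hypothetical) loader script into any page you click it on, which is why nothing needs to be installed.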

Here's how DS.js works:

  1. Visit any webpage containing data, either inline as HTML elements (e.g., tables, lists, divs) or linked as external data files (e.g., CSV or TSV file links). Examples of data-rich pages include Wikipedia, government data portals, and sports statistics sites. Click the DS.js button in your bookmarks bar to inject a programming environment onto the current webpage.
  2. DS.js automatically detects structured data sources on the webpage (such as an HTML population table on a Wikipedia page) and parses them into special JavaScript data structures. It also automatically parses, say, CSV files linked from the current page. More advanced users can use a GUI-based selector (powered by SelectorGadget) to visually choose groups of webpage elements to parse or manually write jQuery selectors to parse any other data on the page.
  3. An “Append DS.js editor” button appears next to each parsed data source. Click that button to embed a JavaScript code editor into the webpage right underneath that data source. Within that editor, you can write arbitrary JavaScript code to transform, analyze, and visualize that data. The outputs of your analyses (such as statistics, derived tables, and graphs) get updated live in a pane to the right of your code. DS.js comes with a JavaScript library for introductory data science, modeled after datascience.py; think of it as a super-simplified form of Pandas for Python or the tidyverse for R.
  4. To help build proper mental models, DS.js includes instructional scaffolding in the form of bidirectional previews of code and data. You can click on any code expression to visually preview its effects on the corresponding data tables. You can also click on parts of data tables to preview suggestions for what code to write to transform those parts.
  5. All of your code can be encapsulated in a single URL. This lets you easily share your data science explorations or questions with others, and everyone can safely modify their own copies, again without installing or configuring any software.
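To make steps 2, 3, and 5 a bit more concrete, here is a minimal sketch of the underlying ideas in plain JavaScript. It is not the DS.js API: `parseCsv`, the `Table` class and its `where`/`sortBy`/`column` methods, `makeShareUrl`, `readSharedCode`, the `dsjs=` fragment key, the sample data, and the example page URL are all assumptions invented for illustration. The CSV parser also assumes simple input with no quoted fields.

```javascript
// Step 2 (sketch): parse simple CSV text into an array of row objects.
function parseCsv(text) {
  const [headerLine, ...lines] = text.trim().split(/\r?\n/);
  const headers = headerLine.split(",").map((h) => h.trim());
  return lines.map((line) => {
    const cells = line.split(",").map((c) => c.trim());
    const row = {};
    headers.forEach((h, i) => {
      const num = Number(cells[i]);
      // Keep numeric-looking cells as numbers, everything else as strings.
      row[h] = cells[i] !== "" && !Number.isNaN(num) ? num : cells[i];
    });
    return row;
  });
}

// Step 3 (sketch): a hypothetical novice-friendly table wrapper.
class Table {
  constructor(rows) {
    this.rows = rows;
  }
  // Keep only the rows whose value in `column` satisfies `predicate`.
  where(column, predicate) {
    return new Table(this.rows.filter((r) => predicate(r[column])));
  }
  // Return a new table sorted by a numeric column.
  sortBy(column, descending = false) {
    const sorted = [...this.rows].sort((a, b) => a[column] - b[column]);
    return new Table(descending ? sorted.reverse() : sorted);
  }
  // Extract one column as a plain array.
  column(name) {
    return this.rows.map((r) => r[name]);
  }
}

// Made-up sample data standing in for a CSV file linked from a page.
const sampleCsv = `city,population
Springfield,120000
Shelbyville,80000
Ogdenville,45000`;

const cityTable = new Table(parseCsv(sampleCsv));
const bigCities = cityTable
  .where("population", (p) => p > 50000)
  .sortBy("population", true);
console.log(bigCities.column("city"));

// Step 5 (sketch): pack analysis code into a URL fragment for sharing.
function makeShareUrl(pageUrl, code) {
  const url = new URL(pageUrl);
  url.hash = "dsjs=" + encodeURIComponent(code);
  return url.toString();
}

// Recover the code from a URL produced by makeShareUrl.
function readSharedCode(shareUrl) {
  const hash = new URL(shareUrl).hash; // includes the leading "#"
  return decodeURIComponent(hash.slice("#dsjs=".length));
}
```

Each table operation returns a new `Table` rather than mutating the old one, so intermediate results stay available for previewing. Putting the code in the URL fragment keeps it out of the request sent to the server hosting the page.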

In sum, we're really excited about using the web as a substrate for learning data science because it already contains enormous amounts of data in all sorts of domains that could engage students. Educational materials made using DS.js benefit from the authenticity of being situated directly within real-world webpages so that students can see the original context behind their data while writing analysis code. Finally, in addition to being used by students and instructors in educational settings, another potential user audience for DS.js is citizen data scientists who want to play around with analyzing and visualizing data that they find on the web and easily share their findings for others to remix and build upon.


Read the full paper for details:

DS.js: Turn Any Webpage into an Example-Centric Live Programming Environment for Learning Data Science. Xiong Zhang and Philip J. Guo. ACM Symposium on User Interface Software and Technology (UIST), 2017.
(Honorable Mention Paper Award)
Created: 2017-10-02
Last modified: 2017-10-02