Thursday Extra 2/8/18: Incorporating Data Science into an Introductory CS Course

Thursday, February 8, 2018
4:15 p.m. in Science 3821
Refreshments at 4:00 p.m. in the Computer Science Commons (Science 3817)

In "A Functional Approach to Data Science in CS1," Professor Samuel A. Rebelsky discusses the new "data science" version of CSC 151 that he has been developing with Titus Klinge and Sarah Dahlby Albright.

As part of the development of a new interdisciplinary initiative in data science that draws from statistics, mathematics, computer science, and the social sciences, we have developed a new introductory CS course that emphasizes data science and that we refer to as DataCSCi. Unlike other introductory data science courses, such as Berkeley's Data 8, our course retains the broad array of concepts necessary not only to introduce programming principles related to data science, but also to prepare students for the second course in our standard introductory computer science sequence. In particular, the course includes coverage of recursion (numeric and structural), unit testing, linked data structures, and other concepts we rely upon in subsequent courses in computer science.

At the same time, we introduce students to a wide variety of techniques and approaches that support their subsequent work in data science, including techniques for wrangling, cleaning, and visualizing data. We achieve this combination of breadth and depth through two core strategies: a spiral "use then implement" approach and a functional model of programming using Scheme/Racket. While Python and R are the most commonly used languages for data science, we find that Scheme works particularly well for introducing students to both complex concepts, such as map-reduce, and simple ones, such as list filtering.