The volume of data collected today is growing astronomically as our computers, phones, and smart devices track our every move, purchase, and desire. Yet, the number of people who can sift through this data to find useful information remains a meager percentage of the workforce.
The need for more data analysts and data scientists is simply outpacing the supply, so colleges and universities need to help close the widening knowledge gap. This is the crux of an upcoming article in the IEEE (Institute of Electrical and Electronics Engineers) Computer Journal written by University of Miami computer science professors Yelena Yesha and Mitsunori Ogihara and graduate student Jerry Bonnell (pictured together below).
It was also the impetus for the Master of Science in Data Science program, now offered through the Graduate School. And it is why Ogihara and Bonnell designed and piloted an undergraduate class this fall called “Data Science for the World,” for students interested in the field. It will also be offered next fall, they said. “Many disciplines today including science, medicine, social sciences, and even the humanities use data for discoveries or the exploration of ideas,” Ogihara said. “So a student of any reasonable undergraduate program today should have some exposure to data science.”
Nick Tsinoremas, founding director of the Institute for Data Science and Computing (IDSC) and the University’s vice provost for research computing and data, agreed. “We want all the students at the University to have more of a data science education and to be more data-aware because this is our future,” he said. “To make decisions in general today, one needs to be data-aware, so this course is part of our effort as a University to expose our undergraduates to data science.”
The course comes at a time when many colleges and universities around the country are trying to educate students in the language of data. However, unlike other Data Science 101 classes, Ogihara and Bonnell tailored theirs so that students with little to no knowledge of statistics or computer programming could still benefit from it.
“We tried to make it accessible, so we don’t assume that students have a background in math, programming, or statistics,” said Ogihara. The two even wrote an online textbook for the course, which opens with a list of real-world examples of data science in practice. Monitoring patient data, for instance, can help doctors diagnose diseases more accurately, and tracking social media posts can help data scientists explain a shift in public opinion.
The online textbook, which is now being edited for publication, stands out because it uses R, the programming language favored by many statisticians. Ogihara and Bonnell chose R because it is well suited to statistical analysis and because the tidyverse, an increasingly popular collection of R packages, lets students quickly learn to process, wrangle, transform, and model data on their own.
“From the beginning to the end of the course, students were touching real data with their assignments,” Bonnell said. “So they could always see the big picture and knew they were doing something important.” For example, the class’s 20 students looked into the 2015 accusation that the New England Patriots deflated footballs during a game because that made them easier for quarterback Tom Brady to catch and throw in the cold. They tested whether the average drop in ball pressure could be explained by randomness, and concluded it was plausible that the observed pressure drops were due to a reason other than chance.
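To give a sense of what testing against “randomness” can look like, here is a minimal sketch of a permutation test. It rests on several assumptions: the article does not say which method the class used, the course worked in R rather than Python, and the pressure figures and the names `permutation_test`, `team_a_drops`, and `team_b_drops` are invented purely for illustration.

```python
import random


def permutation_test(group_a, group_b, trials=10_000, seed=0):
    """Estimate how often a random relabeling of the measurements gives a
    difference in group means at least as large as the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_a = len(group_a)

    def mean(values):
        return sum(values) / len(values)

    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)

    extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # pretend the team labels carry no information
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            extreme += 1

    # Fraction of shuffles at least as extreme as what was observed
    return observed, extreme / trials


# Hypothetical pressure drops in psi: made-up numbers, not real measurements
team_a_drops = [1.2, 1.4, 1.1, 1.5, 1.3, 1.0, 1.4, 1.2]
team_b_drops = [0.4, 0.6, 0.5, 0.3]

observed, p_value = permutation_test(team_a_drops, team_b_drops)
print(f"observed difference in mean drop: {observed:.3f} psi")
print(f"estimated p-value: {p_value:.4f}")
```

A small estimated p-value means that shuffled labels rarely reproduce a gap as large as the observed one, which is the sense in which a difference can be judged unlikely to be due to chance alone.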
That was one of freshman Eddie Hanlon’s favorite assignments. But he also enjoyed testing whether murder rates are affected by a state’s policy on the death penalty, another assignment they did in the class. “We were able to conclude that when the death penalty is enforced, murder rates go down,” said Hanlon, a finance major with a minor in computer science. Hanlon said he has always been interested in numbers, but the course helped him learn some computer programming that can further his analysis. It also taught him some new statistical strategies. “I had never done any programming, and had zero experience with the software R,” he said. “But by the end of the semester, I felt pretty proficient in R.”
Hanlon was so empowered by the course that he spent part of his winter break learning Python, another programming language widely used by data scientists. “I wasn’t initially interested in data science (I just didn’t know enough about the field), but I am now. Data science is extremely applicable in so many different fields. I definitely see it as a career possibility,” he said.
Senior math major Caroline Hall took the course because she wanted to improve her R skills for future job opportunities. She had already learned the programming language Java, but the data science class made Hall so comfortable with R that she has since learned two other tools, SQL and Tableau, which help transform and visualize datasets. “I feel confident now in being able to transform datasets, which means to organize the data and extract the most useful information from it,” said Hall, who also has minors in computer science and psychology. The course also piqued her interest in a career in data science. “I want to start off as a data analyst, but I know we work with data scientists, so I may want to transition into that,” she said.