Data. Although it is a short word, often dismissed as numbers or statistics, it follows us everywhere today: from the steps that we take, to our traffic patterns and driving speeds, to our genomic sequences and even our daily purchasing habits. And more each day, governments, schools, and businesses are seeing the benefits of this information.
It helps cruise lines track which passengers are buying more sodas while they are on board and then, an hour later, offer discounts to those who have not ordered drinks. Hospital workers use it to monitor the number of patients they receive weekly, then correlate those numbers with historical data about when illnesses spike to help decide when more staff are needed.
These are just a couple of examples of the power of data science.
“The question is, can you use the data that exists to make better decisions on day-to-day activities or planning for the future,” said University of Miami Computer Science professor Mitsunori Ogihara (pictured above at the annual Big Data Conference), who helps consult on projects like these as the IDSC Director of Artificial Intelligence and Machine Learning. “In many cases, this data is very large.”
At UM, researchers are well aware of the value that data holds to improve the future and their individual work. Professors in nearly every one of the University’s 11 schools and colleges are taking advantage of the supercomputer available through IDSC to analyze large amounts of data that can bolster their research. And soon, the University will be modernizing its supercomputer to further advance the U’s data science potential.
For example, in the Miller School of Medicine, researchers like professors Vance Lemmon and John Bixby, along with Computer Science professor Zheng Wang, are using massive data sets to create models of the human genome that can shed light on how our genes control nerve regeneration. In the Rosenstiel School for Marine and Atmospheric Science, professor Ben Kirtman and his team use data taken from ocean sensors to create detailed climate change models that can predict how sea level rise will affect our lives in the next 20 years. In the College of Arts and Sciences, English professor Lindsay Thomas is mining the language found in at least 500,000 different articles to evaluate how people value the humanities. And in the School of Architecture, students are planning buildings and homes with data sensors to track people’s use of utilities, so consumers can be more aware of their usage. They are also looking into creating safety mechanisms, such as tiles that could alert help if an elderly person falls, said the School’s dean, Rodolphe el-Khoury.
The value of this information is also driving private companies’ need for data scientists to sift through all the information. That is why professors from a variety of disciplines have joined forces to propose a new graduate degree at UM: the Master of Science in Data Science (MSDS), with tracks in architecture, marine science, technical (computer science), and data visualization through UM’s School of Communication. The new graduate program, which is organized by IDSC and in the final stages of formal approvals, is expected to be offered this fall. In the Miami Business School, a five-year-old Master’s in Business Analytics program and a Bachelor’s of Science in Business Analytics already exist, but administrators would like to expand upon UM’s offerings by adding the new Master of Science in Data Science program through its other schools and colleges.
“With the new MSDS program, the University of Miami is looking at data science through a multidisciplinary lens,” said Jeffrey Duerk, executive vice president for academic affairs and provost. “This approach provides an immersive and exciting learning experience that gives students the opportunity to gain skills in computing methods coupled with their disciplinary focus.”
“Many companies large and small are hiring data scientists today,” Ogihara said. “This has been a new trend across the nation in the past three to four years.”
Nicholas Tsinoremas, IDSC Institute Director, said he often fields calls from businesses that want to hire data scientists. This was also clear at the third annual Big Data Conference held on campus in December, when the audience was filled with entrepreneurs hungry for data experts. IDSC also offers free workshops in computer programming called “software carpentry” to students across UM, and recent attendance has increased dramatically. This reflects the fact that more research today is drawing upon large data sets instead of traditional hands-on experiments, Tsinoremas said.
Dean Leonidas G. Bachas of the College of Arts and Sciences said he has been actively hiring faculty members who apply data science to their research because he believes that it helps fuse the talents of computer scientists and domain experts to foster new insights about the world. Bachas saw this in action a few years ago when Ogihara worked with UM Libraries to track the data on books that were requested most often by students, and helped the library reorganize its collections so that it had more copies of high-demand books, while storing those that are rarely checked out. This helped increase the efficiency of the library, Bachas said.
“Data science doesn’t belong to a single discipline. Data science is everywhere, so we need to approach it as a set of technologies that underlie a lot of what we do,” Bachas said. “This will allow us to collaborate across many different disciplines.”
Roots of the Demand
Driving the need for data science professionals is the surge in data collection. Since the 1990s, data has changed from simple databases that can be queried to information that is constantly being gathered every time a person clicks or touches a computer (which includes cell phones). To respond to this massive rise in data collection, the world’s capacity to store data has also increased, whether on a supercomputer or in the cloud.
“Now, since data is everywhere, more data science talent could increase our ability to process many different kinds of data,” Ogihara said, adding that since data comes from sound, video, biological organisms, and customer choices, cleaning the data so it can be interpreted is often the most difficult challenge.
Even in newsrooms across the country, data scientists are needed today to help reporters navigate massive computer files to find connections for investigative stories, said Alberto Cairo (below at Data Intersections 2018), an associate professor and expert in data visualization in the School of Communication. This prompted his school to start offering a recently revamped data journalism class, he added.
“New professional profiles have been created by the data that exists today,” Cairo said. “In the past, it was impossible to have a statistician working in the newsroom, but now at the Wall Street Journal and at The New York Times, you see that.”
Data Science Defined
So what exactly is data science? Although many students leave UM after four years able to analyze data, the new master’s program aims to bolster that knowledge so that its graduates will become well-versed in finding connections or correlations within data using strategies called “models,” said Tsinoremas. Data scientists must also have more nuanced—and often field-specific—skills than data analysts because they must decide exactly how and why to analyze the data, he added.
“They are putting down questions about what the data should answer, as opposed to just describing [the data],” Tsinoremas said. “And they are weighing the pros and cons of different data analysis or modeling techniques.”
The data science graduate program proposed at UM will involve three essential components: statistics, computer science, and a certain subject domain knowledge (marine science, computer science, architecture, or data visualization). At first, all students will take a battery of computer science courses, then they will move on to their chosen domain track and finally, they will be required to complete a three- to six-month internship or project, added Tsinoremas, along with Computer Science department chair Geoff Sutcliffe, who has also helped design the degree.
Kirtman, UM’s director of the Cooperative Institute for Marine and Atmospheric Studies (CIMAS), said that having trained data scientists could propel his field of climate modeling toward more accurate and efficient forecasts. Currently, Kirtman said, he often uses his own scientific intuition to uncover patterns in the climate and to make forecasts, but he thinks more objective mining techniques—developed by data scientists—will help scientists uncover even more information.
“They are the perfect workforce to figure out great solutions to scientific discovery of these enormous data sets that we just don’t have the capacity to do now,” said Kirtman, who is also director of Earth Systems at IDSC. “We have to do better than just going on fishing expeditions [to find patterns in the data]. We have to have technological tools that can really cut through the data in a coherent and sensible way, as opposed to the random walk we do now.”
The Evolution of Data Storage
Just about 11 years ago, much of the data storage capacity that UM professors enjoy today did not exist. Faculty who needed larger data storage for their massive computer files had to either buy space on commercial supercomputers or go without it. But in 2007, IDSC began making it possible for professors like Kirtman and Lemmon to utilize UM’s own supercomputer, called Pegasus, where they can analyze terabytes to petabytes of data. IDSC also created a data storage facility on the Rosenstiel campus and is now updating the University’s supercomputer to meet the increased interest.
“We are developing this tool to make better forecasts from days to decades, but these forecasts are generating enormous amounts of data,” Kirtman (pictured at left) said. “If the capacity at UM had not increased, we would not be able to do that. I’m deeply indebted to those resources to really expand the kind of activities we can do. It’s part of the reason I came here.”
On the medical campus, Lemmon is also utilizing the larger computational power afforded to him by the University’s supercomputers. He works to find techniques that could remedy paralysis by examining nerve fibers and using chemicals to stimulate nerve growth. If he is successful, the chemicals could be developed into drugs that may help paralyzed individuals regain some movement. Like Kirtman, Lemmon sees the value in UM producing more data scientists to advance his field, along with other fields of study.
“A very big problem for many biomedical sciences here and elsewhere is finding people to help us analyze these giant data sets,” Lemmon said. “Most people who are trained to do biological experiments are not trained to use supercomputers and analyze datasets, so we must find people in computer science and convince them to collaborate with us.”