‘Holistic’ Data Science Needed to Achieve Scientific Breakthroughs

David Blei

Born from a marriage of statistics and computer science, data science is used widely today in government, business and technology. But for data science to be effectively used in different kinds of scientific research, data scientists must take a holistic view that fuses statistical and computational skills with human judgment, argues Data Science Institute member and Columbia professor David Blei in the Aug. 7 issue of the Proceedings of the National Academy of Sciences. Blei co-authored the paper with University of California at Irvine professor Padhraic Smyth.

In modern research, many scientists cannot take full advantage of their new data. Scientists face computational challenges, such as how to negotiate massive data sets and complex metadata, as well as statistical challenges, such as how to interpret the rich interactions of related variables and handle the rigors of high-dimensional statistics. Other problems, moreover, involve human judgment and interpretation. It’s no easy task for scientists to select the right model for their research, or to identify causality from empirical data, or to meet the semi-daunting scientific goals of data exploration and interpretation. These challenges offer data scientists the opportunity to “catalyze” or redefine their field.

To help scientists address these problems, data scientists must holistically “integrate research that crosses the statistical, computational and human boundaries.” Blei, a member of the Data Science Institute at Columbia University with appointments in Columbia Engineering and the Faculty of Arts and Sciences, elaborates on this holistic view in a recent Q&A with Inside Big Data, a publication of the Association for Computing Machinery.

Data scientists must also master tasks beyond their traditional training. In this sense, data science must be more than high-dimensional statistics merged with computational thinking. Rather, the data scientist must weave those disciplines into a larger conceptual and ethical framework.

“Holistic data science,” the authors offer in their summation, “requires that we understand the context of data, appreciate the responsibilities involved in using private and public data, and clearly communicate what a dataset can and cannot tell us about the world.”

Jeannette Wing, Avanessians Director of the Data Science Institute at Columbia University, says that Professor Blei’s article is a clear, emphatic argument for how data science can better serve scientists. “Adding the human element to the marriage of statistical thinking and computational thinking,” adds Wing, “helps define the emerging field of data science, thus framing how scientists can use data science to solve vexing societal problems.”

— By Robert Florida, Data Science Institute
