Data science promises us professionals who understand how to make data useful. But do we know what it takes to be a data scientist and, if so, do we know how to teach those skills to others?
As a concept, data science isn’t new, but our interest in it has grown exponentially in the past several years, driven to a large extent by our need to make big data useful.
Most definitions of data science describe it as the marriage of statistics and computer science: the geek with the huge bag of data plus the green eyeshade, the calculator and the mathematics to do something meaningful with it.
Several descriptions admit that data science grew from the realization that statisticians didn’t know enough about the growing computing field and computer scientists didn’t understand how mathematical modeling and statistics worked. It seemed natural, especially in academia, that merging the two would be more valuable in making decisions.
Data Scientist: A Title in Search of a Definition?
There is plenty of activity in the academic world to describe and train data science professionals, but like making a casserole without knowing what amounts of each ingredient to use or how and when to add them, we don’t yet agree on how to make a data scientist or what constitutes the right raw material. Creating effective data science preparation may take time and experience, especially when the raw materials — those budding statistics and computer science graduates — aren’t predictable.
Indeed, the whole approach to big data, analytics and data science is so relatively new that it isn’t clear just what our new data scientists should know in order to perform at an acceptable level. The PhD option in Big Data and Data Science at the University of Washington, for example, says: “Big Data is an evolving field, whose definition is fluid, and will continue to evolve over the years. Thus, the core of our educational approach is a comprehensive interdisciplinary, multifaceted practical training program.”
Promising, but hardly mature.
Johns Hopkins, on the other hand, calls data science the “art of answering important questions with data,” going on to say that data science requires that the practitioner identify a relevant question, assemble the data to answer it, develop statistical models and communicate results in a usable manner … in essence, to know everything.
Georgetown University generally agrees, but has offered a Data Science and Analytics track only since the fall of 2015. We won’t know if it has built a successful program for several years at the earliest.
North Carolina schools offer various data science certificates from the schools of public health, computing and informatics, business and economics, and math/statistics, each presumably with a different primary focus but all under the title Data Science. Stanford, likewise, began its formal offering in the 2013-14 year, focusing heavily on finance and health care.
The upshot may resemble the old tale of the elephant and the five blind men. Each can identify important characteristics, but none has a complete picture of the subject. In short, there just isn’t much agreement on what data science is and what it takes to be good at it.
So What Now?
How do we deal with the array of “new” resources promised by the rise of data science and its practitioners? Here are a few thoughts:
- Until further notice, view data science and its practitioners the same way you have viewed big data analytics in the past. A good data scientist will perform virtually the same as a mature and thoughtful statistician with backup in the computing area.
- The industry, as it usually does, will hype the new titles as your “must have” resources. New titles shouldn’t change your strategy.
- Don’t pay a premium for the title Data Scientist unless the candidate or vendor can demonstrate results and a level of maturity that justifies your interest.
- Don’t assume that because there is a new discipline — Data Science — on the field, answers to your marketing and merchandising questions will now be found in your big data where they weren’t previously. While it’s true that a closer look at data can sometimes unearth answers that were missed before, often the answers just aren’t there and upping the level of analysis — and the title of the practitioner — will produce nothing new.
- When evaluating a Data Science hire or service, focus heavily on the ability to design good analyses and data models, and less on the elegance of the process itself. Analysis based on well-designed data models often produces good, actionable answers. Poorly designed analysis still produces results, but it usually chases its tail or points in the wrong direction.
- If you are considering hiring a newly minted “data scientist” from a major university, get the school’s catalog and find out what the curriculum included.
Data science, given time to mature, may provide the industry a powerful new way of supporting good decision-making. The trick will be to separate substance from title and to ask no more of the new discipline or its practitioners than they are ready to deliver.