“Data Scientist” is the sexiest job of the 21st century. The Harvard Business Review made this claim last October and it seems that everyone (including your grandmother) has been repeating it ever since.
But what exactly does a data scientist do, and what specifically makes her/him sexy (aside from the clout and the big paycheck)? And perhaps just as importantly, what do you have to know -- and where do you get the “stuff” that you need -- to land the hottest gig of the 21st century?
To answer these questions, we’re providing you with an Infographic below (courtesy of FICO) and insights from some of the hottest companies in Big Data.
Data Scientist: Skills in Demand, Not Degrees
First off, there is no question that Data Scientists and Big Data Wranglers are in high demand. FICO recently reported that there was a 15,000% increase in job postings for data scientists between summer 2011 and summer 2012. Add to that the 1.5 million more data-literate managers who will be needed by 2018, and you’ve got a lot of jobs.
But you won’t find many, if any, data scientists in the working world with “data scientist” on their diploma. In fact, some of the world’s most talked-about data scientists didn’t know that they wanted to be data scientists when they began their careers, so getting that specific degree (and it’s doubtful that any formal data science programs existed at universities five years ago) probably isn’t key to being successful.
We did a quick perusal of the profiles of some of today’s leading data scientists to see what they were studying before they earned their fancy titles. Here’s what we found: Jeff Hammerbacher and DJ Patil studied Mathematics; Hilary Mason studied Computer Science; Doug Cutting, co-founder of Hadoop, has an AB in linguistics; and Jonathan Goldman, one of LinkedIn’s earliest data scientists, studied Physics.
And we’re pretty sure that few, if any, of these folks applied for a role as a “data scientist” when they got their first jobs. So if you’re not a data scientist right now, but you think that you’ve got what it takes to become one, there’s room for hope, and plenty of it.
Consider that some of the companies that employ the world’s most prominent data scientists aren’t even sure of exactly what a data scientist’s academic background should look like.
“The particular set of skills required to perform data science in practice is both still being defined and hard to accumulate through conventional curricula,” says Ryan Goldman, Services and Training, marketing manager, Cloudera. He adds that the data scientist role is relatively new and mainly populated by “people who are better at statistics than any software engineer and better at software engineering than any statistician,” a phrase he’s borrowed from Josh Wills, Cloudera’s Director of Data Science.
Lisa Arthur, Chief Marketing Officer at Teradata, adds that a softer set of skills is required as well:
“Companies are in need of employees who can sort through enormous data sets to find valuable insights, but few people have the right mix of technical and business know-how,” she says. “While an IT degree is helpful, it ultimately comes down to a potential employee’s softer skills. Can they identify insights that their more technical colleagues may not see? Can they present data-driven findings in a way that the board room understands? These are the types of questions companies are asking, and a degree cannot necessarily equip graduates with these skills.”
And though degree programs are emerging, it remains to be seen whose programs will be best and what they will be worth.
“For now, given the scarcity of talent, learning about big data technologies thoroughly and being able to apply them consistently may be enough to land a job,” says Jonathan Ellis, Co-founder and CTO of DataStax.
Training is the Ticket to Fill the Data Scientist Gap
Degree programs may be great (once they’re here), but for those with the right backgrounds, training may be the ticket.
Even if plenty of Data Scientist degree programs were available at most universities (Note: we’re not taking anything away from those who have them), they wouldn’t provide the large number of data scientists that enterprises need now. As a result, some Big Data vendors are offering training programs. Some of these require deep technical and analytical abilities as prerequisites; others may be suitable for Java junkies and relational database DBAs.
Cloudera, which prides itself on being an early leader in developing the next generation of data scientists, offers technical training for Big Data developers, administrators and analysts working on the Apache Hadoop platform as well as an Introduction to Data Science course that focuses on building recommender systems -- the sophisticated frameworks at the core of deep-data companies like Amazon.com and Netflix. These programs are offered via Cloudera University -- be sure to read the prerequisites. It’s also worth noting that Cloudera has created the industry’s first Data Scientist certification program.
For Java developers or relational database DBAs who want to venture into the world of Big Data, DataStax offers training courses designed to teach engineers everything they need to build applications on Apache Cassandra and run Cassandra in production. They also cover complementary technologies such as Hadoop and Solr. A Java Developer Training course is currently being built; it will give participants hands-on experience with the Java driver for Apache Cassandra. Students who complete this course will be able to return to their jobs and start building on Cassandra right away. Find out more via DataStax Academy.
The Road to Becoming a Data Scientist Is No Cake Walk
So while we may have made working in the world of Big Data seem like an achievable goal, be mindful that it won’t be a cake walk.
Ted Dunning, Chief Architect at MapR, has a nice way of putting it:
“The key problem in big data is that data has mass and inertia and big data has a lot of it. The mathematically grounded skills related to building models are important, but to me the most important skill in big data is wrangling. Just like moving heavy things requires somebody qualified as a millwright, moving big data requires the skills necessary to handle big, massive data sets with apparent ease. Once you can do that, then you can use simple algorithms to solve hard problems.”