Put it on a flashlight app so the people frantically trying to turn the light on in the dark click over and over and over and over ….
For those keeping tabs on the elusive formula for what makes a data scientist, add “sense of humor” to the list.
Claudia Perlich’s official title at New York City-based Dstillery is chief scientist. “I like to make data useful,” she said in an interview with CMSWire. At Dstillery, Perlich designs, develops, analyzes and optimizes the machine learning that sifts millions of small, individual events -- a website visit, the apps you use on a smartphone, the location of the smartphone -- to narrow down prospective customers for brands to target with online ads.
And before you get worried about the creepy data collection factor, Dstillery ensures that all data it collects is anonymized, assigning each person it tracks a 20-digit random number. It does not store personal information and only retains IP addresses transiently. No profiles or classifications are created from the data. As Perlich put it, that's "the beauty of the machine making the decisions ... to a computer a data point doesn't mean anything, it's just a data point." She added that Dstillery is "not interested in what you're reading or what it's about. It just needs to know that you went to that page."
What Dstillery offers is not deeper insight into the customers you have -- you already know your customers (or you should) -- but a look at who you, as a brand, might want to reach. It takes the data points collected from activities taken (not a prediction of what it thinks you would do) to say, for example, "this is what makes a good Nike customer and these people are obviously not interested in running shoes." Its aim is to show things that are relevant to you, in as unobtrusive a way as possible.
From the looks of some of the clients Dstillery boasts -- Mercedes, Amex, Citi, AT&T and, closer to home, Adobe, HP and Microsoft -- it is succeeding. A partnership with Twitter announced last December made it one of the first companies to work with Twitter on its tailored audiences program.
How Did She Get Here?
Perlich's position at Dstillery combines the business and the technical sides of her background: She holds a PhD in Information Systems from NYU's Stern School of Business and an MA in Computer Science from the University of Colorado. Perlich joined Dstillery after a five-year stint at IBM's Watson Research Center, where her focus was on data analytics and machine learning for complex real-world domains and applications.
Perlich has been in the field for 18 years now, and it's only recently that the demand and the interest have peaked. "Back then I was just a geek, now all of a sudden I have a sexy job," she said.
Curious, Skeptical, Intuitive, Inquisitive
Most of us have a hard time defining what a data scientist is. We're not alone. Perlich admitted that "it's kind of hard to even define what a data scientist is." She identified two groups that could fall under the umbrella. The first has been around for about 20 years. Its members have data mining skills, know the algorithms and possess strong capabilities in statistics and computer science. Perlich doesn't see a shortage of people who fall into this group.
Then there's the more recent iteration. This second group falls more on the applied side: less hard-core computer science, more people with backgrounds in cognitive or social fields. While Perlich agreed that people are emerging with these skills -- aided in part by new data science programs from major universities including MIT, Columbia, Berkeley, NYU and Harvard -- it isn't happening at a rate that can meet the demands of big data. The challenge for this group (and the programs aimed at populating it) is how to pull together the proper mix of skills and capabilities.
But there's another component lacking, according to Perlich, one that doesn't come up much in the reports bemoaning the dearth of data scientists: management. Companies that are lucky enough to have a data scientist in house, or that work with companies such as Dstillery, need a manager with enough knowledge of data's potential and a really good grasp of metrics.
"You need somebody in the company who has a good vision of what to do and who can identify problems that call for data-driven solutions and then turn them over and specify them for the data scientists. That communication has to work very well because otherwise you just have a bunch of techies sitting in the back room doing their stuff. That doesn't give you the benefits that you can have from the new technology."
Perlich heard from six established data scientists about what they look for when hiring internally. All but one dismissed coding tests. Most asked candidates how they would solve a problem during the interview, looking for candidates who showed on-their-feet thinking and creativity in data use. The final quality was skepticism.
"Data can really fool you. A lot of people have the bias 'oh, it's data, it must be true' ... Data might be true, but what you think the data point meant is actually wrong. You as a data scientist need to walk a very fine line with a skeptical open mind of what could be alternative interpretations," she said.
Final Qualification for a Data Scientist
A final adjective to include on your data scientist checklist is generous. A community of data scientists in New York City engages in pro bono work with area nonprofits, helping them reach their goals. And from August 24 to 27, 1,500 of the top data scientists will be coming together at the KDD conference, which Perlich is organizing. The focus? "Data Science for Social Good." For those who think there's a shortage of data scientists, it might be worth the trip.