The Death of Data Science has been Greatly Exaggerated

I’ve been fascinated watching data scientists discuss the death of data science. It all started with a rather sensationalized post over at Slashdot: “Data Science is Dead” by Miko Matsumura, Vice President at Hazelcast. It wasn’t even posed as a question for discussion, but rather as a declaration of a foregone conclusion. The subtitle for the piece was “Not only is Data Science not a science, it’s not even a good job prospect.” Well, those are fightin’ words to a data scientist like me!

The discussion that ensued over on the forum of the LinkedIn group Advanced Business Analytics, Data Mining and Predictive Modeling is becoming quite heated (and at times nasty) with “my data is bigger than your data” talk. I love when geeks battle it out. It’s better than a bum fight.

But before I get to the LinkedIn flame war going on, let’s take a look at the article that started it all. Below is the picture that accompanied the post. I’ve seen this image before in several places on the web and it disturbs me at a fundamental level. It is as if some marketing type was asked to create a blackboard full of mathematics but didn’t know any mathematics, and this is what they came up with. Silly. Repugnant. But the author of the article somehow tries to give the impression that data science is just a bunch of hooey and here’s the proof on the blackboard. Very weak argument.

 

[Image: the blackboard full of mathematics that accompanied the Slashdot post]

The article makes so many outlandish comments that you have to wonder if it was placed just to give Slashdot a bump in traffic. I’m sure it did just that. Let’s take a look at some of Miko’s zingers, starting with:

“Now, before you drag out the pitchforks: I’m not a query hater. You won’t see me standing outside the Oracle Open World conference with a sign that says “NO SQL” on it.”

Somehow Miko equates data science with SQL queries. Huh? I guess he hasn’t heard that data science is based on mathematical statistics, probability theory, and machine learning (related to AI). And then there was this outrageous claim:

“Building a query is like “forming a hypothesis,” but at that point we enter the realm of observational or “soft” science. Yes, by this standard, Astronomy and Social Sciences are also not sciences. I have no idea what Computer Science is, but no, it’s not a science either.”

I have a few friends who are astronomers and members of the National Academy of Sciences (one of them won a MacArthur Fellowship). I don’t think they’d appreciate this characterization. Oh wait, Stephen Hawking is a cosmologist, so by Miko’s definition he doesn’t practice “science” either. Sorry Stephen, you’re the most brilliant scientist alive today, but you’re not doing science. And if Miko doesn’t know what computer science is, then that is a large part of his problem. And then there was this gem:

“If you’re going to make up a cool-sounding job title for yourself, “Data Scientist” seems to fit the bill. You can go buy a lab coat from a medical-supply surplus store and maybe some thick glasses from a costume shop. And it works! When you put “Data Scientist” on your LinkedIn profile, recruiters perk up, don’t they?”

True enough, the designation “data scientist” is new, but I, for one, am glad that I finally have a title that adequately describes what I truly do. I am not a developer, or a statistician, or a pure mathematician, or a DBA, and I hated the term “data miner” because I felt like I had to walk around with overalls and a pickaxe. Data scientist suits me very well, thank you very much. And here’s my favorite non sequitur:

“OK, so we want to be “Data Scientists” when we grow up, right? Wrong. Not only is Data Science not a science, it’s not even a good job prospect. In the immortal words of Admiral Akbar: “It’s a trap.””

Aside from the Star Wars reference, this comment is pretty sad. Hey! I did want to grow up to be a data scientist. When I was a kid, I was very fond of Asimov’s Foundation Trilogy. There was a character named Hari Seldon, a mathematics professor who developed a scientific field called “psychohistory” that combined history, sociology and mathematical statistics, allowing him to predict the future in probabilistic terms. Seldon was my childhood hero, so yes Miko, I did want to be a data scientist!

There is a lot more in this anti-data science manifesto and I encourage you to read it and form your own opinions, but I warn you: if you’re a true data scientist, hold your nose.

Getting back to the LinkedIn discourse, a lot of it degenerated into accusations and derision, mostly from a certain well-known data scientist who will remain unnamed. Here is a list of highlights that I found particularly enjoyable:

  • Here’s what happens to companies that don’t care about big data: Target CIO lost her job one month after data about 70 millions credit cards were stolen, and revenue plummeted in Q4 as clients run away. Ignore big data (and data science to handle it), and hackers will kill your business.
  • The point is that unless “data scientists” can execute and produce reliable solutions with controlled validation as is done in real science where the control predictions are valid predictions made by experts, then they will hardly be viewed as “scientists” and their methods will always be viewed as very risky because we do not know how they would do relative to what experts are able to produce.
  • The problem might not be with their “data scientists” (whatever they are called), but with the way management uses them, to produce little predictive toys rather than to get them to work on fundamental problems.
  • I would bet that Target did “proper cross-validation”, but that might be the problem because cross-validation can be very misleading about the future in environments that change rapidly and where your predictive models give no insight into what might be causing such change.
  • Saying data science is dead is saying none of this is useful anymore. But what is indeed dead, is the illusion that (1) there are people mastering ALL this stuff – actually there are, but they are rare and (2) that such people can find a job and help companies – wrong except in some modern environments.
  • The selection of the title “Data Scientist” was based on the need of many to be “taken more seriously” by upper level management. But many complain that their work is still not taken seriously by management, so the poor justification they had for calling themselves “scientists” in the first place, has not worked out.
  • It would be helpful if data “science” worked on establishing some core methods, professional groups, and research and dissemination of findings. The kinds of things all mature areas of inquiry do … If data “scientists” want to be considered “experts” at something, this would involve more work than merely changing your title.
  • Data Sciences are the accumulation of old techniques as data mining, machine learning, text mining and operational research with a large amount of marketing excited by big data phenomena.
  • In my opinion, a successful data scientist does not even need a degree or any publication – all he/she needs is successful stories, like “I was paid $100k to solve this problem and my client/organisation earned $1MM over the next 24 months thanks to my contribution”. And it must not be a fluke. Anything else is BS.
  • just because the “data scientists” that you know are disguised data analysts, it does not mean they all are. Scientific argumentation does not use anecdotal evidence (one data point, you) to prove a fact.
  • I, for one, am hopeful that the term ‘data science’ is on the way out… it really doesn’t mean anything, and generally tends to just open the door for debate about what should and should not be considered science (not to mention flooding of the job market with people who claim to be something they are not).
  • I have more than 25 years of experience breathing air. It makes me an air breathing expert. So what? At the end of the day, what people care about, is the value that you bring, not your years of experience, education or job title.
  • Data science is the science of producing revenue out of data. If it does not provide a return, it is not data science.

So in my estimation “data science” is not dead nor will it be dying any time soon. I prefer to think that Hari Seldon will live long into the future.

 


Comments

  1. All of these posts are defining Data Science as just Business Intelligence. If you’re applying Data Science in academia or the public sector, the notion of ‘producing revenue out of data’ isn’t at all relevant.
