It’s great to see a mainstream publication such as the Wall Street Journal covering an upstart tech industry like Big Data as it provides legitimacy and exposure in corporate circles. I was therefore energized to see a recent article “The Corporate Downside of Big Data” by Dr. John Jordan, professor at Penn State University. Unfortunately, the article left the reader with a number of false impressions and questionable conclusions. I’d like to take a few excerpts from the article (link provided above) and make some of my own counter observations. Please feel free to leave comments on both sides of the fence!
Getting people qualified to work in such data-analytical tools as Hive, Pig, Cassandra, MongoDB or Hadoop is only the first layer of this onion. Few companies have in-house experts who can even make a business case to justify the cost of hiring big-data experts, let alone assess the quality of the applicants. Many managers also lack basic numeracy, so getting decision makers who can grasp more sophisticated statistical mechanics can be a challenge.
Counterpoint: While I agree that hiring qualified data scientists to fill an ever-increasing number of positions is becoming a challenge, I take exception to the other purported layers of the so-called onion. I believe most companies have staff who can make an effective business case to justify in-house big data talent, whether in a full-time position or a consulting role. Further, all it takes is one qualified data scientist consultant on staff to take on the responsibility of assessing the quality of further applicants. Over time, the company will acquire in-house expertise. And it is the responsibility of the data science staff to communicate statistical results to managers in a way that tells a compelling story of actionable business knowledge; managers need not have an advanced degree in statistics. The author’s assessment of the abilities of corporate America is not very flattering.
Complicating the matter, big-data tools aren’t ready for prime time: They are evolving rapidly, aren’t taught in most universities, have less-than-ideal vendor support and require levels of user flexibility that more mature tools don’t.
Counterpoint: It is a very bold statement to say that big data tools are not ready for prime time. Tell that to Google, Yahoo, Facebook, LinkedIn and non-tech giants like GE, Ford and MetLife, all of which rely on big data tools for the survival of their companies and have well-publicized, highly successful big data deployments. The fact that Hadoop isn’t taught at the university level is not a limiting factor at all; Oracle, the darling of corporate databases, is seldom included in CS degree programs. I can’t really comment on the less-than-ideal vendor support because I haven’t seen that to be true. In my experience, big data vendors are quite eager to please.
Here’s another layer of the onion: For big data to be useful, programmers and analysts also must understand the basics of the industry they are programming for.
Counterpoint: This onion layer is nothing new in IT circles. It is not just big data staff who need to acquire domain experience in their employer’s industry. It may be true that a company in the pharmaceutical industry will hold out to hire a data scientist with that specific domain experience, but I believe we data scientists are intelligent enough to acquire this knowledge through extended interaction with in-house domain experts. I don’t concur that your average MongoDB developer needs to have extensive domain experience upon being hired into a big pharma company; this experience can be acquired just like any other job experience.
One final layer is IT security. If it’s true that many companies don’t have the skills to work with big-data tools, they certainly don’t yet have the skills to keep that data secure. As more information is gathered, that’s more information that can be leaked or stolen.
Counterpoint: This statement gives the impression of a playground full of newbies who don’t know enough to keep a lock on their valuable corporate data assets – just not true. I think any company with reasonable data governance policies in place will have security mechanisms designed to protect their large data stores. Big data or small data, you protect your data all the same.
I encourage you to read the balance of the WSJ article, which, in my opinion, uses reactionary tales of the big data boogeyman coming to your data center to disturb corporate complacency. The article contains many more objectionable perspectives on the spreadsheet mentality on which corporate America is supposedly stuck. On balance, the article does include some valuable results from the June 2013 Gartner online survey, which give good insight into how big data is progressing, although the survey results seem to contradict the view in the WSJ article.
Daniel – Managing Editor, insideBIGDATA