Discovering Gold with Big Data Analytics and Data-Intensive Computing

Entries filed under “Business of Big Data”

Creating Tomorrow’s Big Data Workforce

In an economy where jobs are still scarce, it’s good to hear that the demand for skilled workers in Big Data is on the rise.

Nationwide, hiring slowed significantly in March with employers adding only 88,000 jobs, down from an average of 220,000 from November through February, according to a story in U.S. News. But in certain sectors, there are jobs that are going unfilled.

Big Data is one of those sectors.  According to Cloudera, as Hadoop continues to make inroads into the enterprise, there is a rapidly growing need for skilled data workers.  So, the company has decided to help develop this talent pool starting at the source – in the educational system.

This week Cloudera launched the Cloudera Academic Partnership (CAP), a program to equip leading universities around the world with the curriculum and training to offer Big Data courses for engineering and analytics students.

“There is no question that demand for qualified Big Data professionals is increasing rapidly, while a shortage of trained workers is creating a major skills gap in the marketplace. The next generation of developers, administrators, and analysts can become the first to include new platforms like Hadoop alongside traditional databases and business intelligence tools,” said Ben Woo, managing director at Neuralytix Inc. “However, colleges and universities have historically not had the necessary resources to include these advanced data technologies in their curricula, with the burden falling on employers to find existing certified professionals among the short market supply or retrain their employees to keep pace with technology. It is intelligent of Cloudera to foster development of this future workforce early and at the source, and it is a great service to professors and students around the world. The Cloudera Academic Partnership is the first of its kind, and the company is aggressively leading the charge in Hadoop education and innovation.”

The CAP program provides a number of benefits, including allowing teachers and students affiliated with CAP program institutions to freely download Cloudera training materials; deep discounts on other training materials developed by Cloudera University; and access to a variety of support services such as classroom tools, instructor forums, and the world’s largest Hadoop knowledge base.

CAP’s seven charter members include:

  • Auburn University (Alabama)
  • California State University, Los Angeles (California)
  • Harvard University: Dana-Farber/Harvard Cancer Center (Massachusetts)
  • Purdue University (Indiana)
  • San Jose State University (California)
  • Technische Universität Berlin (Germany)
  • The University of Stavanger (Norway)

Educational institutions interested in the program can request an application. The CAP program also has more information online.

Read the Full Story.


Also posted in Jobs

Job of the Week: Data Analyst at PC Doctor

PC Doctor in Reno is seeking a Data Analyst in our Job of the Week.

The Data Analyst will have the following responsibilities:

  • Provide strategic data insights, including extensive data exploration, abstraction, visualization, and interpretation of trends and patterns in complex data sets
  • Glean insights from and distribute relevant industry data, communicating informed conclusions and recommendations
  • Contribute to development of analytics, data and metrics products
  • Transform data into actionable datasets to help customers reduce failure rates and support costs and increase revenue

Are you paying too much for your job ads? Not only do we offer ads for a fraction of what the other guys charge, but our inside-BigData Job Board is also powered by SimplyHired, the world’s largest job search engine.


Also posted in Jobs

Big Data and the Evolving Internet

Big Data is exposing glaring weaknesses in today’s Internet. And this, in turn, is driving the next stage of network evolution, much as the introduction of Ethernet and the Internet Protocol did back in the 1970s when IBM’s Systems Network Architecture was the only game in town.

This according to a story in Computerworld written by Stephen Lawson, based on a presentation by David Lambert, president and CEO of Internet2. Lambert spoke at the Open Networking Summit conference held this week in Santa Clara, CA.

Lambert pointed out that the current Internet began as a tool to help researchers from many different locations share data and insights.  It still fulfills that function, but in this new era of Big Data brought on by rapid advances in computing and storage, current Internet technology can’t support the massive amounts of data that engineers and scientists work with on a daily basis. Specifically, Lambert said that the technology used on the Internet today isn’t flexible enough to support new requirements, such as large file transfers, massive data sets, and content caching and distribution.

Software-defined networking (SDN) is one of the technologies underpinning the next steps in the evolution of the Internet. Lambert commented that SDN in universities today resembles the early Internet decades ago; practitioners in such data intensive fields as genomics need a new, flexible, open networking approach to advance their research.

“Internet2 runs a nationwide network linking research institutions, and it’s already using elements of SDN on its production infrastructure,” reports Lawson. “SDN, a closely watched set of technologies at various stages of development, is intended to shift the control of networks from specialized devices such as switches and routers to software that can run on standard computing platforms and be virtualized. It promises a range of benefits that could include lower costs, faster service deployment and more network innovation.”

Internet2 is running a production pilot for SDN and a new high-speed backbone to provide users with the bandwidth needed to handle Big Data. The pilot includes OpenFlow-enabled routers on a 100 Gigabit Ethernet network. Twenty-nine major universities have committed to deploying the 100 GbE network and using Internet2’s OpenFlow-based services.

“The thing that excites me most about the development of OpenFlow and SDN … is the opportunity to have a network stack that’s open again, that people can actually get their hands on, and use it and do disruptive things,” Lambert said.

Read the Full Story.


Also posted in I/O, Software, Video

ScaleOut hServer Opens Up Hadoop Analysis for Live Data

Today ScaleOut Software announced hServer, the first in a series of products intended to fill the need for real-time analytics with Hadoop.

“While it’s a powerful platform for analyzing large, static data sets, Hadoop has always been limited by its inability to perform analytics on live data,” said Bill Bain, ScaleOut Software CEO. “There is an increasing drumbeat for real-time analytics using Hadoop, and we’re excited to take an important step towards meeting that need with this release.”

ScaleOut hServer will be available in both a free community edition and in several commercial editions. The community edition enables up to a four-server combined Hadoop/hServer grid for analyzing memory-based data sets of up to 256GB. Read the Full Story or check out our podcast interview with Bill Bain.


Also posted in Analytics, Podcasts

Taking the Big Data Journey

Big Data gets mixed reviews in a recent study conducted by the CMO Council.

Lisa Arthur, writing in Forbes, says the study indicates that CMOs and CIOs both agree that Big Data will be a key competitive differentiator and play a major role in implementing a more customer-centric business culture.

However, the study also shows they view Big Data as a problem. “In fact, 52 percent of marketers and 45 percent of IT professionals believe functional silos block aggregation of data from across the organization, making it difficult to truly achieve customer centricity,” writes Arthur.

But according to the CMO Council press release announcing the study, the two functions are more aligned than at odds with each other.

“The study also reveals that the two functions are more aligned than many may think,” the Council states. “Some 41 percent of marketers and 39 percent of IT executives say they are aligned with one another, but both admit there are still challenges to executing priority projects. Regardless of any setbacks, both overwhelmingly agree that the relationship is critical, according to 85 percent of marketers and 85 percent of IT executives. Multiple areas of opportunity emerge where further alignment and collaboration will improve the organization’s ability to execute customer-centric programs. Marketing is looking for a strategic partner that will come to the table with initiatives to advance customer centricity (26 percent). Marketing also believes the greatest value in the relationship can manifest in the ability of the organization to better gather data from across the enterprise (63 percent). For their part, IT executives see marketing as their partner in advancing analytics and data-driven decision making throughout the organization (62 percent). However, IT would like marketing to approach them earlier in the process to collaborate more on strategy (62 percent) and not just platform selection and deployment.”

In her article, Arthur asks CMOs, “Isn’t it time for those silos to come down? Aren’t you ready to develop a strategic partnership with the CIO?” She then goes on to list a number of steps CMOs and CIOs can take to more fully realize the advantages of Big Data.

In today’s hectic business world, it’s nice to find her advising CMOs that because integrating marketing and data is a “BIG” job, they need to pace themselves. Counsels Arthur, “Blending data, customer interactions and processes across organizations is not a project . . . It’s a journey.”

Read the Full Story.



As Demand Keeps On Increasing, Hadoop And NoSQL Skills Pay Off

In this special guest feature, Matt Asay from 10gen writes that the move to Big Data means plenty of IT jobs for people with the right skills.

The rise of Big Data has pushed companies to desperately seek IT professionals who can help in their database efforts. Though the Big Data market is still in the early adopter phase, major corporations beyond the dot-com industry are becoming more aware of the benefits of Big Data and the many software vendors that support it. According to Wikibon, a technology and research open source community, the Big Data market is estimated to reach $18.1 billion in 2013, an annual growth of 61%.

The demand for database management skills has expanded beyond Web or software companies and into industries such as retail, hospitals, and even government. These industries are seeking individuals with skills in managing and analyzing large data sets. And when it comes to the most desired skills, it should come as no surprise that NoSQL and Hadoop knowledge is highly desired.

Looking at the numbers

Not only are these skills in high demand, but companies are also willing to pay highly competitive salaries to get their hands on people with experience. According to a recent survey conducted by Dice, a site that specializes in IT jobs, salaries for employees who use Hadoop and NoSQL are over $100,000. This is significantly higher than the overall average IT salary of $85,619.

Big data related jobs in general have a higher than average salary. The average salary for big data related jobs is over $113,000. With large companies such as Amazon, Apple, Dreamworks, Nokia, and more looking for big data experts, it’s no wonder the average salary is so high.
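To put those salary figures side by side, here is a minimal Python sketch of the premium over the overall IT average. The article gives the Hadoop/NoSQL and big data figures only as "over" $100,000 and $113,000, so the sketch treats them as lower bounds; the variable and function names are illustrative, not taken from the Dice survey.

```python
# Salary figures quoted in the article. The two "floor" values are stated
# as "over $100,000" and "over $113,000", so results are lower bounds.
AVERAGE_IT_SALARY = 85_619
HADOOP_NOSQL_FLOOR = 100_000
BIG_DATA_FLOOR = 113_000


def premium_percent(salary, baseline):
    """Return how far `salary` sits above `baseline`, as a percentage."""
    return (salary - baseline) / baseline * 100


hadoop_nosql_premium = premium_percent(HADOOP_NOSQL_FLOOR, AVERAGE_IT_SALARY)
big_data_premium = premium_percent(BIG_DATA_FLOOR, AVERAGE_IT_SALARY)

print(f"Hadoop/NoSQL premium: at least {hadoop_nosql_premium:.1f}%")
print(f"Big data premium:     at least {big_data_premium:.1f}%")
```

On these numbers the premium works out to roughly 17 percent for Hadoop and NoSQL roles and roughly 32 percent for big data jobs in general.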

For professionals interested in NoSQL databases, MongoDB skills top the list. As of the time of this writing, Dice shows the following number of jobs available for each of these popular NoSQL databases.

  • MongoDB: 635
  • Cassandra: 430
  • HBase: 320
  • Redis: 208
  • CouchDB: 93

While MongoDB has been at the top since around 2011, HBase is seeing great growth and is definitely on the rise. Openings requiring Hadoop skills currently stand at 1,227, outnumbering those under the NoSQL umbrella (1,003).
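The counts above can be compared with a short Python sketch. It uses only the numbers quoted in the article, which are a snapshot from the time of writing; the names are illustrative, and no Dice API is involved.

```python
# Dice job counts as quoted in the article (a point-in-time snapshot).
nosql_openings = {
    "MongoDB": 635,
    "Cassandra": 430,
    "HBase": 320,
    "Redis": 208,
    "CouchDB": 93,
}
HADOOP_OPENINGS = 1227
NOSQL_UMBRELLA_OPENINGS = 1003


def rank_by_openings(counts):
    """Return (name, count) pairs sorted from most to fewest openings."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


ranked = rank_by_openings(nosql_openings)
hadoop_vs_nosql = HADOOP_OPENINGS / NOSQL_UMBRELLA_OPENINGS

for name, count in ranked:
    # Express each database relative to the leader in the list.
    share_of_leader = count / ranked[0][1]
    print(f"{name:<10} {count:>4}  ({share_of_leader:.0%} of the leader)")

print(f"Hadoop openings are {hadoop_vs_nosql:.2f}x the NoSQL umbrella figure")
```

Note that the per-database counts sum to more than the umbrella figure of 1,003, so the two sets of numbers evidently come from different searches and should not be expected to add up.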

These statistics are a strong indication that companies are in need of database management skills. This can be directly attributed to the growth and popularity of big data. Perhaps one of the best aspects of careers in data management is that they are recession-proof and are unlikely to decline.

The fact is that we depend on data, and unless there is some technological revolution that eliminates the need for data storage, analytics, software development, and all of the other tasks that require databases, job security will not be an issue.

Careers that require NoSQL and Hadoop skills

There is no shortage of uses for NoSQL and Hadoop in the business sector. Here are some of the specific careers in which these skills are being put to use:

  • DBA: With an average salary of $81,000 according to Indeed, database administrators are in high demand right now. Companies hiring DBAs are looking for experience in handling a variety of database platforms such as MongoDB, Cassandra, and Oracle, to name a few. The more experience you have, the more you can expect to earn. Senior DBAs make an average of over $100,000.
  • Data Architect: This position nets an average salary of $107,000. Data architects are required to have some experience in creating data models, data warehousing, analyzing data, and data migration. Experience as a DBA and with Hadoop is also highly desired.
  • Data Scientist: Data science is a position that encompasses a variety of data driven skills. Data scientists gather data, analyze it, present the data visually, and use the data to make predictions/forecasts. The average salary for a data scientist is $104,000. Data scientists are currently in high demand and the demand will likely continue to increase.
  • Systems Engineer: The position of a “Systems Engineer” is fairly broad and typically branches off to several other positions such as software development, data warehousing, and even some DBA work. The average salary for systems engineers is $89,000.
  • Software/Application Developer: One of the more popular careers for people with NoSQL and Hadoop skills is software development. People with these skills can get ample freelance work or can launch their own startup if they have the entrepreneurial spirit. In addition to database management experience, you will also need programming skills. Software developers make an average salary of $107,000 and application developers average $93,000.

I should also note that many companies are still looking for people with relational database experience. But the growth of document-oriented databases has ignited the spark for NoSQL positions. The demand for people with Hadoop and NoSQL skills is not “on the rise.” It is here right now, and IT professionals are highly encouraged to take advantage.

About the Author:

Matt Asay is Vice President of Corporate Strategy at 10gen, the company behind the MongoDB NoSQL database. With more than a decade spent in open source, Matt is a recognized open source advocate and board member emeritus of the Open Source Initiative (OSI).


Also posted in Jobs, MongoDB, Software

Big Data and Zeno’s Conundrum

You remember Zeno’s famous Dichotomy paradox. Roughly paraphrased it says: if you want to walk, for example, from your chair to the door, first you have to walk halfway there. And before you walk halfway there, you have to walk a quarter of the way, and so on ad infinitum. You can never reach your destination. Of course, after all that you simply stand up and walk to the door.
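For readers who want the arithmetic behind that last sentence, the usual resolution treats the ever-smaller segments as a geometric series. This is a standard result rather than something the article spells out, and taking the chair-to-door distance as 1 is an assumption made here for illustration:

```latex
\[
\sum_{n=1}^{\infty} \frac{1}{2^{n}}
  = \lim_{N \to \infty} \left( 1 - \frac{1}{2^{N}} \right)
  = 1
\]
```

There are infinitely many segments, but their total length is finite, which is why you do reach the door.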

Could it be that merely contemplating implementing Big Data in an organization can bring on an attack of Zeno’s conundrum?

Writing in Forbes, Matt Ariker, COO of McKinsey & Company’s Consumer Marketing Analytics Center, notes that “Many who make the Big Data journey are overly fixated on making it to the ‘Promised Land.’ In far too many cases I see people who plan to build out a complete system and architecture before using a single insight or building even one predictive model to accelerate revenue growth.”

He recommends first clearly defining your end game – knowing where you are going before you start the journey. Then the trick is to avoid the fallacy of believing that you can’t gain any insights until the entire Big Data infrastructure is in place. Says Ariker, “Think of it more like a tool factory making all sorts of specific tools for specific jobs. Every time one tool is complete you can begin using it; you don’t have to wait for all tools to be complete.”

The consultant then offers four pieces of advice that he uses to counsel his clients. The first three are assigning the creation of an Insights Roadmap to a team with the relevant skills; creating a detailed business case that focuses on P&L impact; and starting small by cherry-picking already available data sets. Here’s the fourth and final step.

“Build a roadmap. I know this step sounds obvious but I’m constantly surprised by how little effective planning takes place,” comments Ariker. “As counterintuitive as it may sound, it takes planning to work fast. Make sure you’re focusing on the models and necessary infrastructure that deliver the most value, and sequencing them all to minimize delays. Put in place clear data delivery deadlines and test timelines for each model to ensure timely delivery… One high tech company followed this ‘Insights Roadmap’ route. By following the four steps I laid out above, the company started generating insights within two months of the project starting, long before the two-year Big Data warehouse build project was completed. Within three months, the company was able to book $5m in additional revenue. By end of year one, incremental revenue exceeded $75m. Your journey to reach Big Data’s promised land will take time. But realizing value doesn’t have to.”

Read the Full Story.



Big Data and the Humanities

Last week Columbia University held a daylong symposium titled “From Big Data to Big Ideas,” as part of the launch of its Institute for Data Science and Engineering. Despite the emphasis on technology that the Institute’s name implies, several of the participants brought up the need to include the humanities in this new world of Big Data that we as a society are embracing.

Steve Lohr of the New York Times was there. In his Sunday blog, “The Potential and the Risks of Data Science,” he says that during presentations by Columbia professors and computer scientists from various companies, including Google, Facebook, Microsoft, and Bloomberg, issues around the misuse of Big Data – such as privacy and surveillance – were mentioned only in passing.

Lohr reports, however, that concerns were expressed by a panelist from a company that has pushed the limits of Big Data collection and use: Google.

“My concern is that the technology is way ahead of society,” said Ben Fried, Google’s chief information officer. There is danger, he suggested, if only a technical elite understand Big Data and its implications, with the risk of a runaway technology or a public rejection. “I think it is a mistake if conversations about this technology leave out the humanities,” he said. Broader social concerns, he explained, should be a guide and will affect the spread and use of Big Data technology.

Evidently one of the Columbia professors is planning to make the humanities part of Big Data. Mark Hansen, director of the Institute’s New Media Center, is teaching his students from the University’s Graduate School of Journalism how to do some programming and understand the algorithms underlying Big Data. “Software algorithms, he (Hansen) said, are not impartial,” writes Lohr. “They are written by people, and can embody human values and biases.”

In their book, Big Data, Viktor Mayer-Schönberger and Kenneth Cukier devote several chapters to the perils of Big Data misuse and possible solutions. What they envision goes far beyond training journalists – whom Hansen calls “society’s explainers of last resort.”

The authors call for the creation of a new professional, what they call the “algorithmist.” These algorithmists would be experts in computer science, mathematics, and statistics and would act as reviewers of Big Data analyses and predictions.

And, given the tenor of the discussion at the Columbia symposium, it probably wouldn’t hurt if these new Big Data watchdogs had some training that included a smattering of the humanities – such as readings in the philosophy of the Enlightenment, the history of the Roman Empire, or what the Lake Poets of the 19th century were all about.

“Getting and spending we lay waste our powers,” cautioned William Wordsworth. Take heed.


Also posted in Education

Edd Dumbill on Picking Winners In Big Data

Over at Forbes, Edd Dumbill writes that as a flurry of startups and incumbents scramble for a piece of the Big Data pie, the key to success is understanding that Big Data is a business problem.

Big data is ultimately about the smart use of data to drive a business. There are two kinds of data: data about your business, and data external to your business that you can create value from. It’s easy to see who might get success from the latter, external data. We can expect that massive data owners such as Google, Facebook, Thomson Reuters, Bloomberg will experience ongoing success for as long as they are able to create product from their data. Who owns your data, though? The obvious answer, you, isn’t the only answer. In fact, your data is locked up inside the platform choices you make, at both the hardware and software level. If your systems are based on Oracle, Microsoft, you are very unlikely to move in a hurry. Data likes to stay where it is, and tends to attract more data as you build systems around it. Production systems are expensive to replace.

Read the Full Story.



Free Ebook: NAS Optimization for Dummies

Our friends at Avere are offering a free copy of NAS Optimization for Dummies.

Big NAS performance comes from your ability to scale, eliminate sources of latency, and gain the advantages of the cloud. Get started with Avere Systems’ Special Edition of NAS Optimization for Dummies by Allen G. Taylor.

In this book, you’ll find:

  • How to configure NAS storage for optimal performance
  • Ways to reduce the cost of upgrades as your storage needs grow
  • How to minimize the impact of multiple users hitting the storage systems at the same time

Read the Full Story.


Also posted in Book Review, I/O, Sponsored Post, Storage


inside-bigdata.com is a production of insideHPC, LLC. © 2011-2013 Sitemap