Discovering Gold with Big Data Analytics and Data-Intensive Computing

Entries filed under “Business of Big Data”

HP Targets Big Data as Part of its Converged Infrastructure Push

HP is hopping on the Big Data bandwagon.

On Monday the company announced a new organization, HP Converged Systems, cobbled together from various dedicated resources “…to deliver purpose-built technology for social, cloud, mobile and big data solutions.”

The business unit is charged with helping accelerate the next generation of the “Converged Infrastructure concept.”  Announced last February at its yearly Global Partner Conference, HP’s Converged Infrastructure portfolio includes the latest in its blade systems, as well as converged storage, a mixed bag of networking solutions, and HP services.

According to the press release, the HP Converged Systems business unit will extend the portfolio of converged application appliances that fuse infrastructure and applications into a single system.  Included are appliance systems for Hadoop, HP Vertica, SAP HANA, and HP CloudSystem.

“HP continues to be at the forefront of the data center evolution, accelerating the pace of innovation for our customers,” said Dave Donatelli, executive vice president and general manager, HP Enterprise Group. “HP was the first to announce Converged Infrastructure, which each major technology company has since followed. Today’s organizational updates are the next logical step as we accelerate the delivery of game-changing converged systems technology.”

Although HP is quick to put its stamp on the Converged Infrastructure concept, the idea has been around for a while.  Wikipedia offers a succinct definition: “A converged infrastructure addresses the problem of siloed architectures and IT sprawl by pooling and sharing IT resources. Rather than dedicating a set of resources to a particular computing technology, application or line of business, converged infrastructures creates a pool of virtualized server, storage and networking capacity that is shared by multiple applications and lines of business.”

The Wikipedia entry goes on to say, “In April 2012, the open source analyst firm Wikibon released the first market forecast for Converged Infrastructure, with a projected $402B total available market (TAM) by 2017 of which, nearly 2/3rds of the infrastructure that supports enterprise applications will be packaged in some type of converged solution by 2017.”

Not the most elegant sentence in the world, but one that indicates that this is a big and growing market.

Wikibon also notes in another posting, “The total Big Data market reached $11.4 billion in 2012, ahead of Wikibon’s 2011 forecast. The Big Data market is projected to reach $18.1 billion in 2013, an annual growth of 61%. This puts it on pace to exceed $47 billion by 2017. That translates to a 31% compound annual growth rate over the five year period 2012-2017.”

It’s no wonder HP is setting its sights on the Converged Infrastructure and Big Data marketplaces – the combination is irresistible.

Read the Full Story.


Also posted in Hardware

Algorithm Predicts Whether Startups Will Succeed

Over at Oregon Business, Linda Baker writes that Thomas Thurston from Growth Science has created a model that accurately predicts whether or not a startup will succeed.

How does Thurston’s model work? It’s rooted in the mountains of data he has collected on market and corporate dynamics, including the anticipation of future changes in the marketplace. Patterns of success or failure then emerge depending on these different market and business behavior factors. “The key is identifying variables that are predictive of success and failure,” says Thurston, who is very hush-hush about revealing those variables. It’s a process that involves “lots of hard, hard work,” he says. “You go through a whole haystack to find one needle.”

Read the Full Story.


Also posted in Analytics, Startups

Here’s to Your Health with Big Data

Electronic health records (EHRs) were supposed to revolutionize healthcare, saving up to $81 billion a year through innovative new efficiencies and the collection of massive amounts of data that could be used to help prevent as well as cure diseases.

Well, it hasn’t happened.  Writing in Computerworld, Lucas Mearian reports that EHRs have become more of a hindrance than a help.

He quotes Dr. Robert Walker, director of health innovation for the U.S. Army Surgeon General, who said in an interview, “The electronic medical record has become an impediment versus something that was going to streamline your day. It took the focus away from the patient and put it all on the computer. People are clicking boxes and turning their backs to patients. It’s all about jamming data into this thing.”

But despite EHRs’ shortcomings, the fact is that the program is gathering great quantities of invaluable clinical data and storing it in data warehouses. Researchers can access and analyze this data using powerful Big Data engines like Hadoop.

“That’s the real renaissance that’s going to happen in health care,” Walker said. “With big data, what happens in a doctor’s office is going to be vastly different from what we see today. The top five or 10 things that people die from in America are life-style induced. That’s absurd. Maybe instead of vital signs, I’m just going to look at what you buy in a grocery store.”

Mearian cites several areas that are already reflecting the promise of improved health care with the help of Big Data analytics.  For example, advanced drug therapies are being developed through the study of genomics – a.k.a. personalized medicine.  Another example is i2b2, free open source informatics software developed by Dr. Isaac S. Kohane, a professor of pediatrics and health sciences technology at Harvard Medical School & Children’s Hospital. The software is being used by more than 100 academic health centers around the world to identify genetic predictors for diseases and harmful drugs.

“Dr. Walker believes the real game changer in medicine will be an engaged patient, one who will enter his or her own data through the use of mobile devices,” Mearian reports. “And that data can include not just medical information, but also lifestyle updates involving diet and exercise. By having a full picture of a patient’s lifestyle, doctors are better equipped to help patients avoid the onset of chronic illnesses. Then, once the data is in an EHR, big data analytics engines could offer physicians information about patients who may need to adjust their caloric intake, level of activity or the amount of sleep they get.”

Walker comments: “The answer to the obesity problem is not the operating table, but the dinner table, and that’s where we need to get to. In this country, we’re putting billions of dollars into healthcare and our life expectancies are less than in countries that spend a fraction of what we do. We’re really doing disease care and not healthcare today.”

Read the Full Story.


Also posted in Healthcare

Cutting Big Data Down to Size

Rufus Pollock

It’s always a pleasure to run across a well thought-out contrarian point of view, and Dr. Rufus Pollock provides just that in a recent blog post entitled “Forget Big Data, Small Data is the Real Revolution.”

Pollock is the founder and co-director of the Open Knowledge Foundation, headquartered in Cambridge, England. He casts a cold eye on all the feverish activity promoting Big Data, including Big Data Week, which is currently underway (see the earlier Inside-Big Data story).

But the discussions around big data miss a much bigger and more important picture: the real opportunity is not big data, but small data. Not centralized “big iron”, but decentralized data wrangling. Not “one ring to rule them all” but “small pieces loosely joined.”

He points out that the real revolution is the “mass democratization” of the means of accessing, storing and processing data.  This allows us to tap into a distributed ecosystem made up of small data.  Size is not what matters – the point is having the right data at hand that’s needed to deal with whatever issues we might be facing at the time.

“For many problems and questions, small data in itself is enough. The data on my household energy use, the times of local buses, government spending – these are all small data,” Pollock writes. “Everything processed in Excel is small data. When Hans Rosling shows us how to understand our world through population change or literacy he’s doing it with small data. And when we want to scale up the way to do that is through componentized small data: by creating and integrating small data “packages” not building big data monoliths, by partitioning problems in a way that works across people and organizations, not through creating massive centralized silos. This next decade belongs to distributed models not centralized ones, to collaboration not control, and to small data not big data.”

Read the Full Story.



Sage Weil Presents: An Intro to Ceph for HPC

In this video from the Lustre User Group 2013 conference, Sage Weil from Inktank presents: An Intro to Ceph for HPC.

Ceph is a free software unified storage platform designed to present object, block, and file storage from a single distributed cluster. Ceph’s main goals are to be completely distributed without a single point of failure, scalable to the exabyte level, and freely-available. The data is seamlessly replicated, making it fault tolerant. Ceph is a software-based solution and runs on commodity hardware. The system is designed to be both self-healing and self-managing and strives to reduce both administrator and budget overhead.

Check out more presentations at our LUG 2013 Video Gallery.


Also posted in Ceph, HPC, Lustre, Software, Video

Eadline: Is Hadoop the New HPC?

Over at Admin HPC, Douglas Eadline writes that Hadoop could be the new corporate HPC for Big Data.

The growth of Hadoop and the hardware on which it runs has been increasing. Certainly it can be seen as a subset of HPC, offering a single yet powerful algorithm that has been optimized for a large number of commodity servers, with some crossover even into technical computing that could see further growth as things like YARN begin to give existing Hadoop clusters more HPC capabilities. Many companies are finding Hadoop to be the new Corporate HPC for big data.

Read the Full Story.


Also posted in Hadoop, HPC

Trifecta of Big Data, Analytics, Cloud Responsible for Enterprise Software Growth According to IDC

This week IDC released the latest results from its Worldwide Semiannual Software Tracker. Despite only modest gains last year in the worldwide software market, certain specific areas showed strong growth.  According to IDC, the management and leveraging of information for competitive advantage is driving gains in markets associated with Big Data and analytics.

For 2012, the worldwide software market grew 3.6% year over year, reaching a total market size of $342 billion.  This was less than half the growth rate experienced in 2010 and 2011 and is indicative of a more conservative growth period.

But, says IDC, despite the slowdown, there are faster growing market segments such as Data Access, Analysis and Delivery, Collaborative Applications, CRM Applications, Security Software, and System and Network Management Software. Every one of these markets grew in the 6-7% range, about double the rate for enterprise software as a whole.

“The global software market, comprised of a multi-layered collection of technologies and solutions, is growing more slowly in this period of economic uncertainty,” said Henry D. Morris, Senior Vice President for Worldwide Software, Services and Executive Advisory Research. “Yet there is strong growth in selective areas. The management and leveraging of information for competitive advantage is driving growth in markets associated with Big Data and analytics. Similarly, rapid growth in cloud deployments is fueling growth in application areas associated with social business and customer experience. Both these initiatives require a reliable and secure infrastructure, driving investments in security and system/network management. The combination of these forces is advancing the growth to what IDC has termed the third platform.”

IDC identifies Application Development & Deployment (AD&D) as one of the three primary segments making up the total software market. AD&D was the fastest growing segment, comprising nearly 24% of software revenues in 2012 and growing at a rate of 4.6%.

Business Intelligence and relational database management systems are fueling this growth because of the growing adoption of Big Data and analytics.  IDC goes on to say that “Big data and analytics are also closely tied to the fast growth social business software markets, where the combination of contextual data and the ‘right’ expertise is becoming critical for supporting enterprise decision making and data driven customer experience solutions. Oracle continued to lead the AD&D segment with steady market share of 21.6%, followed by IBM, Microsoft, SAP, and SAS. Among these vendors, Microsoft and SAP stood out by each gaining almost a half point of market share year over year.”

Read the Full Story.


Also posted in Software

Big Data Upending H.R. Conventional Wisdom

For decades human resource management has been a field that relied on policies and procedures, leavened by healthy doses of intuition. But Big Data is standing H.R. on its head – many of the old assumptions are either being called into question or totally discarded.

Writing in The New York Times, Steve Lohr chronicles a number of these cherished beliefs that are being undermined by the results of recent research based on Big Data.

For example, a good supervisor – one who is an excellent communicator with a warm personality – may be more important to an organization’s success than the experience and attributes of the workers themselves.

And when it comes to hiring, data also shows that the tendency of H.R. departments to avoid candidates with a history of job-hopping or who have been unemployed for some time is the wrong tack to take. These factors are not good predictors of future results.

Even the idea that the ideal salesperson’s most important asset is an outgoing, optimistic personality fails to hold up.  Research by IBM’s Kenexa unit – a recruiting, hiring and training company acquired last year – reveals that successful salesmen exhibit “a kind of emotional courage, a persistence to keep going even after initially being told no.”

You would expect Google to be at the forefront of using Big Data to manage its H.R. activities.  And, indeed, this is the case.

“Google, not surprisingly, is committed to applying data-driven decision-making to human resource management,” writes Lohr. “For years, candidates were screened according to SAT scores and college grade-point averages, metrics favored by its founders. But numbers and grades alone did not prove to spell success at Google and are no longer used as important hiring criteria, says Prasad Setty, vice president for people analytics. Since 2007, the company has conducted extensive surveys of its work force. Google has found that the most innovative workers – also the ‘happiest,’ by its definition – are those who have a strong sense of mission about their work and who also feel that they have much personal autonomy. ‘Our people decisions are no less important than our product decisions,’ Mr. Setty says. ‘And we’re trying to apply the same rigor to the people side as to the engineering side.’”

The impact of Big Data on H.R. is expected to be profound.  Lohr quotes Peter Cappelli, director of the Center for Human Resources at the Wharton School of the University of Pennsylvania, who says, “This is absolutely the way forward.  Most companies have been flying completely blind.”

Read the Full Story.


Also posted in Research

SGI InfiniteStorage Gateway Appliance Implements DMF in Minutes

Today SGI announced the InfiniteStorage Gateway, an appliance that delivers virtualized data management to lower the cost of high volume storage. Why an appliance? SGI’s Floyd Christofferson described it as a way to “install DMF within minutes and enable greatly simplified storage management for Big Data.”

The SGI InfiniteStorage Gateway reduces the dependency on high-cost primary storage by creating a virtualized storage fabric that can include any mixture of disk, tape, Zero-Watt Disk or MAID, and object storage. While appearing to users and applications simply as online data, SGI InfiniteStorage Gateway offers IT administrators the ability to keep data protected and online at a fraction of the cost of primary storage systems.

“As data growth has continued to sky-rocket, IT organizations increasingly face the problem of infrastructure fragmentation, and the fact that their most expensive primary storage arrays are often used to house mostly inactive data,” said Laura DuBois, program vice president, IDC Storage Systems, Software and Solutions. “Data management is not only about the performance of active data today. It also must provide a seamless long-term strategy for all data that keeps costs at a minimum and reduces IT administrative burden without impacting users.”

With up to 276TB of onboard capacity in a single 4U appliance, the gateway automatically places data on any or all storage devices and locations based upon what works best for the access requirements and data protection policies.

Read the Full Story or View the Slides on Slideshare.


Also posted in DMF, Hardware, Storage

Creating Tomorrow’s Big Data Workforce

In an economy where jobs are still scarce, it’s good to hear that the demand for skilled workers in Big Data is on the rise.

Nationwide, hiring slowed significantly in March with employers adding only 88,000 jobs, down from an average of 220,000 from November through February, according to a story in U.S. News. But in certain sectors, there are jobs that are going unfilled.

Big Data is one of those sectors.  According to Cloudera, as Hadoop continues to make inroads into the enterprise, there is a rapidly growing need for skilled data workers.  So, the company has decided to help develop this talent pool starting at the source – in the educational system.

This week Cloudera launched the Cloudera Academic Partnership (CAP), a program to equip leading universities around the world with the curriculum and training to offer Big Data courses for engineering and analytics students.

“There is no question that demand for qualified Big Data professionals is increasing rapidly, while a shortage of trained workers is creating a major skills gap in the marketplace. The next generation of developers, administrators, and analysts can become the first to include new platforms like Hadoop alongside traditional databases and business intelligence tools,” said Ben Woo, managing director at Neuralytix Inc. “However, colleges and universities have historically not had the necessary resources to include these advanced data technologies in their curricula, with the burden falling on employers to find existing certified professionals among the short market supply or retrain their employees to keep pace with technology. It is intelligent of Cloudera to foster development of this future workforce early and at the source, and it is a great service to professors and students around the world. The Cloudera Academic Partnership is the first of its kind, and the company is aggressively leading the charge in Hadoop education and innovation.”

The CAP program provides a number of benefits, including allowing teachers and students affiliated with CAP program institutions to freely download Cloudera training materials; deep discounts on other training materials developed by Cloudera University; and access to a variety of support services such as classroom tools, instructor forums, and the world’s largest Hadoop knowledge base.

CAP’s seven charter members include:

  • Auburn University (Alabama)
  • California State University, Los Angeles (California)
  • Harvard University: Dana-Farber/Harvard Cancer Center (Massachusetts)
  • Purdue University (Indiana)
  • San Jose State University (California)
  • Technische Universität Berlin (Germany)
  • The University of Stavanger (Norway)

Educational institutions interested in the program can request an application. The CAP program also has more information online.

Read the Full Story.


Also posted in Jobs


inside-bigdata.com is a production of insideHPC, LLC. © 2011-2013 Sitemap