Discovering Gold with Big Data Analytics and Data-Intensive Computing

SAS Adds to its Big Data Analytics Product Lineup

 

SAS this week announced six new high-performance analytics products to further strengthen the company’s leadership role as a provider of advanced analytics for Big Data.

According to a recent IDC report, SAS analytics account for 35.2 percent of the worldwide market, with steady growth over the past three years.  The company’s next four closest competitors combined hold only 21 percent of the market.

The new SAS products, available in June, are focused on a variety of analytic techniques, including data mining, text mining, optimization, forecasting, statistics and econometrics.


 
According to the press release, users can choose the relevant SAS offerings that will allow them to perform analytical tasks in-memory across Big Data to meet specific business challenges.

“The new SAS High-Performance Analytics products help organizations overcome bottlenecks to answer the really tough questions, those problems that often involve large amounts of data,” said Henry Morris, IDC Senior Vice President of Worldwide Software and Services. “These new functionally specific analytic packages differ significantly from other in-memory products because they are broad and useful across any industry. Instead of being limited to a particular problem, or lacking real predictive capabilities, the new SAS analytics are useful for virtually any analytic purpose.”

The six new products include:

  • SAS High-Performance Statistics
  • SAS High-Performance Data Mining
  • SAS High-Performance Text Mining
  • SAS High-Performance Optimization
  • SAS High-Performance Econometrics
  • SAS High-Performance Forecasting

The new software packages operate in massively parallel processing (MPP) environments, distributing complex analytic tasks across numerous server blades to perform computations in parallel. Because each blade has its own memory, execution is rapid; jobs that once took hours now take only minutes or even seconds.
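The partition-and-combine idea behind MPP execution can be illustrated with a minimal sketch. This is not SAS code and makes no claim about how the SAS products are implemented; the function names are hypothetical, and a thread pool stands in for the server blades purely for illustration. Each worker computes partial statistics on its own slice of the data, and the partial results are then merged into a mean and a population variance.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(values, n_workers):
    """Split values into n_workers interleaved chunks (hypothetical helper)."""
    return [values[i::n_workers] for i in range(n_workers)]


def partial_stats(chunk):
    """Return count, sum, and sum of squares for one worker's chunk."""
    return len(chunk), sum(chunk), sum(x * x for x in chunk)


def parallel_mean_variance(values, n_workers=4):
    """Compute mean and population variance by merging per-worker partials.

    Assumes values is non-empty. Threads stand in for MPP nodes here.
    """
    chunks = partition(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_stats, chunks))

    # Merge step: the partial results are small, so combining them is cheap.
    n = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    total_sq = sum(p[2] for p in partials)

    mean = total / n
    variance = total_sq / n - mean * mean
    return mean, variance


mean, variance = parallel_mean_variance(list(range(1, 101)))
print(mean, variance)
```

The point of the design is that only the small partial results move between workers, while the bulk of the data stays where it was loaded.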

These new offerings extend options for new configurations with Teradata, Greenplum (now part of Pivotal) and Hadoop, as well as adding Oracle support.

The announcement was made at SAS Global Forum 2013, underway this week in San Francisco.  Key sessions are being live streamed at the SAS video portal.

Read the Full Story.


HP Targets Big Data as Part of its Converged Infrastructure Push

 

HP is hopping on the Big Data bandwagon.

On Monday the company announced a new organization, HP Converged Systems, cobbled together from various dedicated resources “…to deliver purpose-built technology for social, cloud, mobile and big data solutions.”

The business unit is charged with helping accelerate the next generation of the “Converged Infrastructure concept.”  Announced last February at its yearly Global Partner Conference, HP’s Converged Infrastructure portfolio includes the latest in its blade systems, as well as converged storage, a mixed bag of networking solutions, and HP services.

According to the press release, the HP Converged Systems business unit will extend the portfolio of converged application appliances that fuse infrastructure and applications into a single system.  Included are appliance systems for Hadoop, HP Vertica, SAP HANA, and HP CloudSystem.

“HP continues to be at the forefront of the data center evolution, accelerating the pace of innovation for our customers,” said Dave Donatelli, executive vice president and general manager, HP Enterprise Group. “HP was the first to announce Converged Infrastructure, which each major technology company has since followed. Today’s organizational updates are the next logical step as we accelerate the delivery of game-changing converged systems technology.”

Although HP is quick to put its stamp on the Converged Infrastructure concept, the idea has been around for a while.  Wikipedia offers a succinct definition: “A converged infrastructure addresses the problem of siloed architectures and IT sprawl by pooling and sharing IT resources. Rather than dedicating a set of resources to a particular computing technology, application or line of business, converged infrastructures creates a pool of virtualized server, storage and networking capacity that is shared by multiple applications and lines of business.”

The Wikipedia entry goes on to say, “In April 2012, the open source analyst firm Wikibon released the first market forecast for Converged Infrastructure, with a projected $402B total available market (TAM) by 2017 of which, nearly 2/3rds of the infrastructure that supports enterprise applications will be packaged in some type of converged solution by 2017.”

Not the most elegant sentence in the world, but one that indicates that this is a big and growing market.

Wikibon also notes in another posting, “The total Big Data market reached $11.4 billion in 2012, ahead of Wikibon’s 2011 forecast. The Big Data market is projected to reach $18.1 billion in 2013, an annual growth of 61%. This puts it on pace to exceed $47 billion by 2017. That translates to a 31% compound annual growth rate over the five year period 2012-2017.”
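For readers who want to check growth figures like these, the arithmetic is straightforward. The sketch below uses hypothetical helper names and the dollar figures from the quote above. Because those figures are rounded (and "exceed $47 billion" is a lower bound), the computed percentages may differ somewhat from the ones Wikibon reports.

```python
def growth_rate(start, end):
    """Simple period-over-period growth, as a fraction."""
    return end / start - 1.0


def cagr(start, end, years):
    """Compound annual growth rate over the given number of years, as a fraction."""
    return (end / start) ** (1.0 / years) - 1.0


def project(start, rate, years):
    """Value after compounding `rate` annually for `years` years."""
    return start * (1.0 + rate) ** years


# Figures quoted above, in billions of US dollars (rounded in the source).
market_2012 = 11.4
market_2013 = 18.1
market_2017 = 47.0

print(f"2012-2013 growth: {growth_rate(market_2012, market_2013):.1%}")
print(f"2012-2017 CAGR:   {cagr(market_2012, market_2017, 5):.1%}")
```

The same `project` helper can be run in the other direction, for example to see what a given annual rate implies for the 2017 market size.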

It’s no wonder HP is setting its sights on the Converged Infrastructure and Big Data marketplaces – the combination is irresistible.

Read the Full Story.


Algorithm Predicts Whether Startups Will Succeed

 

Over at Oregon Business, Linda Baker writes that Thomas Thurston from Growth Science has created a model that accurately predicts whether or not a startup will succeed.

How does Thurston’s model work? It’s rooted in the mountains of data he has collected on market and corporate dynamics, including the anticipation of future changes in the marketplace. Patterns of success or failure then emerge depending on these different market and business behavior factors. “The key is identifying variables that are predictive of success and failure,” says Thurston, who is very hush-hush about revealing those variables. It’s a process that involves “lots of hard, hard work,” he says. “You go through a whole haystack to find one needle.”

Read the Full Story.


Big Data Security – Not There Yet

 

When it comes to security, Big Data is a two-edged sword.

On the one hand, it can be used to analyze mountains of data in order to foil intruders, head off attacks and neutralize a wide variety of other threats. On the other, the network architecture required to support Big Data analytics is itself vulnerable to attack.

Writing in CSO Magazine, John P. Mello, Jr. notes that Hadoop is frequently used to manage the computer clusters that are at the heart of Big Data deployments.  This, he says, can create problems for security people, especially if they are relying on traditional security tools.

He quotes a white paper from Zettaset, a Big Data security company, which asserts, “Incumbent data security vendors believe that Hadoop and distributed cluster security can be addressed with traditional perimeter security solutions such as firewalls and intrusion detection/prevention technologies. But no matter how advanced, traditional approaches that rely on perimeter security are unable to adequately secure Hadoop clusters and distributed file systems.”

Traditional security products are designed to protect a single database.  But when these products are called upon to protect a distributed cluster of computers that may number in the thousands, they fall short.

Mello interviewed Zettaset CTO Brian Christian.

“When you put them (traditional security products) on a large scale distributed computing environment, they become either a choke point or a single point of failure for the entire cluster,” Christian said. “They could potentially be extremely dangerous running them on a cluster, because if they do fail, there is the potential to deny everybody on the cluster access to petabytes of data or a corruption of data in some of the encryption security technologies.”

Other problems arise when security is “bolted on” to an existing Big Data infrastructure, a costly and often ineffective procedure.

And, the story notes, when it comes to business versus security, business requirements take precedence over implementing an ideal security solution.  Says Chris Petersen, CTO of LogRhythm, “While security catches up, there is going to be vulnerability. My guess is that there is a lot of vulnerability right now in organizations adopting Hadoop.”

Read the Full Story.


Video: Real-Time Big Data Analytics from Deployment to Production

 

In this video from the Strata 2013 Conference, David Smith from Revolution Analytics describes the five stages of real-time analytics deployment, and the technologies supporting each stage, including Hadoop, R, and database warehousing systems. He also shares some best practices for setting up the technology stack and processes for model deployment, based on some real-life case studies.


Here’s to Your Health with Big Data

 

Electronic health records (EHRs) were supposed to revolutionize healthcare, saving up to $81 billion a year through innovative new efficiencies and the collection of massive amounts of data that could be used to help prevent as well as cure diseases.

Well, it hasn’t happened.  Writing in Computerworld, Lucas Mearian reports that EHRs have become more of a hindrance than a help.

He quotes Dr. Robert Walker, director of health innovation for the U.S. Army Surgeon General, who said in an interview, “The electronic medical record has become an impediment versus something that was going to streamline your day. It took the focus away from the patient and put it all on the computer. People are clicking boxes and turning their backs to patients. It’s all about jamming data into this thing.”

But despite the shortcomings of EHRs, the fact is that the program is gathering great quantities of invaluable clinical data and storing it in data warehouses. Researchers can access and analyze this data using powerful Big Data engines like Hadoop.

“That’s the real renaissance that’s going to happen in health care,” Walker said. “With big data, what happens in a doctor’s office is going to be vastly different from what we see today. The top five or 10 things that people die from in America are life-style induced. That’s absurd. Maybe instead of vital signs, I’m just going to look at what you buy in a grocery store.”

Mearian cites several areas that are already reflecting the promise of improved health care with the help of Big Data analytics.  For example, advanced drug therapies are being developed through the study of genomics – a.k.a. personalized medicine.  Another example is the free open source software called i2b2 informatics, developed by Dr. Isaac S. Kohane, a professor of pediatrics and health sciences technology at Harvard Medical School & Children’s Hospital. The software is being used by more than 100 academic health centers around the world to identify genetic predictors for diseases and harmful drugs.

“Dr. Walker believes the real game changer in medicine will be an engaged patient, one who will enter his or her own data through the use of mobile devices,” Mearian reports. “And that data can include not just medical information, but also lifestyle updates involving diet and exercise. By having a full picture of a patient’s lifestyle, doctors are better equipped to help patients avoid the onset of chronic illnesses. Then, once the data is in an EHR, big data analytics engines could offer physicians information about patients who may need to adjust their caloric intake, level of activity or the amount of sleep they get.”

Walker comments: “The answer to the obesity problem is not the operating table, but the dinner table, and that’s where we need to get to. In this country, we’re putting billions of dollars into healthcare and our life expectancies are less than in countries that spend a fraction of what we do. We’re really doing disease care and not healthcare today.”

Read the Full Story.


Video: Lustre & ZFS go to Hollywood

 

In this video from the Lustre User Group 2013 conference, Josh Judd from Warp Mechanics presents: Lustre & ZFS go to Hollywood.

WARP Mechanics Ltd. is a leading provider of high performance computing (HPC) solutions. The company’s mission is to bring these supercomputing technologies into broader IT markets. Each WARP product is factory-optimized for vertical markets such as public-sector “Big Science”, commercial Bio/Life, Cloud, or Media/Entertainment, and can be rolled out in a turn-key fashion.

Download the Slides (PDF).


High Performance RDMA-based Design for Big Data and Web 2.0 memcached

 

In this video from the 2013 Open Fabrics Developer Workshop, D.K. Panda from Ohio State University presents: High Performance RDMA-based Design for Big Data and Web 2.0 memcached.

You can check out more OFA videos at our Open Fabrics Workshop Video Gallery.


Cutting Big Data Down to Size

 

It’s always a pleasure to run across a well thought-out contrarian point of view, and Dr. Rufus Pollock provides just that in a recent blog post entitled “Forget Big Data, Small Data is the Real Revolution.”

Pollock is the founder and co-director of the Open Knowledge Foundation, headquartered in Cambridge, England. He casts a cold eye on all the feverish activity promoting Big Data, including Big Data Week, which is currently underway (see the earlier Inside-Big Data story).

“But the discussions around big data miss a much bigger and more important picture: the real opportunity is not big data, but small data. Not centralized ‘big iron’, but decentralized data wrangling. Not ‘one ring to rule them all’ but ‘small pieces loosely joined.’”

He points out that the real revolution is the “mass democratization” of the means of accessing, storing and processing data.  This allows us to tap into a distributed ecosystem made up of small data.  Size is not what matters – the point is having the right data at hand that’s needed to deal with whatever issues we might be facing at the time.

“For many problems and questions, small data in itself is enough. The data on my household energy use, the times of local buses, government spending – these are all small data,” Pollock writes. “Everything processed in Excel is small data. When Hans Rosling shows us how to understand our world through population change or literacy he’s doing it with small data. And when we want to scale up the way to do that is through componentized small data: by creating and integrating small data ‘packages’ not building big data monoliths, by partitioning problems in a way that works across people and organizations, not through creating massive centralized silos. This next decade belongs to distributed models not centralized ones, to collaboration not control, and to small data not big data.”

Read the Full Story.


Sage Weil Presents: An Intro to Ceph for HPC

 

In this video from the Lustre User Group 2013 conference, Sage Weil from Inktank presents: An Intro to Ceph for HPC.

Ceph is a free software unified storage platform designed to present object, block, and file storage from a single distributed cluster. Ceph’s main goals are to be completely distributed without a single point of failure, scalable to the exabyte level, and freely available. The data is seamlessly replicated, making it fault tolerant. Ceph is a software-based solution and runs on commodity hardware. The system is designed to be both self-healing and self-managing and strives to reduce both administrator and budget overhead.

Check out more presentations at our LUG 2013 Video Gallery.


inside-bigdata.com is a production of insideHPC, LLC. © 2011-2013