In this podcast, the Radio Free HPC team discusses Lustre and LUG 2013 with Brent Gorda. Gorda, now with Intel’s High Performance Data Division, was CEO of Whamcloud when the company was acquired last summer.
The International Supercomputing Conference (ISC’13) is the largest and most significant conference and networking event in Europe for scientists, researchers and vendors within the HPC community. Visit www.isc13.org for details.
“DDN has developed a Hadoop solution that is all about time to value: It simplifies rollout so that enterprises can get up and running more quickly, provides typical DDN performance to accelerate data processing, and reduces the amount of time needed to maintain a Hadoop solution,” said Dave Vellante, Chief Research Officer, Wikibon.org. “For enterprises with a deluge of data but a limited IT budget, the DDN hScaler appliance should be on the short list of potential solutions.”
In this video from LUG 2013, Jeff Johnson from Aeon Computing presents an overview of the company’s innovative Lustre storage solutions.
“Our focus is on the design and deployment of highly-integrated HPC, clustered computing, and storage solutions for all areas of computer-aided research and production. With over 55 years of staff experience in high-performance computing, enterprise computing architectures, and data storage, our focus is on architecting a perfectly-suited solution for your needs. We do not adhere to the large manufacturer approach of “one size fits most”. Every application or research methodology is different. We prefer to learn about our customer’s research, their needs and challenges. Our strength comes from being able to draw from many different types of technologies and configuration approaches. Our goal is to design and deploy a cluster or system for our customers that avoids needless bottlenecks, limitations or design and configuration flaws that are commonly found in solutions designed by companies that are more focused on profits or sales incentives from manufacturers.”
Indiana University has contributed Big Data expertise and infrastructure to NASA’s Operation IceBridge, a decade-long polar ice monitoring project.
For the past four years, IU Research Technologies, a cyberinfrastructure and service center affiliated with the Pervasive Technology Institute (PTI), has provided IT support for the Center for Remote Sensing of Ice Sheets (CReSIS), a National Science Foundation Science and Technology Center led by the University of Kansas. Kansas scientists provide NASA with the radar technology that measures the physical interactions of polar ice sheets in Greenland, Chile and Antarctica. IU experts bring innovative data management and storage solutions to the missions.
“Essentially, IU has built a supercomputer that can fly,” said Rich Knepper, manager of IU’s campus bridging and research infrastructure team within Research Technologies. “During this current mission, our system provided analysis of radar data as the data was collected – in real time – allowing mission scientists to see the ice bed information as the plane flies over the Arctic.”
SAS this week announced six new high-performance analytics products to further strengthen the company’s leadership role as a provider of advanced analytics for Big Data.
According to a recent IDC report, SAS analytics account for a 35.2 percent worldwide market share, exhibiting steady growth over the past three years. The company’s next four closest competitors combined hold only 21 percent of the market.
The new SAS products, available in June, are focused on a variety of analytic techniques, including data mining, text mining, optimization, forecasting, statistics and econometrics.
According to the press release, users can choose the relevant SAS offerings that will allow them to perform analytical tasks in-memory across Big Data to meet specific business challenges.
“The new SAS High-Performance Analytics products help organizations overcome bottlenecks to answer the really tough questions, those problems that often involve large amounts of data,” said Henry Morris, IDC Senior Vice President of Worldwide Software and Services. “These new functionally specific analytic packages differ significantly from other in-memory products because they are broad and useful across any industry. Instead of being limited to a particular problem, or lacking real predictive capabilities, the new SAS analytics are useful for virtually any analytic purpose.”
The six new products include:
SAS High-Performance Statistics
SAS High-Performance Data Mining
SAS High-Performance Text Mining
SAS High-Performance Optimization
SAS High-Performance Econometrics
SAS High-Performance Forecasting.
The new software packages operate in massively parallel processing (MPP) environments, distributing complex analytic tasks across numerous server blades to perform computations in parallel. Because each blade has its own memory, execution is rapid; jobs that once took hours now take only minutes or even seconds.
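The general pattern is easy to illustrate. The sketch below is not SAS code; it is a minimal Python illustration of the same divide-and-combine idea, in which each worker processes its own data partition in its own memory and a coordinator merges the partial results (the partitions and statistics here are hypothetical, purely for illustration).

```python
# Minimal sketch of the MPP-style "each worker owns its partition" pattern.
# Not SAS code; hypothetical data and statistics for illustration only.
from multiprocessing import Pool
import random

def partial_stats(partition):
    # Each worker computes local sums over its own partition, in its own memory.
    n = len(partition)
    s = sum(partition)
    ss = sum(x * x for x in partition)
    return n, s, ss

if __name__ == "__main__":
    # Simulate data already split across "blades" as independent partitions.
    partitions = [[random.gauss(0, 1) for _ in range(100_000)] for _ in range(8)]

    with Pool(processes=8) as pool:
        results = pool.map(partial_stats, partitions)  # partitions processed in parallel

    # The coordinator combines the partial results into global statistics.
    n = sum(r[0] for r in results)
    s = sum(r[1] for r in results)
    ss = sum(r[2] for r in results)
    mean = s / n
    variance = ss / n - mean ** 2
    print(f"n={n}, mean={mean:.4f}, variance={variance:.4f}")
```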
The new offerings extend configuration options for Teradata, Greenplum (now part of Pivotal) and Hadoop environments, and add support for Oracle.
On Monday, HP announced a new organization, HP Converged Systems, cobbled together from various dedicated resources “…to deliver purpose-built technology for social, cloud, mobile and big data solutions.”
The business unit is charged with helping accelerate the next generation of the “Converged Infrastructure” concept. Announced last February at its yearly Global Partner Conference, HP’s Converged Infrastructure portfolio includes the latest in its blade systems, as well as converged storage, a mixed bag of networking solutions, and HP services.
According to the press release, the HP Converged Systems business unit will extend the portfolio of converged application appliances that fuse infrastructure and applications into a single system. Included are appliance systems for Hadoop, HP Vertica, SAP HANA, and HP CloudSystem.
“HP continues to be at the forefront of the data center evolution, accelerating the pace of innovation for our customers,” said Dave Donatelli, executive vice president and general manager, HP Enterprise Group. “HP was the first to announce Converged Infrastructure, which each major technology company has since followed. Today’s organizational updates are the next logical step as we accelerate the delivery of game-changing converged systems technology.”
Although HP is quick to put its stamp on the Converged Infrastructure concept, the idea has been around for a while. Wikipedia offers a succinct definition: “A converged infrastructure addresses the problem of siloed architectures and IT sprawl by pooling and sharing IT resources. Rather than dedicating a set of resources to a particular computing technology, application or line of business, converged infrastructures create a pool of virtualized server, storage and networking capacity that is shared by multiple applications and lines of business.”
The Wikipedia entry goes on to say, “In April 2012, the open source analyst firm Wikibon released the first market forecast for Converged Infrastructure, with a projected $402B total available market (TAM) by 2017 of which, nearly 2/3rds of the infrastructure that supports enterprise applications will be packaged in some type of converged solution by 2017.”
Not the most elegant sentence in the world, but one that indicates that this is a big and growing market.
Wikibon also notes in another posting, “The total Big Data market reached $11.4 billion in 2012, ahead of Wikibon’s 2011 forecast. The Big Data market is projected to reach $18.1 billion in 2013, an annual growth of 61%. This puts it on pace to exceed $47 billion by 2017. That translates to a 31% compound annual growth rate over the five year period 2012-2017.”
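As a back-of-the-envelope check of that arithmetic, the growth rates implied by the quoted endpoint figures can be computed directly; assuming the 2012 figure of $11.4 billion and the 2017 figure of roughly $47 billion, the implied compound annual growth rate lands in the low thirties, broadly consistent with the quoted 31 percent.

```python
# Quick check of the growth figures, using the endpoint values from the Wikibon quote.
start_2012, end_2017, years = 11.4, 47.0, 5   # market size in $B

cagr = (end_2017 / start_2012) ** (1 / years) - 1
print(f"Implied CAGR 2012-2017: {cagr:.1%}")        # ~32.8%, close to the quoted 31%

yoy_2013 = 18.1 / 11.4 - 1
print(f"Implied 2012-2013 growth: {yoy_2013:.1%}")  # ~58.8%, close to the quoted 61%
```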
It’s no wonder HP is setting its sights on the Converged Infrastructure and Big Data marketplaces – the combination is irresistible.
Over at Oregon Business, Linda Baker writes that Thomas Thurston from Growth Science has created a model that accurately predicts whether a startup will succeed.
How does Thurston’s model work? It’s rooted in the mountains of data he has collected on market and corporate dynamics, including the anticipation of future changes in the marketplace. Patterns of success or failure then emerge depending on these different market and business behavior factors. “The key is identifying variables that are predictive of success and failure,” says Thurston, who is very hush-hush about revealing those variables. It’s a process that involves “lots of hard, hard work,” he says. “You go through a whole haystack to find one needle.”
When it comes to security, Big Data is a two-edged sword.
On one hand it can be used to analyze mountains of data in order to foil intruders, head off attacks and neutralize a wide variety of other threats. But the network architecture required to support Big Data analytics is itself vulnerable to attack.
Writing in CSO Magazine, John P. Mello, Jr. notes that Hadoop is frequently used to manage the computer clusters that are at the heart of Big Data deployments. This, he says, can create problems for security people, especially if they are relying on traditional security tools.
He quotes a white paper from Zettaset, a Big Data security company, which asserts, “Incumbent data security vendors believe that Hadoop and distributed cluster security can be addressed with traditional perimeter security solutions such as firewalls and intrusion detection/prevention technologies. But no matter how advanced, traditional approaches that rely on perimeter security are unable to adequately secure Hadoop clusters and distributed file systems.”
Traditional security products are designed to protect a single database. But when these products are called upon to protect a distributed cluster of computers that may number in the thousands, they fall short.
Mello interviewed Zettaset CTO Brian Christian.
“When you put them (traditional security products) on a large scale distributed computing environment, they become either a choke point or a single point of failure for the entire cluster,” Christian said. “They could potentially be extremely dangerous running them on a cluster, because if they do fail, there is the potential to deny everybody on the cluster access to petabytes of data or a corruption of data in some of the encryption security technologies.”
Other problems arise when security is “bolted on” to an existing Big Data infrastructure, a costly and often ineffective procedure.
And, the story notes, when it comes to business versus security, business requirements take precedence over implementing an ideal security solution. Says Chris Petersen, CTO of LogRhythm, “While security catches up, there is going to be vulnerability. My guess is that there is a lot of vulnerability right now in organizations adopting Hadoop.”
In this video from the Strata 2013 Conference, David Smith from Revolution Analytics describes the five stages of real-time analytics deployment, and the technologies supporting each stage, including Hadoop, R, and database warehousing systems. He also shares some best practices for setting up the technology stack and processes for model deployment, based on some real-life case studies.