The International Supercomputing Conference (ISC’13) is the largest and most significant conference and networking event in Europe for scientists, researchers and vendors within the HPC community. Visit www.isc13.org for details.
Over at SlashBI, Nick Kolakowski writes that the newly announced Splunk/Cloudera alliance hints at the new Big Data landscape.
Under the terms of the alliance, Splunk Hadoop Connect will link Splunk Enterprise to Cloudera Enterprise, Cloudera’s Hadoop distribution (and associated projects). Which is certainly good for Cloudera: any number of companies have released Hadoop distributions over the past couple months, crowding the marketplace, and making it all the more vital for individual firms to sign “alliances” and other contracts for their respective offerings. Such alliances could also prove vital for smaller firms seeking to hold the line, as it were, against IT giants such as IBM and SAP. The latter, of course, have untold millions of dollars and large amounts of other resources to deploy in the search for data-analytics customers; faced with that sort of competition, startups and midsize companies need to consider how partnerships can amplify the reach of their analytics products.
In this slidecast, Ken Claffey from Xyratex describes the company’s new ClusterStor 1500 storage system. Designed for scale-out HPC storage solutions, the ClusterStor 1500 delivers HPC performance and efficiency with help from the Lustre file system.
“Departments within larger organizations or medium-sized enterprises today, especially in the commercial, academic and government sectors, represent an underserved market. They need high-performance and scalable storage solutions that are cost-efficient, easy to deploy and manage and reliable even under heavy workloads,” said Ken Claffey, senior vice president of the ClusterStor business at Xyratex. “Growth in this market segment is being driven by the increasing adoption of simulation applications in a wide range of industries from car and aircraft design to chemical interactions and financial modeling. Traditional enterprise storage systems are simply not designed to meet the performance needs of these applications, so we engineered and built the affordable and modular ClusterStor 1500 to bring the performance power of Lustre to this underserved and growing market in the way that only ClusterStor can.”
With the ability to scale performance from 1.25GB/s to 110GB/s and raw capacity from 42TB to 7.3PB, ClusterStor 1500 is purpose-built to satisfy data-intensive, department-level compute cluster needs and is designed to provide best-in-class scale-out storage for middle-tier high performance computing environments. The ClusterStor 1500 solution features scale-out storage building blocks, the Lustre parallel filesystem and a comprehensive management platform that eliminates the guesswork usually associated with building and optimizing your own HPC storage solution.
In this slidecast, Justin Erickson from Cloudera presents a technical overview of Cloudera Impala, an SQL-on-Hadoop solution that enables users to do real-time queries of data stored in Hadoop clusters.
To avoid latency, Impala circumvents MapReduce to directly access the data through a specialized distributed query engine that is very similar to those found in commercial parallel RDBMSs. The result is order-of-magnitude faster performance than Hive, depending on the type of query and configuration.
“The Hydra60 is a combination Lustre OSS (object storage server) and OST (object storage target) with two active/active failover nodes and shared storage in a single system chassis with an ultra-dense 60-drive 6Gb SAS storage infrastructure. With a unified and zonable 6Gb SAS dual-ported backplane and drives, the Hydra60 can sustain remarkable performance while providing high availability to volumes or object storage. With external interface options including FDR InfiniBand, 40/10GbE and 1Gb Ethernet, and supporting Linux and Lustre releases 2.x, the Hydra60 makes an excellent storage platform for Lustre performance with HA operation. The design of Hydra60 provides an affordable, redundant and resilient storage platform by leveraging RAIDZ, thereby eliminating the cost of hardware RAID controller technology.”
Over at HPC Admin, Dell’s Jeff Layton writes that with today’s explosive data growth, at some point you will have to migrate data from one set of storage devices to another. To help move things along, he provides an overview of data migration tools.
At some point during this growth spurt, you will have to think about migrating your data from an old storage solution to a new one, but copying the data over isn’t as easy as it sounds. You would like to preserve the attributes of the data during the migration, including xattrs (extended attributes), and losing information such as file ownership or timestamps can cause havoc with projects. Plus, you have to pay attention to the same things for directories; they are just as important as the files themselves (remember that everything is a file in Linux). In this article, I wanted to present some possible tools for helping with data migration, and I covered just a few of them. However, I also wanted to take a few paragraphs to emphasize that you need to plan your data migration if you want to succeed.
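To make the point about attributes concrete, here is a minimal Python sketch of a post-copy check. It is not taken from Layton's article: the helper names (user_xattrs, metadata_diff, demo) and the file names are hypothetical, and limiting the xattr comparison to the user.* namespace is an assumption made to keep the example portable. It contrasts a contents-only copy (shutil.copyfile) with a metadata-preserving one (shutil.copy2) and reports which fields differ.

```python
import os
import shutil
import tempfile


def user_xattrs(path):
    """Return {name: value} for user.* extended attributes, or {} if unsupported."""
    if not hasattr(os, "listxattr"):  # the os xattr calls exist only on Linux
        return {}
    try:
        names = [n for n in os.listxattr(path) if n.startswith("user.")]
        return {n: os.getxattr(path, n) for n in names}
    except OSError:  # e.g. a filesystem without xattr support
        return {}


def metadata_diff(src, dst):
    """List the metadata fields that differ between src and dst."""
    a, b = os.stat(src), os.stat(dst)
    checks = {
        "mode": (a.st_mode, b.st_mode),
        "uid": (a.st_uid, b.st_uid),
        "gid": (a.st_gid, b.st_gid),
        "size": (a.st_size, b.st_size),
        # Compared at one-second granularity to tolerate coarse filesystems.
        "mtime": (int(a.st_mtime), int(b.st_mtime)),
        "xattrs": (user_xattrs(src), user_xattrs(dst)),
    }
    return [name for name, (x, y) in checks.items() if x != y]


def demo():
    """Copy one file two ways and return (naive_diff, preserved_diff)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "old.dat")      # hypothetical file names
        naive = os.path.join(tmp, "naive.dat")
        kept = os.path.join(tmp, "kept.dat")
        with open(src, "w") as f:
            f.write("payload")
        # Give the source an old timestamp so a lost mtime is visible.
        os.utime(src, (1_000_000_000, 1_000_000_000))
        shutil.copyfile(src, naive)  # copies contents only
        shutil.copy2(src, kept)      # also copies mode, timestamps, xattrs where possible
        return metadata_diff(src, naive), metadata_diff(src, kept)


if __name__ == "__main__":
    naive_diff, preserved_diff = demo()
    print("copyfile lost:", naive_diff)
    print("copy2 lost:", preserved_diff)
```

Note that shutil.copy2 does not carry ownership across users; in this sketch the uid and gid match only because the same user creates both files. A real migration run as root would need an explicit ownership step or a tool that handles it, and the same check would have to be applied to directories as well as files.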
In this slidecast, Chris Matty from Versium describes the company’s Real-Life Data Intelligence Platform.
“In consumer marketing, LifeData allows ever-narrower segmentation of customers and therefore much more precisely tailored products or services. Greater real life insights about people and businesses enable optimized communication strategies. Sophisticated analytics can predict behavior and greatly improve decision-making. LifeData can make existing applications more intelligent and enable entirely new ones. Versium’s powerful data intelligence platform enables all of this and more.”