There is fierce competition in the storage market to offer the best-performing devices, with strong management features, at a low price. The EIOW group decided from the outset that it would not attempt to offer an end-to-end solution, which would mean competing with storage providers rather than working with them. Instead, the focus of EIOW is on middleware: schemas describing data structure and layout, novel data access methods for applications, a uniform data management infrastructure, and a framework for implementing layered I/O software, similar in spirit to HDF5 as a specialized use of a parallel file system. We decided EIOW should be open, with interfaces that layer on lower-level storage infrastructure such as object stores, databases, and file systems provided by storage vendors, allowing their expertise and leadership in this area to continue to benefit the HPC community.
When we set out to build Hadoop 2.0, we wanted to fundamentally re-architect Hadoop to run multiple applications against relevant data sets, and to do so in a way that lets multiple types of applications operate efficiently and predictably within the same cluster. This is really the reason behind Apache YARN, which is foundational to Hadoop 2.0. By managing resource requests across a cluster, YARN turns Hadoop from a single-application system into a multi-application operating system.
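The core idea of "managing resource requests across a cluster" can be illustrated with a toy scheduler. This is a hedged sketch of the concept only, not Hadoop's actual API; the class and application names are invented for the example.

```python
# Illustrative sketch (not Hadoop's real API): a toy resource manager that
# grants container requests from multiple applications against one shared
# pool -- the core idea behind YARN's multi-application scheduling.

class ResourceManager:
    def __init__(self, total_memory_mb):
        self.free_mb = total_memory_mb
        self.allocations = {}  # app_id -> total MB granted

    def request(self, app_id, memory_mb):
        """Grant a container if capacity remains, else reject."""
        if memory_mb <= self.free_mb:
            self.free_mb -= memory_mb
            self.allocations[app_id] = self.allocations.get(app_id, 0) + memory_mb
            return True
        return False

    def release(self, app_id, memory_mb):
        """Return a container's resources to the shared pool."""
        self.allocations[app_id] -= memory_mb
        self.free_mb += memory_mb

rm = ResourceManager(total_memory_mb=8192)
assert rm.request("mapreduce-job", 4096)      # a batch job gets half the pool
assert rm.request("interactive-sql", 2048)    # a second framework shares the rest
assert not rm.request("mapreduce-job", 8192)  # over-asking is rejected
```

The point of the sketch is the shared pool: because one arbiter sees every request, batch and interactive frameworks can coexist on the same cluster instead of each needing its own.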
Read the Full Story.
Under the terms of the alliance, Splunk Hadoop Connect will link Splunk Enterprise to Cloudera Enterprise, Cloudera’s Hadoop distribution (and associated projects). That is certainly good for Cloudera: any number of companies have released Hadoop distributions over the past couple of months, crowding the marketplace and making it all the more vital for individual firms to sign “alliances” and other contracts for their respective offerings. Such alliances could also prove vital for smaller firms seeking to hold the line, as it were, against IT giants such as IBM and SAP. The latter, of course, have untold millions of dollars and large amounts of other resources to deploy in the search for data-analytics customers; faced with that sort of competition, startups and midsize companies need to consider how partnerships can amplify the reach of their analytics products.
Read the Full Story.
In this slidecast, Ken Claffey from Xyratex describes the company’s new ClusterStor 1500 storage system. Designed for scale-out HPC storage solutions, the ClusterStor 1500 delivers HPC performance and efficiency with help from the Lustre file system.
“Departments within larger organizations or medium-sized enterprises today, especially in the commercial, academic and government sectors, represent an underserved market. They need high-performance and scalable storage solutions that are cost-efficient, easy to deploy and manage and reliable even under heavy workloads,” said Ken Claffey, senior vice president of the ClusterStor business at Xyratex. “Growth in this market segment is being driven by the increasing adoption of simulation applications in a wide range of industries from car and aircraft design to chemical interactions and financial modeling. Traditional enterprise storage systems are simply not designed to meet the performance needs of these applications, so we engineered and built the affordable and modular ClusterStor 1500 to bring the performance power of Lustre to this underserved and growing market in the way that only ClusterStor can.”
With the ability to scale performance from 1.25GB/s to 110GB/s and raw capacity from 42TB to 7.3PB, the ClusterStor 1500 is purpose-built to satisfy the needs of data-intensive, department-level compute clusters and to provide best-in-class scale-out storage for mid-tier high performance computing environments. The ClusterStor 1500 solution features scale-out storage building blocks, the Lustre parallel file system, and a comprehensive management platform that eliminates the guesswork usually associated with building and optimizing your own HPC storage solution.
In this slidecast, Justin Erickson from Cloudera presents a technical overview of Cloudera Impala, an SQL-on-Hadoop solution that enables users to do real-time queries of data stored in Hadoop clusters.
To avoid latency, Impala circumvents MapReduce to directly access the data through a specialized distributed query engine that is very similar to those found in commercial parallel RDBMSs. The result is order-of-magnitude faster performance than Hive, depending on the type of query and configuration.
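The distributed query engine described above follows a scatter/gather pattern: query fragments run against each node's data partition in parallel, and a coordinator merges the partial results. The sketch below illustrates that pattern only; the data, names, and structure are invented for the example and are not Impala's implementation.

```python
# Illustrative scatter/gather sketch of an MPP-style query engine: partial
# aggregation runs per partition in parallel, then a coordinator merges the
# partials -- roughly SELECT channel, SUM(clicks) GROUP BY channel.
from concurrent.futures import ThreadPoolExecutor

partitions = [  # rows co-located with three hypothetical data nodes
    [("web", 120), ("mobile", 80)],
    [("web", 60), ("mobile", 40)],
    [("web", 20)],
]

def fragment(rows):
    """Partial aggregation executed locally against one node's data."""
    out = {}
    for channel, clicks in rows:
        out[channel] = out.get(channel, 0) + clicks
    return out

# "Scatter": run a fragment per partition concurrently.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(fragment, partitions))

# "Gather": the coordinator merges partial aggregates into the final result.
totals = {}
for partial in partials:
    for channel, clicks in partial.items():
        totals[channel] = totals.get(channel, 0) + clicks

print(totals)  # {'web': 200, 'mobile': 120}
```

Because each fragment touches only local data and no intermediate results are staged to disk between phases, this shape avoids the per-job startup and materialization latency of routing the same query through MapReduce.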
“The Hydra60 is a combination Lustre OSS (object storage server) and OST (object storage target) with two active/active failover nodes and shared storage in a single system chassis with an ultra-dense 60-drive 6Gb SAS storage infrastructure. With a unified and zonable dual-ported 6Gb SAS backplane and drives, the Hydra60 can sustain remarkable performance while providing high availability to volumes or object storage. With external interface options including FDR InfiniBand, 40/10GbE, and 1Gb Ethernet, and support for Linux and Lustre 2.x releases, the Hydra60 makes an excellent storage platform for Lustre performance with HA operation. The design of the Hydra60 provides an affordable, redundant, and resilient storage platform by leveraging RAID-Z, thereby eliminating the cost of hardware RAID controller technology.”
For more on Lustre, check out our LUG 2013 Video Gallery.
Over at HPC Admin, Dell’s Jeff Layton writes that with today’s explosive data growth, at some point you will have to migrate data from one set of storage devices to another. To help move things along, he provides an overview of data migration tools.
At some point during this growth spurt, you will have to think about migrating your data from an old storage solution to a new one, but copying the data over isn’t as easy as it sounds. You would like to preserve the attributes of the data during the migration, including xattrs (extended attributes), and losing information such as file ownership or timestamps can cause havoc with projects. Plus, you have to pay attention to the same things for directories; they are just as important as the files themselves (remember that everything is a file in Linux). In this article, I wanted to present some possible tools for helping with data migration, and I covered just a few of them. However, I also wanted to take a few paragraphs to emphasize that you need to plan your data migration if you want to succeed.
Read the Full Story.
In this slidecast, Chris Matty from Versium describes the company’s Real-Life Data Intelligence Platform.
“In consumer marketing, LifeData allows ever-narrower segmentation of customers and therefore much more precisely tailored products or services. Greater real life insights about people and businesses enable optimized communication strategies. Sophisticated analytics can predict behavior and greatly improve decision-making. LifeData can make existing applications more intelligent and enable entirely new ones. Versium’s powerful data intelligence platform enables all of this and more.”
In this slidecast, Scott Gnau from Teradata Labs presents: Teradata Intelligent Memory.
“The introduction of Teradata Intelligent Memory allows our customers to exploit the performance of memory within Teradata Platforms, which extends our leadership position as the best performing data warehouse technology at the most competitive price,” said Scott Gnau, president, Teradata Labs. “Teradata Intelligent Memory technology is built into the data warehouse and customers don’t have to buy a separate appliance. Additionally, Teradata enables its customers to buy and configure the exact amount of in-memory capability needed for critical workloads. It is unnecessary and impractical to keep all data in memory, because not all data has the same value to justify being placed in expensive memory.”
How does Intelligent Memory work? This animation video does a good job of making this advanced technology look simple.
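The underlying idea, promoting only the hottest data into a small, expensive in-memory tier while colder data stays on cheaper storage, can be sketched in a few lines. This is an illustrative toy, not Teradata's implementation; the class, table names, and promotion rule are invented for the example.

```python
# Illustrative sketch of "multi-temperature" data placement: access
# frequency ("heat") decides which items earn a slot in the small fast
# tier; everything else stays on the cheap cold tier.
from collections import Counter

class TieredStore:
    def __init__(self, memory_slots):
        self.memory_slots = memory_slots
        self.memory = {}          # hot tier: key -> value
        self.disk = {}            # cold tier (stands in for disk storage)
        self.heat = Counter()     # access frequency per key

    def put(self, key, value):
        self.disk[key] = value    # new data lands on the cold tier first

    def get(self, key):
        self.heat[key] += 1
        if key in self.memory:
            return self.memory[key]
        value = self.disk[key]
        self._maybe_promote(key, value)
        return value

    def _maybe_promote(self, key, value):
        """Promote this key if it is among the hottest; demote cooled keys."""
        hottest = [k for k, _ in self.heat.most_common(self.memory_slots)]
        if key in hottest:
            self.memory[key] = value
            for k in list(self.memory):
                if k not in hottest:
                    del self.memory[k]

store = TieredStore(memory_slots=1)
store.put("sales_2013", "hot rows")
store.put("sales_2003", "cold rows")
for _ in range(5):
    store.get("sales_2013")       # frequent access heats this table up
store.get("sales_2003")           # a single scan stays on the cold tier
print("sales_2013" in store.memory)  # True
```

This is the sense in which keeping all data in memory is "unnecessary and impractical": a small hot tier sized to the working set captures most accesses, so the rarely scanned bulk of the warehouse need not justify memory prices.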
In this podcast, the Radio Free HPC team discusses Lustre and LUG 2013 with Brent Gorda. Now part of Intel’s High Performance Data Division, Gorda was CEO of Whamcloud when the company was acquired last summer.
Gorda recently wrote a post about the rapid growth of the Lustre community, so we started our discussion there and learned a good deal more about the popular file system.