This week at ISC’13, we learned about the new Cray Cluster Connect. Described as a complete Lustre storage solution for x86 Linux clusters, Cray Cluster Connect is a “hardware agnostic” solution for customers who need flexibility in their storage configurations. In other words, you no longer need to buy a Cray supercomputer to get access to Cray’s Big Data and fast I/O expertise.
“Cray has a long, rich history in the HPC storage space, and we have built some of the largest and fastest Lustre file systems in the world,” said Barry Bolding, Cray’s vice president of storage and data management. “With Cray Cluster Connect, we are applying our Lustre expertise and innovation, and taking all that we have learned, developed and invested in parallel storage solutions to an expanded customer base. We can now deliver end-to-end, Lustre storage solutions for customers’ existing x86 Linux environments. With the launch of Cray Cluster Connect, our storage and data management solutions are no longer limited to Cray supercomputer customers.”
Available now, Cray Cluster Connect offers a wide range of storage options including block storage components from Cray, DataDirect Networks, or NetApp plus a full set of management and storage connectivity tools for data movement, archiving and management. Read the Full Story.
“More and more organizations are expanding their usage of Hadoop software beyond just basic storage and reporting. But while they’re developing increasingly complex algorithms and becoming more dependent on getting value out of Hadoop systems, they are also pushing the limits of their architectures,” said Bill Blake, senior vice president and CTO of Cray. “We are combining the supercomputing technologies of the Cray CS300 series with the performance and security of the Intel Distribution to provide customers with a turnkey, reliable Hadoop solution that is purpose-built for high-value Hadoop environments. Organizations can now focus on scaling their use of platform-independent Hadoop software, while gaining the benefits of important underlying architectural advantages from Cray and Intel.”
As you may recall, Cray acquired the CS300 cluster technology from Appro last year. This gives the company a more affordable cluster offering for markets that don’t require Cray’s low latency interconnect technology. Read the Full Story.
In this podcast, Jeff Denworth from DataDirect Networks introduces the company’s new President, Joe Cowan. As a former member of the DDN board, Cowan brings a wealth of experience in the enterprise, a space where DataDirect Networks hopes to grow with its Big Data solutions.
Blue Waters' 380 petabyte High Performance Storage System (HPSS)
Today NCSA announced that its 380 Petabyte High Performance Storage System is now in full service production as part of the Blue Waters project. Described as the world’s largest automated near-line data repository for open science, the HPSS environment comprises multiple automated tape libraries, dozens of high-performance data movers, a large 40 Gigabit Ethernet network, hundreds of high-performance tape drives, and about 100,000 tape cartridges.
This “big data” capacity is available to scientists and engineers using the sustained petascale Blue Waters supercomputer. The storage system can be easily expanded and extended to accommodate the extreme data needs of other science, engineering, or industry projects.
“With the world’s largest HPSS now in production, Blue Waters truly is the most data-focused, data-intensive system available to the U.S. science and engineering community,” said Blue Waters deputy project director Bill Kramer.
The HPSS hierarchical file system software is designed to efficiently manage the access and storage of hundreds of petabytes of data at high data rates. HPSS manages the life cycle of data by moving inactive data to tape and retrieving it the next time it is referenced. The highly scalable HPSS is the result of two decades of collaboration among five Department of Energy laboratories and IBM, with significant contributions by universities and other laboratories worldwide.
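The tiering policy described above can be illustrated with a minimal sketch. This is not HPSS code; it is a hypothetical Python illustration of the same idea: files that have not been accessed within some idle window are moved from a fast tier to an archive tier. The function name `migrate_inactive` and both directory names are invented for the example.

```python
import os
import shutil
import time

def migrate_inactive(cache_dir, archive_dir, max_idle_seconds):
    """Move files not accessed within max_idle_seconds from the
    fast tier (cache_dir) to the archive tier (archive_dir).
    Returns the names of the files that were migrated."""
    os.makedirs(archive_dir, exist_ok=True)
    now = time.time()
    moved = []
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        if not os.path.isfile(path):
            continue
        # st_atime records the last access; idle files go to archive.
        if now - os.stat(path).st_atime > max_idle_seconds:
            shutil.move(path, os.path.join(archive_dir, name))
            moved.append(name)
    return moved
```

A real HSM like HPSS also leaves a stub behind and transparently recalls the data on the next reference; this sketch only shows the demotion half of that life cycle.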
NCSA joined forces with the HPSS Collaboration’s Department of Energy labs and IBM to develop an HPSS capability for Redundant Arrays of Independent Tapes (RAIT)—tape technology similar to RAID for disk. By generating parity blocks, RAIT dramatically reduces the total cost of ownership and the energy used to store data, without exposure to single or dual points of failure. It also enhances the performance of data storage and retrieval, since data is read and written across tapes in parallel.
In this slidecast, Marty Czekalski from the SCSI Trade Association presents: Extending the SCSI Platform of Innovation.
SCSI Express combines the robust, proven SCSI protocol with PCIe, creating an industry-standard path to PCIe-based storage. Paired with SAS-based solutions, SCSI Express delivers the unprecedented performance and low latency that enterprises demand.
Henry Newman’s head is still spinning from the technology previews he saw at the recent IEEE Mass Data Storage Conference. Could this be a game-changer for the tape industry and archiving data? What are the specs, limitations, and most importantly, are these the droids we’ve been looking for?
There is fierce competition in the storage market to offer the best-performing devices with great management at a low price. The EIOW group, from the outset, decided that it would not attempt to offer an end-to-end solution, which would necessarily mean competing with storage providers instead of working with them. The focus of EIOW is on middleware that provides, for example, schemas describing data structure and layout, novel access methods to data for applications, a uniform data management infrastructure, and a framework for the implementation of layered I/O software, similar in spirit to HDF5 as a specialized use of a parallel file system. We decided EIOW should be open, with interfaces that layer on lower-level storage infrastructure such as object stores, databases, and file systems provided by storage vendors, allowing their expertise and leadership in this area to continue to benefit the HPC community.
Over at HPC Admin, Dell’s Jeff Layton writes that with today’s explosive data growth, at some point you will have to migrate data from one set of storage devices to another. To help move things along, he provides an overview of data migration tools.
At some point during this growth spurt, you will have to think about migrating your data from an old storage solution to a new one, but copying the data over isn’t as easy as it sounds. You would like to preserve the attributes of the data during the migration, including xattrs (extended attributes), and losing information such as file ownership or timestamps can cause havoc with projects. Plus, you have to pay attention to the same things for directories; they are just as important as the files themselves (remember that everything is a file in Linux). In this article, I wanted to present some possible tools for helping with data migration, and I covered just a few of them. However, I also wanted to take a few paragraphs to emphasize that you need to plan your data migration if you want to succeed.
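The pitfall Layton describes, silently losing timestamps or permission bits during a copy, is easy to demonstrate and guard against. The following is a hedged Python sketch (not one of the tools from his article, which typically use utilities like rsync): `shutil.copy2` preserves timestamps, permission bits and, on platforms that support it, extended attributes, and the function then verifies that the metadata actually survived. The name `migrate_file` is invented for the example.

```python
import os
import shutil

def migrate_file(src, dst):
    """Copy a file while preserving metadata, then verify it.
    shutil.copy2 carries over timestamps, permission bits and
    (where the platform supports it) xattrs; preserving *ownership*
    additionally requires os.chown, which needs root privileges."""
    shutil.copy2(src, dst)
    s, d = os.stat(src), os.stat(dst)
    # Compare at whole-second resolution to sidestep filesystems
    # with coarser timestamp granularity than the source.
    assert int(s.st_mtime) == int(d.st_mtime), "timestamp lost"
    assert s.st_mode == d.st_mode, "permission bits lost"
    return dst
```

Running a verification pass like this over a sample of migrated files is a cheap sanity check before decommissioning the old storage.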