In this special guest feature, Kevin Dudak from Spectra Logic looks at how unexpected volumes of data can quickly grow to pose all kinds of new challenges for the enterprise.
Lots of people and organizations are talking about big data, and the analytics side is getting far more attention than the storage side of the conversation. The potential of analytics is fascinating, and I think we are still in the early days of realizing it. However, as the name implies, Big Data is big, and it needs to be stored.
I got to thinking about this after an hour-long call with a customer that I expected to take only five minutes. His company is quickly approaching 100 PB of data and does not have a plan for it. It has multiple disk and tape storage systems, with four different software solutions each managing a portion of the data. I don’t think the company ever expected to grow to this size when it made its software and hardware decisions over the last 10 years. It now faces several major challenges:
They have too many types of hardware and software for their staff to remain competent with all of them.
With so many different systems, the number of support contracts alone is difficult to manage, let alone the complexity of keeping everything running.
Power and cooling are crushing them. The monthly bill is affecting the finances of the company, and they are struggling to obtain more power to grow.
In the end, this should all be about the data, but with the data spread across so many systems and technologies, it is difficult to access and use, at best.
This company is not alone in its challenges; the same thing has happened to far too many organizations. Data islands and disparate storage systems all made sense when they were deployed, and IT looked at each as a single, standalone solution. Several years and a lot of growth later, many companies that never considered themselves a ‘data company’ now find themselves with Big Data. The challenge now is to figure out how to get out of the unplanned mess and get things straightened out.
This is a challenge that users, integrators and manufacturers should be working on for the next few years. As we talk about how to solve these problems, I think the first step is to focus on the data; the data is the reason we have all of these storage and computing resources. A number of things are being done to address these challenges, and I’ll be sharing more about them in future posts.
Over at HPC Admin, Dell’s Jeff Layton writes that with today’s explosive data growth, at some point you will have to migrate data from one set of storage devices to another. To help move things along, he provides an overview of data migration tools.
At some point during this growth spurt, you will have to think about migrating your data from an old storage solution to a new one, but copying the data over isn’t as easy as it sounds. You would like to preserve the attributes of the data during the migration, including xattrs (extended attributes), since losing information such as file ownership or timestamps can cause havoc with projects. Plus, you have to pay attention to the same things for directories; they are just as important as the files themselves (remember that everything is a file in Linux). In this article, I wanted to present some possible tools for helping with data migration, and I covered just a few of them. However, I also wanted to take a few paragraphs to emphasize that you need to plan your data migration if you want to succeed.
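As a minimal sketch of why attribute preservation matters during a copy, the following Python snippet (using throwaway temp directories as stand-ins for the old and new storage; real migrations would use dedicated tools and mount points) shows that `shutil.copy2` carries timestamps along with the file data, whereas a plain byte copy would drop them:

```python
import os
import shutil
import tempfile

# Stand-in source and destination; a real migration would use the old
# and new storage mount points instead.
src_dir = tempfile.mkdtemp()
dst_dir = tempfile.mkdtemp()

src = os.path.join(src_dir, "data.txt")
with open(src, "w") as f:
    f.write("project data")
os.utime(src, (1_000_000_000, 1_000_000_000))  # give the file a known mtime

dst = os.path.join(dst_dir, "data.txt")
shutil.copy2(src, dst)  # copy2 = file contents + permissions + timestamps

# The timestamp survived the copy; shutil.copy() would not have kept it.
assert int(os.path.getmtime(dst)) == int(os.path.getmtime(src))
```

Note that even `copy2` does not preserve file ownership or, on many platforms, xattrs and ACLs, which is exactly why purpose-built migration tools deserve the planning the article calls for.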
With the rise of Big Data and increasing emphasis on data-intensive computing, you may be wondering how the tape vendors are doing. Well, very well, thank you, as evidenced by Spectra Logic’s announcement that the company installed more than half an exabyte of tape storage capacity in the past six months alone. With revenue up 9 percent year-over-year, the company has a very bullish outlook indeed.
“Our financial strength combined with strong growth in the file archive market is enabling us to invest significantly in R&D, which will be reflected in a strong slate of new products and technologies this coming year. In fact, over the next 12 months we plan to launch a large number of new data storage products – more than were released in the past five years combined,” said Nathan Thompson, Spectra Logic’s chief executive officer. “The value of tape technology and its essential role in today’s modern data centers is indisputable for both traditional backup as well as the tape-based file archive market. This value is recognized and embraced by leading organizations worldwide – and is reflected in Spectra Logic’s consistent growth and momentum.”
This week Spectra Logic announced that it has earned the top honors in the Storage magazine Quality Awards winning both the enterprise and midrange tape library categories.
“Spectra Logic dominated this time around with the highest score in every [Enterprise] rating category,” said Rich Castagna, editorial director for the storage media group for Storage magazine and SearchStorage.com. “Spectra Logic’s performance in the midrange groups is equally impressive. The broader story is how satisfied users are with the tape storage systems, underscoring how vendors continue to produce high-quality and innovative products.”
This is the sixth time the company has won first place in a Quality Awards category, and it caps a year of significant success and technological achievements, including the availability of 10GbE iSCSI connectivity, the commencement of LTO-6 shipments, and the installation of the 380 PB nearline tape archive for NCSA’s Blue Waters supercomputing system, one of the largest and most powerful supercomputers in the world. Read the Full Story.
In this video from SC12, Molly Rector from Spectra Logic describes the latest advancements in the company’s tape-based storage technologies. Along the way, she debunks the myth that Amazon’s AWS Glacier offering will be the death of tape.
Today Spectra Logic announced the introduction of support for 10 Gigabit Ethernet iSCSI connectivity as an interface option for the Spectra T-Series tape libraries. Offered in partnership with Bridgeworks, the solution enables simple integration of tape systems into 10GbE SANs.
“We’re excited to continue on the path of keeping tape storage systems easy to integrate in the modern data center. By supporting the Bridgeworks solution for 10GbE iSCSI connectivity to our T-series tape libraries, our customers who are designing data center solutions based on 10GbE iSCSI no longer need to maintain a FC SAN just for their tape storage system,” said Molly Rector, executive vice president of product management and worldwide marketing, Spectra Logic.
The 10GbE iSCSI to FC bridge is available for purchase through Bridgeworks. Read the Full Story.
Over at New Scientist, Paul Marks writes that new cassette tape technology from Fujifilm can store 35 terabytes of data – or about 35 million books’ worth of information – on a cartridge that measures just 10 cm by 10 cm by 2 cm.
But the real debut for this technology is likely to be the Square Kilometre Array (SKA), the world’s largest radio telescope, whose thousands of antennas will be strewn across the southern hemisphere (New Scientist, 2 June, p 4). Once it’s up and running in 2024, the SKA is expected to pump out 1 petabyte (1 million gigabytes) of compressed data per day.
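A quick back-of-the-envelope calculation (assuming decimal units, i.e. 1 PB = 1,000 TB, which is my own assumption) shows what that daily output means in terms of these cartridges:

```python
# Rough check: how many 35 TB cartridges would the SKA's projected
# daily output fill?
daily_output_tb = 1_000           # ~1 PB of compressed data per day
cartridge_tb = 35                 # capacity of one cartridge
cartridges_per_day = daily_output_tb / cartridge_tb
print(round(cartridges_per_day, 1))  # -> 28.6, i.e. roughly 29 cartridges a day
```

In other words, even with 35 TB cartridges, the SKA would fill nearly 29 of them every single day.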
Molly Rector from Spectra Logic writes that Amazon’s AWS Glacier cloud archival service is no “Tape Killer.”
Amazon Glacier is not practical for active archive markets with up to exabytes of data. Tape is. Bandwidth costs are too high, retrieval time is too slow, and the Glacier model doesn’t meet the frequency-of-access requirements of this data. Just think about retrieval times: it would take about 40 days to move one petabyte across an OC-48 link and about 10 days to move it across 10 GbE. When talking about large data archives, local storage is a must.
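Those retrieval-time figures can be sanity-checked with some quick arithmetic (assuming decimal units and ignoring protocol overhead, which makes real transfers somewhat slower):

```python
# Sanity check of the quoted petabyte retrieval times.
PB_BITS = 1e15 * 8          # bits in one petabyte (decimal units)
OC48_BPS = 2.488e9          # OC-48 line rate, ~2.5 Gbit/s
TEN_GBE_BPS = 10e9          # 10 Gigabit Ethernet line rate
SECONDS_PER_DAY = 86_400

days_oc48 = PB_BITS / OC48_BPS / SECONDS_PER_DAY
days_10gbe = PB_BITS / TEN_GBE_BPS / SECONDS_PER_DAY
print(round(days_oc48), round(days_10gbe))  # -> 37 9
```

The raw line-rate results (about 37 and 9 days) land close to the quoted ~40 and ~10 days once real-world overhead is factored in.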
As reported here, storage pundits have been dubious about Amazon’s claims that its new AWS Glacier cloud archiving service is a tape killer. At a penny per gigabyte per month, the Glacier press release had some journalists speculating that Glacier could make tape silos obsolete.
This new infographic from Viawest depicts storage in the age of Big Data.
Big Data is a new frontier in IT where data sets are becoming so enormous that they are almost impossible to manage using traditional database management tools. Data types and content are getting more complicated, volume is going up, and serious, valuable problems are waiting to be solved.