Discovering Gold with Big Data Analytics and Data-Intensive Computing

Entries filed under “Tape”

The Secret Life of Tape

Our favorite storage pundit Henry Newman writes that while tape is the best technology for long-term data storage, you still need to be mindful of its life span:

Let me repeat: Tape does not have a standard framework to known information that is collected and analyzed. There are vendors that provide third-party products, and some tape library vendors support collection, but it is not a standard. This, in my opinion, is a big mistake for the tape drive vendors, as you cannot track the media or drive issues without specialized software.

Read the Full Story.


Also posted in Archival, Hardware, I/O

A New HPC Problem: Checksums for Large Archives

Storage pundit Henry Newman writes that running checksums for large data archives is quickly becoming an HPC problem:

Today, many preservation archives are well over 5PB and a few are well over 10PB with expectations that these archives will grow to more than 100PB. With archives this large, the requirements for HPC architectures for checksum validation are not much different than many of the standard HPC simulation problems, such as weather, crash, and other simulations.

I’ve always thought of large-scale archiving as an I/O problem, but when I talked to Henry about this a few weeks ago, he described the monumental problem of validating archive data on a regular basis:

To validate the checksum for a file, the whole file must be read from disk or tape into memory and have the checksum algorithm applied to the data read and then compare the checksum that was just calculated to the stored checksum, which should be checksummed also so you are sure that you have a valid checksum to compare to the file you read into memory. With large archive systems, this is often an ongoing process whether the data resides on disk or tape, but checksum validation is particularly critical for disk-based archives with consumer-grade storage.
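For the curious, the validation step Henry describes can be sketched in a few lines of Python. This is a minimal illustration and not anything from his article: the function names (`file_checksum`, `seal_record`, `validate`), the choice of SHA-256, and the tab-separated record format are all assumptions made for the example. The file is streamed in chunks so memory use stays flat, and the stored checksum carries its own checksum so a damaged record is not mistaken for a damaged file.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Read the whole file, one chunk at a time, and return its hex digest."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def seal_record(path, checksum):
    """Build a hypothetical manifest record: path, file checksum, and a
    checksum over those two fields (the 'checksum of the checksum')."""
    body = f"{path}\t{checksum}"
    seal = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return f"{body}\t{seal}"


def validate(record):
    """Check the record first, then the file it points at."""
    path, stored, seal = record.rsplit("\t", 2)
    body = f"{path}\t{stored}"
    if hashlib.sha256(body.encode("utf-8")).hexdigest() != seal:
        return "record corrupt"
    if file_checksum(path) != stored:
        return "file corrupt"
    return "ok"
```

The loop itself is trivial. The hard part at archive scale is feeding it, since every byte has to come back off disk or tape each time the archive is validated.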

We tend to think of HPC devices as general-purpose number crunchers. It could be that the vendor who invents the better mousetrap for checksums will be the next company to enjoy the kind of big margins the supercomputing industry saw in the 1980s. Full Story


Also posted in Archival, HPC

Oracle Ups Storage Ante with 5 Terabyte Tape Drive

By Chris Mellor

Oracle is increasing the capacity of its T10000 tape fivefold, doubling its I/O speed, and enabling an exabyte-scale tape library.

The T10000 C (T10K C) tape has a 5TB native capacity – the highest in the industry – and the associated drive has a 240MB/sec throughput. In comparison, the current T10K B format, introduced in August 2008, has a 1TB capacity and 120MB/sec throughput. The StreamLine 8500 library can now store up to an exabyte of data using 100,000 T10K C cartridges with a 2:1 compression ratio.
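The exabyte figure is easy to check from the numbers quoted above. The following is a back-of-the-envelope sketch, assuming decimal units (1TB = 10^12 bytes, 1EB = 10^18 bytes), which is the usual vendor convention; the variable names are just for illustration.

```python
TB = 10**12  # bytes, decimal units (assumed vendor convention)
EB = 10**18

cartridges = 100_000
native_per_cartridge = 5 * TB
compression_ratio = 2  # the 2:1 ratio quoted for the library figure

native_total = cartridges * native_per_cartridge
compressed_total = native_total * compression_ratio

# Time for one drive to fill one cartridge at the quoted native rate.
drive_rate = 240 * 10**6  # bytes per second
fill_hours = native_per_cartridge / drive_rate / 3600

print(f"{native_total / EB:.1f} EB native")          # 0.5 EB native
print(f"{compressed_total / EB:.1f} EB compressed")  # 1.0 EB compressed
print(f"{fill_hours:.1f} hours to fill a cartridge") # 5.8 hours to fill a cartridge
```

So the exabyte depends on the 2:1 compression assumption; the native capacity of a fully populated library works out to half that.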

In terms of capacity, it blows competing libraries from IBM, Quantum, and SpectraLogic out of the water.

The T10K C drive and StreamLine library can be connected to both mainframe and open systems servers via FICON and Fibre Channel links.

Oracle has also introduced File Sync Accelerator technology to speed up the restoration of small files from the T10K C. There’s more on this in a 5-page datasheet (PDF).

There are effectively only two tape formats that compete with the T10K C: LTO-5 and IBM’s TS1130. LTO-5 has a 1.5TB raw capacity and transfers data at up to 140MB/sec. The forthcoming LTO-6, expected in mid-to-late 2012, will offer 3.2TB raw data capacity and a 210MB/sec transfer rate. In years to come, there will be LTO-7 (6.4TB raw, 315MB/sec) and LTO-8 (12.8TB raw, 472MB/sec) formats.

IBM’s TS1130 tape has a 1TB capacity with JB/JX media, and a 160MB/sec data rate – although it has a 400MB/sec burst rate. Oracle has leapfrogged both competing formats comprehensively. LTO won’t catch up until the LTO-7 format, expected roughly in the 2014/2015 timeframe. IBM’s TS1130 roadmap is not publicly known, but we might expect Big Blue to announce something in response to Oracle.

Oracle worked with FujiFilm, which developed the Barium Ferrite particles used in the tape media coating. FujiFilm calls this its Nanocubic technology, and IBM Research said in January of last year that it was working on a 35TB tape cartridge using this technology. That indicates to us that Big Blue will have an Oracle-equalling or Oracle-bashing tape announcement in the not too distant future.

For now, Oracle says that compared with either TS1130-based or LTO-5-based libraries, its library can store 17 times more data (than IBM), run over five times faster, save 3x to 5x on floorspace, and 23 per cent on cost of ownership. The T10K C backs up a 35TB database in four hours; this would take six hours with a TS1130 and seven hours with an LTO-5 library. Take that, not-quite-so Big Blue and you LTO consortium members HP, IBM, and Quantum.
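Those backup times clearly involve more than one drive. As a rough sanity check, and not a reconstruction of Oracle's actual test setup (which the article does not describe), the sketch below works out how many drives streaming at the native rates quoted in this article would be needed to hit each time. The helper name `drives_needed` is hypothetical, and decimal units are assumed.

```python
def drives_needed(data_tb, hours, drive_mb_per_sec):
    """Drives required to move data_tb terabytes in the given hours,
    assuming every drive streams at its native rate with no compression."""
    total_bytes = data_tb * 10**12
    aggregate_rate = total_bytes / (hours * 3600)  # bytes per second
    return aggregate_rate / (drive_mb_per_sec * 10**6)


# 35TB database; times and native rates as quoted in the article.
for name, hours, rate in [("T10K C", 4, 240), ("TS1130", 6, 160), ("LTO-5", 7, 140)]:
    print(f"{name}: about {drives_needed(35, hours, rate):.1f} drives")
```

All three cases come out at roughly ten drives, which suggests the comparison holds the drive count constant and mostly reflects the difference in per-drive throughput.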

The new drives can be fitted into a StreamLine SL3000 library, and a T10K C drive can read T10K A and B format tapes. A CERN researcher was quoted by Oracle on its webcast announcing the new format, and we can assume that Oracle’s tape customer base will take to the new format eagerly.

The StreamLine can be used as part of a hierarchical storage scheme that combines disk and tape with Sun Storage Archive Manager, and can sit behind a StorageTek Virtual Tape Library to provide a disk-to-disk-to-tape data protection scheme combining disk speed and tape capacity. There is strong integration with Oracle software products.

All in all, there’s simply no denying that Larry’s engineers have come up with a winner. Price and availability details were not released. ®

This article originally appeared in The Register.


Also posted in Hardware, Storage

inside-bigdata.com is a production of insideHPC, LLC. © 2011-2013