Interview: Active Archives for Managing and Storing Big Data

With massive amounts of data being produced daily, a practical solution for archiving this information is more critical than ever. The Active Archive Alliance recognizes that most storage solutions result in performance-based silos and that these silos can be overcome by virtualized solutions in a globally accessible environment. To learn more, we caught up with their Chairman of the Board, Floyd Christofferson.

insideBIGDATA: What companies are currently members of the Active Archive Alliance? 

Floyd Christofferson: Current members include Cleversafe, Crossroads Systems, Dell, Fujifilm, GRAU DATA AG, HP, Imation, QStar Technologies, Quantum, Scality, Seven10, SGI, Spectra Logic, XenData and DataDirect Networks. These companies have come together to advance the awareness and deployment advantages of active archives and to work together to offer the best solution for each customer.

insideBIGDATA: I understand a big goal for the Active Archive Alliance is to educate different industries on what active archives are and where they provide the most value. So let’s start there; what is an active archive? 

Floyd Christofferson: In an analog world, an archive is where old content goes to retire. In a digital world, all data is potentially valuable and needs to be rapidly available, protected and easy to manage. An active archive does just that – breaking down the barriers between long-term and primary data storage, and in the process reducing the IT overhead required to manage it.

The problem is that storage solutions typically result in performance-based silos. And as data grows beyond the limits of an individual platform, management difficulties grow, and efficiencies plummet. Additionally, data typically outlives the storage hardware it resides on, which adds the administrative burden of eventually migrating long-term data from old hardware to new.

An active archive virtualizes these silos into a single globally accessible environment, regardless of the storage type or performance needs. Bringing together best-of-breed software with everything from flash to tape, Alliance members provide numerous valid choices for an active archive that can be optimized for virtually any use case.

The result is to provide users online access to all data while also reducing the complexity and cost normally associated with very large datasets. Users see the data they need at all times, and IT administrators are able to optimize the storage choices to contain costs and complexity.

Long-term planning is a key factor when considering an active archive approach. Storage hardware is, by its nature, short-term, while data longevity requires a long-term solution. A true active archive environment should contemplate and provide for a seamless upgrade to future technologies across any or all performance tiers and media. Decoupling the data from any hardware dependency is key since it makes the active archive environment future-proof.

insideBIGDATA: For what type of organizations might active archiving be useful?

Floyd Christofferson: Active archives are ideal for organizations that face exponential data growth or regularly manage high-volume unstructured data or digital assets. Target markets include life sciences, media and entertainment, education, research, government, financial services, oil and gas, and telecommunications, as well as general IT organizations requiring online data archive options.

insideBIGDATA: How is Big Data impacting the use and value of active archiving?

Floyd Christofferson: Big Data analytics reinforces the idea that all data needs to be online and seamlessly accessible. But the volume of big data means the cost of keeping everything on expensive primary disk is not sustainable. In a world where data has more potential value and a longer life, enterprises shouldn’t put all their eggs in one basket. The silos of the past are becoming expensive impediments to effective management and protection of large volumes of data today.

Finding the most cost-effective, reliable and scalable way to store long-term data is essential for today’s Big Data environments. Active archive provides several benefits to help overcome these challenges:

  • Greater Capacity: Effortlessly scale to petabytes of storage.
  • Lower Cost: Reduce cost per TB by as much as 75 percent.
  • High Accessibility: File-level access to less active data.
  • Easy Administration: Simple management of unstructured data.
  • Searchability: Leverage metadata and search tools to easily find data.
  • Long-term migration: A true active archive environment provides for a seamless upgrade to future technologies across any and all performance tiers.

insideBIGDATA: One message I’ve noticed you getting across is that active archiving enables easy access to all data (long-term and primary), while reducing complexity and cost. How does the Alliance, specifically, help companies accomplish this?

Floyd Christofferson: It’s a fact: no single vendor has the complete product set or resources to solve today’s data access demands. Active archiving provides enterprises with a combined solution of open systems applications and disk and tape hardware that allows them to effortlessly store, manage and access all of their data. Developing policies to meet their organizations’ storage requirements enables automatic migration of data onto more cost-effective storage.

An NSF-funded study at the University of California, Santa Cruz showed that 90 percent of active storage is used to house data that is rarely, if ever, accessed, which means that expensive primary disk is used to house cool or cold data. But an active archive fabric solves this problem by automatically placing data in the most efficient class of storage. Users just see data in an active file system. The system itself is smart enough to move data to the appropriate performance tier to satisfy user needs.

Thus, expensive storage is kept to a minimum, reducing overall TCO by as much as 75 percent while also reducing the need for IT intervention.
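
As an illustration of the policy-driven placement described above, the following is a minimal sketch in Python, not any Alliance member's actual implementation. The tier names, the age thresholds, and the FileRecord, choose_tier and plan_placement helpers are all hypothetical, chosen only to show how a rule based on time since last access could map files to storage classes.

```python
from dataclasses import dataclass

# Hypothetical policy: (maximum days since last access, tier name).
# Both the thresholds and the tier names are assumptions for illustration.
TIER_THRESHOLDS_DAYS = [
    (30, "flash"),
    (180, "disk"),
]
COLDEST_TIER = "tape"  # anything older than every threshold lands here


@dataclass
class FileRecord:
    path: str
    days_since_access: int


def choose_tier(record: FileRecord) -> str:
    """Return the first tier whose age limit the file is still within."""
    for max_age_days, tier in TIER_THRESHOLDS_DAYS:
        if record.days_since_access <= max_age_days:
            return tier
    return COLDEST_TIER


def plan_placement(records):
    """Map each file path to the tier this policy would place it on."""
    return {r.path: choose_tier(r) for r in records}


if __name__ == "__main__":
    sample = [
        FileRecord("/projects/render/frame_0001.exr", 2),
        FileRecord("/projects/2012/survey.dat", 95),
        FileRecord("/projects/2009/raw_capture.bin", 1200),
    ]
    for path, tier in plan_placement(sample).items():
        print(f"{tier:5s}  {path}")
```

A real deployment would presumably weigh more than access age (file size, project, retention rules), but the shape is the same: the policy decides the tier, and users keep seeing one file system.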

insideBIGDATA: What can we look forward to from the Alliance and the world of active archiving? 

Floyd Christofferson: The Alliance will continue doing what we’ve been doing – educating the industry on the value of active archives for simplified, online access to all archived data. As for active archives in general, we believe 2014 is the year they will become a more mainstream best practice, driven by the following:

  • Despite diminishing IT budgets, unstructured data will continue to grow exponentially. IDC has forecast that 80 exabytes of storage will be needed in 2014, 70 of which (almost 90 percent) are expected to come from unstructured data. To meet this challenge, active archive technologies will play an increasing role as organizations seek innovative storage solutions that meet today’s demands where older storage technologies come up short.

  • We’ll see expanded software intelligence that makes tape easier to manage and streamlines it within the storage environment. New appliances that front-end tape will make it easier for customers to use low-cost tape, delivering the long-term reliability, access and protection promised by an active archive.

  • Not long ago, many in the industry declared that tape was dead. But in an active archive fabric, tape becomes the most scalable and lowest-cost storage. As a result, tape’s role in Big Data, cloud, HPC and other data-intensive applications will continue to grow.

  • The use of rudimentary data profiling will become increasingly important for deploying effective storage management strategies. Understanding baseline file attributes for large data sets will be important for matching performance requirements to storage options. This will allow storage administrators to architect solutions that meet their specific storage requirements and unlock the inherent benefits of an active archive strategy that results in real, hard-dollar ROI.
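
To make the last point concrete, the following is a rough sketch of what rudimentary data profiling could look like in Python. It is an assumption-laden illustration rather than a tool referenced in the interview: the age buckets, their labels, and the age_label and profile_tree helpers are hypothetical, and file age is taken from modification time because access time is often disabled or unreliable on production file systems.

```python
import os
import time
from collections import defaultdict
from typing import Optional

# Hypothetical age buckets: (upper bound in days, label). Adjust per site.
AGE_BUCKETS = [(30, "hot"), (365, "warm")]
OLDEST_LABEL = "cold"


def age_label(age_days: float) -> str:
    """Return the label of the first bucket that covers this age."""
    for limit, label in AGE_BUCKETS:
        if age_days <= limit:
            return label
    return OLDEST_LABEL


def profile_tree(root: str, now: Optional[float] = None) -> dict:
    """Summarize file count and total bytes per age bucket under root."""
    now = time.time() if now is None else now
    summary = defaultdict(lambda: {"files": 0, "bytes": 0})
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Assumption: modification time stands in for "last used".
            age_days = (now - st.st_mtime) / 86400
            bucket = summary[age_label(age_days)]
            bucket["files"] += 1
            bucket["bytes"] += st.st_size
    return dict(summary)


if __name__ == "__main__":
    for label, stats in sorted(profile_tree(".").items()):
        print(f"{label:5s} {stats['files']:8d} files {stats['bytes']:14d} bytes")
```

A summary like this gives a first estimate of how much capacity is a candidate for lower-cost tiers before any storage is purchased or any data is moved.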

 
