Discovering Gold with Big Data Analytics and Data-Intensive Computing

Entries filed under “Archival”

Slidecast: TwinStrata CloudArray – Disaster Recovery as a Service

In this slidecast, Nicos Vekiarides from TwinStrata presents: TwinStrata CloudArray 4.5 with DRaaS. The new offering is an on-demand disaster recovery as a service (DRaaS) for VMware users.

“Whether your goals are to increase storage capacity, improve off-site data protection, implement disaster recovery or all three of the above, TwinStrata CloudArray is the most comprehensive storage solution available today,” said Nicos Vekiarides, CEO of TwinStrata. “TwinStrata has made great strides in delivering enterprise-class functionality at a fraction of the cost typically required of storage solutions. What’s exciting is that CloudArray 4.5 enables organizations to enjoy a full business continuity plan without the need for backup software or a dedicated disaster site, a once unthinkable proposition.”

Read the Full Story.



Video: Today’s I/O Challenges for Big Data Analysis

In this video from the 2013 HPC User Forum, Henry Newman from Instrumental presents: Today’s I/O Challenges for Big Data Analysis.

Those who own the archive own the big data solutions, because you cannot move the data around.

Download the slides (PDF) or check out the HPC User Forum Video Gallery.



Hengeveld: Big Data Meets HPC to Solve Hard Problems and Improve Lives

By John Hengeveld

John Hengeveld is the HPC Segment Marketing Director for Intel’s Technical Computing Group.  His Intel Developer Forum session titled “Big Data Meets High Performance Computing” will take place at 3:30 p.m. Wednesday in Room 2002 of Moscone West, San Francisco.

I’ve been hearing a lot of buzz about “Big Data” … people talking in terms of mining Facebook posts for marketing data. I didn’t take all the talk seriously at first, but I do now. … Let me tell you how Big Data might just save my life.

In March, I had a major appendix attack. And it turns out that within my appendix was a material called appendiceal mucinous neoplasm, which is a very rare type of cancer.  There is no cure for my cancer—not yet, anyway. I’m just hanging on and crossing my fingers and hoping things work out.

Now, the first time my doctor went over the pathology report, she told me I had a 30-60 percent chance of having less than seven years to live. But then I got some good news from my doctors. After a lot of study and analysis, they offered a more encouraging assessment. They reasoned that I had a better-than-average prognosis after all, given that I didn’t appear to have very much of the material or to have had a lengthy exposure to it. So I went back to work.

But it turns out there is a high likelihood that in the relatively near future Big Data and high-performance computing (HPC) might work together to unravel the mysteries of rare cancers like mine—and offer new hope to people like me.

I like to think of Big Data as an oil field with a lot of breadth and a lot of depth. To get value out of the field, you need a powerful pump, and that’s HPC. The HPC pump allows you to draw insights from the Big Data. Today, researchers are doing just this across a broad spectrum of fields. For me, the research being done in the field of genomics hits closest to home, because this research could eventually lead to a world of personalized therapies based on a genomic analysis of a patient’s cancer.

This is one of the topics we will dive into during a session I will lead Wednesday at the Intel Developer Forum. That session—titled “Big Data Meets High Performance Computing”—will include an appearance by Professor Michael Franklin, a computer scientist who directs the AMPLab at UC Berkeley, one of the leading teams working on applications of Big Data to a new generation of problems.

Professor Franklin will explore some of the latest innovations in five applications that combine Big Data with HPC. These applications range from genomics research to crowd-sourcing to increase battery life on your cell phone (yes, it works—I’ve done it). I, of course, will have a special interest in the discussion of the role that Big Data and HPC can play in helping researchers understand the genetics in cancers and formulate appropriate therapies.

Already, people at Berkeley are using HPC to study the public data on cancer genomes. They have accessed what’s called The Cancer Genome Atlas. This atlas shows the genomics of tumors and their hosts. The study is focused on finding the mutations that gave rise to the cancers in their hosts, and then using that knowledge to understand the nature of the mutations that are occurring and how they might be blocked or eliminated.
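As a greatly simplified illustration of what “finding the mutations” means computationally, the sketch below compares variant calls from a tumor sample against calls from the same patient’s normal tissue and keeps only those unique to the tumor. This is an assumption about the general tumor-versus-normal idea, not the Berkeley team’s actual pipeline; the variant tuples and coordinates are made up, and real analyses must also handle sequencing errors, coverage and much larger data volumes (which is where the HPC “pump” comes in).

```python
# Hypothetical representation: a variant is (chromosome, position, reference base, alternate base).
Variant = tuple[str, int, str, str]

def somatic_candidates(tumor: set[Variant], normal: set[Variant]) -> set[Variant]:
    """Variants present in the tumor sample but absent from the matched normal sample."""
    return tumor - normal

# Toy, invented calls for illustration only.
tumor_calls = {("chr1", 1000, "A", "G"), ("chr2", 2000, "C", "T")}
normal_calls = {("chr1", 1000, "A", "G")}  # inherited variant shared with healthy tissue

print(sorted(somatic_candidates(tumor_calls, normal_calls)))
```

Using set difference keeps the idea visible: anything the patient was born with appears in both samples and drops out, leaving candidate mutations acquired by the tumor.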

This kind of research is good news—not just for me but for many other cancer patients to come. In this sense, Big Data and HPC provide hope for the future.

From my perspective, Big Data is not about sifting through massive numbers of Facebook posts and seeing who “likes” what. It’s really about generating insights to solve hard problems and improve people’s lives.



Thinking about Big Data on the Eve of the Spring Trade Show Season

In this special guest feature, Spectra Logic’s Kevin Dudak writes that the world of Big Data is much more than just business analytics.

The month of March brings longer days, warmer weather and the start of the spring trade show season. There seem to be as many trade shows as there are interests and industries. Last year, we saw a lot of people start talking about Big Data at these shows. The trend will most likely continue, with Big Data taking a bigger share of the conversation.

Given my years in the storage industry, it should come as no surprise that I tend to look at the storage side of Big Data. Over the last year we have heard a lot about the analytics side of Big Data. It is exciting to see all the amazing things we can do, and the things we can learn, from the massive amount of data we have at our fingertips these days. Without a doubt, much of the conversation will continue to focus on leveraging our data sets with tools like Hadoop. Sometimes, it seems we forget that Big Data is more than just analytics; it is also about storing and managing potentially massive data sets. In 2012, users and vendors will start to address the changes Big Data brings to storage.

The 2012 Tape Summit and the HPC Symposium kick off the season. The second annual Tape Summit is a gathering of the top manufacturers in the data tape industry, including drive, library, software and media companies, as well as press, analysts and bloggers. You don’t see tape and Big Data in the same conversation very often, but I think the tape industry will be looking to change that this year. We will be hearing about the Linear Tape File System (LTFS), continued innovation in data management software and possibly the coming LTO6, and how all of these can have a big impact on storing lots of data.

The HPC Symposium will see presentations from some of the top organizations in the distributed high performance world. Many of the lessons the HPC world has learned over the last 5 years will make the adoption of Big Data easier and more effective.

I’ll be watching to see whether LTFS might be a good answer to Big Data portability. We are already seeing LTFS gain traction in some verticals, such as media and entertainment. How to move petabytes of data, whether to seed a cloud provider or simply to relocate it, has always been a problem. LTFS might just provide a good answer.
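To put the petabyte-moving problem in perspective, here is some back-of-the-envelope arithmetic. It assumes an idealized network link running at full line rate with no protocol overhead (real transfers will be slower) and decimal units, so 1 PB = 10^15 bytes. The function name `transfer_days` and the link speeds are illustrative choices, not figures from the original post.

```python
SECONDS_PER_DAY = 86_400
PETABYTE = 10**15  # decimal petabyte, in bytes

def transfer_days(data_bytes: float, link_bits_per_s: float, efficiency: float = 1.0) -> float:
    """Days needed to send data_bytes over a link running at `efficiency` of its line rate."""
    seconds = data_bytes * 8 / (link_bits_per_s * efficiency)
    return seconds / SECONDS_PER_DAY

for gbps in (1, 10):
    days = transfer_days(PETABYTE, gbps * 10**9)
    print(f"1 PB over {gbps} Gb/s: {days:.1f} days")
# 1 PB over 1 Gb/s: 92.6 days
# 1 PB over 10 Gb/s: 9.3 days
```

Even under these best-case assumptions, a single petabyte ties up a 1 Gb/s link for about three months, which is why physically shipping self-describing tape is an attractive alternative.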

Dealing with massive data sets, whether integrity-checking the data or protecting it, is a struggle we all face at one time or another. We are starting to see a new crop of software vendors, some of them members of the Active Archive Alliance, that are creating data storage environments.

Finally, with LTO6 expected to ship this calendar year, we will see a doubling of native capacity per cartridge. There should be performance improvements as well. Since the LTO consortium is attending the Tape Summit, hopefully we will get more details on it and on how it might affect the economics of storing Big Data.

As March rolls on, we should start to see a lot of information coming out of events such as the HPC Symposium and the Tape Summit, not only on how to analyze Big Data but also on how to manage and store it when it isn’t being crunched.

About Kevin Dudak

Kevin Dudak, a United States Air Force veteran, originally joined Spectra Logic in 2007 as a product manager and brings more than 15 years of storage industry experience to his role. As product manager of BlueScale Software for T-Series Tape Libraries and nTier Disk Systems, Dudak is key in helping to define Spectra Logic’s role in big data and archive storage. Dudak has a diverse technology background drawn from years in the hardware and software storage market, where he has architected and overseen storage implementation and support. He earned bachelor of arts degrees in both economics and university studies from the University of New Mexico. In his spare time he enjoys aviation photography, working on classic cars, camping in Colorado and bike riding, and each year he participates in the Register’s Annual Great Bicycle Ride Across Iowa (RAGBRAI).



Tape is all about Business Continuity

Jon Hiles from Spectra Logic writes that protecting Big Data with tape is all about preserving business continuance.

Tape can preserve business continuance like no other storage medium, simply because tape can be “unplugged” from the system. As the now-defunct Australian web hosting firm distribute.IT learned, failing to protect its data adequately with offline storage (i.e., tape) meant that 30 minutes’ worth of hacker mayhem put the company out of business. A total of 4,800 of its customers lost their data with no recourse, while the negative business implications of the attack cascaded through distribute.IT’s customer base and affiliates. Without offline tape backups, nothing stopped the attack from destroying the firm’s disk-based backup data, which rendered the company inert.

Read the Full Story.



Video: The Coming Explosion of Data from the Internet of Things

In this video, Google’s Marissa Mayer discusses the Internet of Things.

One of the key aspects of the emerging Internet of Things – where real-world objects are connected to the Internet – is the massive amount of new data on the Web that will result. As more and more “things” in the world are connected to the Internet, it follows that more data will be uploaded to and downloaded from the cloud. And this is in addition to the burgeoning amount of user-generated content – which has increased 15-fold over the past few years, according to a presentation that Google VP Marissa Mayer made last August at Xerox PARC. Mayer said during her presentation that this “data explosion is bigger than Moore’s law.”

Read the Full Story.



The Secret Life of Tape

Our favorite storage pundit Henry Newman writes that while tape is the best technology for long-term data storage, you still need to be mindful of its life span:

Let me repeat: Tape does not have a standard framework for collecting and analyzing known information about drives and media. There are vendors that provide third-party products, and some tape library vendors support collection, but it is not a standard. This, in my opinion, is a big mistake by the tape drive vendors, as you cannot track media or drive issues without specialized software.

Read the Full Story.



TACC Researchers Forge Visual Archives of the Future

As our digital archives grow, the task of archivists has grown exponentially more complex. To help tackle these challenges, researchers at the Texas Advanced Computing Center are investigating different data archive analysis methods using a unique visualization framework.

“Archival analysis is a multi-layered process and it is unique to each collection that is being assessed,” explained Maria Esteva, a digital archivist and data management and collections researcher at TACC. “We are conducting research to map analysis processes used by archivists onto a visualization that combines data-driven analysis tools. In this way, the archivist can integrate his or her experience into the workflow.”

Visualizing big data is not something well suited to a small laptop display. TACC’s experts are currently building a multi-touch tiled display system to improve interactivity and to enhance the collaborative aspects of visual analysis for multiple users.

“Technology research led by TACC today is yielding results that will eventually be integrated into the cyberinfrastructure of our country. At that point these technologies researched today will become commonplace,” said Robert Chadduck, Acting Director for the National Archives Center for Advanced Systems and Technologies. “In that way, TACC is providing what I believe is a window on the archives of the future.”

I haven’t seen this work at TACC myself, but I can tell you that looking at my storage through visualization utilities like GrandPerspective can be a real eye-opener. Read the Full Story.



A New HPC Problem: Checksums for Large Archives

Storage pundit Henry Newman writes that running checksums for large data archives is quickly becoming an HPC problem:

Today, many preservation archives are well over 5PB and a few are well over 10PB, with expectations that these archives will grow to more than 100PB. With archives this large, the HPC architecture requirements for checksum validation are not much different from those of many standard HPC simulation problems, such as weather, crash and other simulations.
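Rough arithmetic shows why archive validation starts to look like an HPC workload. The sketch below assumes a hypothetical sustained aggregate read rate of 10 GB/s from disk or tape, that checksum computation keeps pace with the reads, and decimal units; none of these figures come from Newman’s article, and `full_pass_days` is an illustrative name.

```python
SECONDS_PER_DAY = 86_400
PB = 10**15  # bytes, decimal
GB = 10**9   # bytes, decimal

def full_pass_days(archive_bytes: float, read_bytes_per_s: float) -> float:
    """Days to read every byte of the archive once at a sustained aggregate rate."""
    return archive_bytes / read_bytes_per_s / SECONDS_PER_DAY

for size_pb in (10, 100):
    days = full_pass_days(size_pb * PB, 10 * GB)
    print(f"{size_pb} PB at 10 GB/s: {days:.1f} days per full checksum pass")
# 10 PB at 10 GB/s: 11.6 days per full checksum pass
# 100 PB at 10 GB/s: 115.7 days per full checksum pass
```

At the 100PB scale, a single pass takes roughly four months even at this assumed rate, so keeping validation cycles short means scaling out reads and hashing in parallel, much like a simulation code.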

I’ve always thought of large-scale archiving as an I/O problem, but when I talked with Henry about this a few weeks ago, he described the monumental problem of validating archive data on a regular basis:

To validate the checksum for a file, the whole file must be read from disk or tape into memory, the checksum algorithm must be applied to the data that was read, and the newly calculated checksum must then be compared with the stored checksum. That stored checksum should itself be checksummed, so you are sure you have a valid checksum to compare against the file you read into memory. With large archive systems, this is often an ongoing process whether the data resides on disk or tape, but checksum validation is particularly critical for disk-based archives built on consumer-grade storage.
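A minimal sketch of that validation loop follows. It assumes SHA-256 from Python’s `hashlib` as the checksum algorithm (the article does not name one) and a hypothetical per-file record holding the file’s digest plus a digest of that digest; a production archive would presumably keep such records in a catalog rather than in Python dicts. The file is streamed through memory in chunks, so every byte is still read, but the whole file never has to fit in RAM at once.

```python
import hashlib
import tempfile
from pathlib import Path

CHUNK_SIZE = 64 * 1024 * 1024  # stream 64 MiB at a time instead of holding the whole file

def file_digest(path: Path) -> str:
    """Read every byte of the file and return its SHA-256 digest as hex."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()

def make_record(path: Path) -> dict:
    """Hypothetical catalog entry: the file's checksum plus a checksum of that checksum."""
    digest = file_digest(path)
    return {"digest": digest,
            "digest_check": hashlib.sha256(digest.encode()).hexdigest()}

def validate(path: Path, record: dict) -> bool:
    """Return True if the file still matches its stored checksum."""
    # Step 1: confirm the stored checksum has not itself been corrupted.
    if hashlib.sha256(record["digest"].encode()).hexdigest() != record["digest_check"]:
        raise ValueError(f"stored checksum for {path} is itself corrupt")
    # Step 2: re-read the whole file, recompute the checksum and compare.
    return file_digest(path) == record["digest"]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = Path(d) / "object.bin"
        p.write_bytes(b"archive payload")
        rec = make_record(p)
        print(validate(p, rec))             # True
        p.write_bytes(b"archive paylaod")   # simulate silent corruption
        print(validate(p, rec))             # False
```

Note that the cost is dominated by `file_digest`, which must touch every byte; that is the part an archive would need to parallelize across many nodes.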

We tend to think of HPC devices as general-purpose number crunchers. It could be that the vendor who builds a better mousetrap for checksums will be the next company to enjoy the big margins the supercomputing industry enjoyed in the 1980s. Read the Full Story.




inside-bigdata.com is a production of insideHPC, LLC. © 2011-2013