Discovering Gold with Big Data Analytics and Data-Intensive Computing

Entries filed under “MapReduce”

Radio Free HPC Fireside Chat – HPC Embraces Big Data

In this slidecast, the Radio Free HPC team interviews Fritz Ferstl, CTO of Univa. Topics include Big Data, HPC, and the continuing convergence of the two.

While what we think of as traditional HPC may differ greatly from Big Data analytics, that seems to be changing. With a long history in high performance computing and customers in both worlds, Ferstl shares his unique perspective on where the two worlds overlap and where the potential is greatest for synergy in the future.

This has to be our best show yet, so be sure to check it out.



Also posted in Business of Big Data, Graph Computing, Hadoop, HPC, Podcasts, Video

Video: Building Big Data Pipelines with OSS

In this video from the SpringOne 2012 event, Costin Leau presents: Building Big Data Pipelines with OSS.

Hadoop is not an island. To deliver a complete Big Data solution, a data pipeline needs to be developed that incorporates and orchestrates many diverse technologies. A Hadoop focused data pipeline not only needs to coordinate the running of multiple Hadoop jobs (MapReduce, Hive, Pig or Cascading), but also encompass real-time data acquisition and the analysis of reduced data sets extracted into relational/NoSQL databases or dedicated analytical engines.


Also posted in Events, Video

Paper: A Map-Reduce-Like System for Emerging Parallel Architectures

Can MapReduce be used as an effective means of processing data-intensive HPC workloads? In his dissertation from Ohio State University, Wei Jiang writes that one first needs to overcome challenges with performance scaling, fault tolerance, and GPU acceleration support.

We performed a comparative study showing that the map-reduce processing style could cause significant overheads for a set of data mining applications. Based on the observation, we developed a map-reduce system with an alternate API (MATE) using a user-declared reduction object to be able to further improve the performance of map-reduce programs in multi-core environments. To address the limitation in MATE that the reduction object must fit in memory, we extended the MATE system to support the reduction object of arbitrary sizes in distributed environments and apply it to a set of graph mining applications, obtaining better performance than the original graph mining library based on map-reduce.
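
To make the distinction in the abstract concrete, here is a minimal Python sketch contrasting the conventional map-reduce style with a reduction-object style. It is not MATE's actual API: the function names are hypothetical, and word counting is used purely as an illustration rather than one of the dissertation's data mining workloads. The point it tries to show is that the conventional style materializes intermediate key-value pairs before reducing, while the reduction-object style updates a user-declared accumulator directly.

```python
from collections import defaultdict


def word_count_mapreduce(records):
    """Conventional style: map emits (key, value) pairs, which are grouped, then reduced."""
    # Map phase: every word produces an intermediate pair that must be stored.
    pairs = [(word, 1) for record in records for word in record.split()]
    # Shuffle phase: group the intermediate values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce phase: combine the values collected for each key.
    return {key: sum(values) for key, values in groups.items()}


def word_count_reduction_object(records):
    """Reduction-object style: each element updates a user-declared accumulator in place."""
    reduction_object = defaultdict(int)  # declared up front (hypothetical stand-in)
    for record in records:
        for word in record.split():
            reduction_object[word] += 1  # no intermediate pairs are materialized
    return dict(reduction_object)


records = ["big data meets hpc", "hpc embraces big data"]
# Both styles compute the same result; they differ in the intermediate state they keep.
assert word_count_mapreduce(records) == word_count_reduction_object(records)
```

In this toy version the accumulator has to fit in memory, which mirrors the limitation the abstract says the extended system was built to address.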

Download the paper (PDF).


Also posted in Hadoop, HPC

Is It Time for CDOs (Chief Data Officers)?

Michael Vizard explains that current IT culture is used to giving people access to only a finite amount of data. But new data management frameworks such as MapReduce and Hadoop make it possible to cost-effectively analyze large amounts of data. Many IT organizations don’t have the skills in place to master those technologies. This gap between the IT skills at hand and the desires of the business community is starting to create some tension, which could be resolved with the appointment of someone who will function as chief data scientist or officer.

One might argue that because chief information officers are theoretically in charge of information, this task would fall under their purview. But there is a world of difference between managing data and understanding the business value of that data; hence the need for a new class of business data specialists.

Read the Full Story


Also posted in Analytics, Business of Big Data, Hadoop

Platform MapReduce Crosses Big Data and High-Performance Computing

On June 29, 2011, Platform Computing announced the availability of Platform MapReduce, the industry’s first enterprise-class, distributed runtime engine for MapReduce applications. Built on the company’s core technologies, LSF and Symphony, Platform MapReduce enables businesses to focus on moving MapReduce applications into production by providing enterprise-class manageability and scale, high resource utilization and availability, ease of operation, multiple application support and an open distributed file system architecture, including immediate support for Hadoop Distributed File System (HDFS) and Appistry Cloud IQ.

“High-Performance Analytics – a SAS specialty – happens at the intersection of Big Data and High-Performance Computing. Our mutual customers have benefited from Platform’s expertise and unique capabilities to manage and support these complex, distributed clusters,” said Paul Kent, SAS Vice President of Platform Research and Development. “Platform MapReduce is a welcome addition to the rapidly evolving Hadoop ecosystem. Platform Computing can play a critical role in the evolution and adoption of Hadoop in the Enterprise.”

Read the Full Story.


Also posted in Software


inside-bigdata.com is a production of insideHPC, LLC. © 2011-2013