Discovering Gold with Big Data Analytics and Data-Intensive Computing

Entries filed under “Hadoop”

Podcast: More than Big Data – Scott Gnau on the Teradata Unified Data Architecture

In this podcast, Scott Gnau from Teradata Labs discusses various aspects of Big Data and how the company’s Unified Data Architecture can position the enterprise to succeed.

* Download the MP3 * Subscribe on iTunes * If Dropbox is blocked, download audio from Google Drive.


Also posted in Analytics, Business of Big Data, Podcasts, Storage, Video | 1 Comment

Video: MapReduce Global: Why, How, Where?

In this video, Assistant Prof. Abhishek Chandra from Indiana University explores the potential of MapReduce outside of traditional configurations. Additional segments of this lecture are available on this IU YouTube Channel.


Also posted in Cloud, Video | Leave a comment

A Contrast of Paradigms – HPCC Systems & Hadoop

Flavio Villanustre writes about the differences between two powerful open source Big Data platforms: HPCC and Hadoop.

HPCC and Hadoop are both open source projects released under an Apache 2.0 license, and are free to use, with both leveraging commodity hardware and local storage interconnected through IP networks, allowing for parallel data processing and/or querying across this architecture. But this is where most of the similarities end.

  • Internode Communication. One of the significant limitations of the strict MapReduce model utilized by Hadoop is that internode communication is left to the Shuffle phase, which makes certain iterative algorithms that require frequent internode data exchange hard to code and slow to execute (they need to go through multiple phases of Map, Shuffle and Reduce, each of which represents a barrier operation that forces the serialization of the long tails of execution). In contrast, the HPCC Systems platform provides for direct inter-node communication at all times, which is leveraged by many of the high-level ECL primitives.
  • Performance. Another disadvantage for Hadoop is the use of Java as the programming language for the entire platform, including the HDFS distributed filesystem, which adds overhead from the JVM; in contrast, HPCC and ECL are compiled into C++, which executes natively on top of the operating system, leading to more predictable latencies and overall faster execution (we have seen anywhere between 3 and 10 times faster execution on HPCC, compared to Hadoop, on the exact same hardware).
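
To make the internode communication point concrete, here is a minimal in-memory sketch in Python. It is not Hadoop or HPCC code. The function names `map_reduce_round` and `propagate_min_label` are hypothetical, and the graph-labeling task is only an assumed example of an iterative algorithm. The sketch shows that, under a strict MapReduce model, data crosses between keys only in the shuffle step, so every iteration costs a full map, shuffle and reduce round.

```python
from collections import defaultdict


def map_reduce_round(records, mapper, reducer):
    """Run one strict map -> shuffle -> reduce round over in-memory records."""
    # Map phase: each record is processed on its own, with no cross-record exchange.
    mapped = [pair for record in records for pair in mapper(record)]
    # Shuffle phase: the only point where values move between keys ("nodes").
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce phase: each key's values are combined independently.
    return [reducer(key, values) for key, values in sorted(groups.items())]


def propagate_min_label(edges, max_rounds=100):
    """Label every vertex with the smallest vertex id in its connected component.

    Returns the final labels and the number of full rounds that were needed.
    Simplification: the adjacency lists live in a shared dict instead of
    travelling with each record, as they would in a real distributed job.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    def mapper(item):
        vertex, label = item
        yield vertex, label            # keep the vertex's own label as a candidate
        for other in neighbors[vertex]:
            yield other, label         # offer this label to every neighbor

    def reducer(vertex, candidates):
        return vertex, min(candidates)

    labels = {vertex: vertex for vertex in neighbors}
    rounds = 0
    while rounds < max_rounds:
        # Each iteration is a complete round; nothing can be exchanged mid-round.
        new_labels = dict(map_reduce_round(labels.items(), mapper, reducer))
        rounds += 1
        if new_labels == labels:
            break
        labels = new_labels
    return labels, rounds


if __name__ == "__main__":
    labels, rounds = propagate_min_label([(1, 2), (2, 3), (3, 4), (5, 6)])
    print(labels, rounds)
```

On the small chain 1-2-3-4 plus the separate edge 5-6, the smallest label can travel only one hop per round, so the loop runs four full rounds (three to propagate the label, one to detect that nothing changed). Each round acts as a barrier, which is the cost the bullet above describes.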

Read the Full Story.


Also posted in Analytics, HPCC, Software | 1 Comment

David Campbell of Microsoft on New Tools for Navigating Big Data

Over at the SQL Server Blog, David Campbell writes that this week Microsoft will disclose progress the company has made integrating with Hortonworks to broaden the adoption of Hadoop and help the community derive new insights from Big Data.

I will share some of the innovative work we’ve been doing both at Microsoft and with members of the Hadoop community to help customers unleash the value of their data by allowing more users to derive insights by combining and refining data regardless of the scale and complexity of data they are working with. We are working hard to broaden the adoption of Hadoop in the enterprise by bringing the simplicity and manageability of Windows to Hadoop based solutions, and we are expanding the reach with a Hadoop based service on Windows Azure. Hadoop is a great tool but, to fully realize the vision of a modern data platform, we also need a marketplace to search, share and use 1st and 3rd party data and services. And, to bring the power to everyone in the business, we need to connect the new big data ecosystem to business intelligence tools like PowerPivot and Power View.

Read the Full Story.


Also posted in Analytics, Business of Big Data, Software | Leave a comment

IBM Rolls Out Hadoop for Dummies

You can’t get too far in any discussion of Big Data without some mention of Hadoop, an open-source software framework that supports data-intensive distributed applications. Now IBM helps us mere mortals better understand this powerful tool with a free eBook on Hadoop for Dummies from author Robert D. Schneider.

Enterprises are using technologies such as MapReduce and Hadoop to extract value from Big Data. The results of these efforts are truly mission-critical in size and scope. Properly deploying these vital solutions requires careful planning and evaluation when selecting a supporting infrastructure. In this book, we provide you with a solid understanding of key Big Data concepts and trends, as well as related architectures, such as MapReduce and Hadoop. We also present some suggestions about how to implement high-performance Hadoop.

Download the eBook (PDF).


Also posted in Analytics, Business of Big Data | 1 Comment

Univa Helps Archimedes Cut Hadoop Costs in Half

Today Univa announced that Archimedes is using Grid Engine distributed resource management software to operationalize a mission-critical Hadoop application and reduce operating and deployment costs by 50 percent. Archimedes is a healthcare modeling organization that takes publicly available clinical data and uses it to answer complex, vital healthcare questions for researchers, pharmaceutical companies and government agencies. With Univa Grid Engine, Archimedes was able to run its Hadoop application on its existing compute infrastructure, without adding resources or hardware.

“Up until the time we had big data analytics, research using healthcare data took a significant amount of time and effort to analyze,” said Katrina Montinola, VP of Engineering at Archimedes. “The volume involved with big data is immense, and with the advancement of mathematics and computers, we are able to make analytical connections between data points, which may have otherwise been overlooked or minimized. With Univa Grid Engine, the complex analysis being completed by Archimedes’ solutions can be done quickly and made available to researchers and physicians in a convenient format that is informative and efficient.”

Read the Full Story.


Also posted in Business of Big Data, Software | Leave a comment

Panasas Chief Scientist on Where HPC Meets Big Data and Hadoop

In this video, Panasas Chief Scientist Garth Gibson describes how the company brings Hadoop support to bear in the world of Big Data and HPC.

“Hadoop is a great platform for taking a gigantic amount of information and reducing it down to the central core that you then want to do the second level of analysis on. And that’s what’s happening across the enterprise, data warehousing, and HPC. So the fundamental issue is that after you’re done crunching with that commodity Hadoop cluster, you now have valuable assets. You want those valuable assets on a system you trust. You want it on a good, high-quality NAS. But it has to keep up. You need the high speed of a direct-flow environment. And then, it turns out, that once you can process from off-board quickly, you can optimize Hadoop and go faster in many cases because you’re using your off-board NAS.”

Recorded at SC12 in Salt Lake City. Read the Full Story.


Also posted in Business of Big Data, Hardware, Storage, Video | Leave a comment

Colfax and Mellanox Introduce Hadoop Appliance

This week at SC12, Colfax unveiled a joint demonstration of a high-performance Hadoop appliance running over Mellanox’s end-to-end FDR 56Gb/s InfiniBand solution at the Mellanox booth.

Colfax’s Hadoop appliance provides users with best-in-class performance and ease of use in one consolidated solution. Featuring Intel Xeon E5 processor-based servers and Mellanox FDR 56Gb/s InfiniBand, it provides the throughput required by the highest-performance analytical tools based on the Apache Hadoop framework. The appliance approach provides customers a fast and trouble-free entry to the Hadoop analytics market. Scalability and performance are kept on a linear growth path, enabling customers to build various sized clusters.

“Colfax Hadoop appliance will help customers overcome the perception that the big data analytics technology is hard to deploy and achieve business value,” says Gautam Shah, CEO of Colfax International. “With an ideal balance of performance, scalability and price, the Colfax Hadoop appliance enables organizations to overcome the barriers to big data analytics, providing a perfect platform to extract the most value from their data.”

Mellanox’s Unstructured Data Accelerator (UDA) improves the Hadoop appliance’s performance by nearly 50 percent, providing data analysts with near real-time processing power. UDA brings RDMA capabilities into the Hadoop framework, enabling efficient data transfer and lower CPU overhead. The UDA package is an open source plug-in for the Apache Hadoop MapReduce framework.

Colfax and Mellanox will demonstrate the Hadoop appliance at SC12 in the Mellanox booth #1531. Read the Full Story.


Also posted in Business of Big Data, Hardware | Leave a comment

Video: Reducing Big Hardware Costs with Hadoop

In this video, Katrina Montinola from Archimedes describes how her company migrated its big data solution to Hadoop and gained a big, unexpected benefit: reduced hardware costs.


Also posted in Business of Big Data, Video | 1 Comment

New Whitepaper on Univa Grid Engine Integration for Hadoop

Managing Hadoop clusters can be painful. To make things easier, Univa is offering a whitepaper on Univa Grid Engine Integration for Hadoop.

Managing MapReduce applications in a shared infrastructure by integrating with Univa Grid Engine adds enterprise-level features and capabilities that do not exist otherwise. Hadoop assumes that all hosts are under its control and will not recognize that other workloads may be executing on the same server where it chooses to place a job. This whitepaper will show you how to create an integration that provides a shared resource pool supporting Hadoop as well as any other workload submitted to a Univa Grid Engine cluster.

Get the whitepaper.


Also posted in Business of Big Data, Software | Leave a comment

inside-bigdata.com is a production of insideHPC, LLC. © 2011-2013