In this video, Assistant Prof. Abhishek Chandra from Indiana University explores the potential of MapReduce outside of traditional configurations. Additional segments of this lecture are available on the IU YouTube Channel.
HPCC and Hadoop are both open source projects released under an Apache 2.0 license, and are free to use, with both leveraging commodity hardware and local storage interconnected through IP networks, allowing for parallel data processing and/or querying across this architecture. But this is where most of the similarities end.
- Internode Communication. One significant limitation of the strict MapReduce model used by Hadoop is that internode communication is deferred to the Shuffle phase, which makes iterative algorithms that require frequent internode data exchange hard to code and slow to execute: they must pass through multiple rounds of Map, Shuffle and Reduce, each of these a barrier operation that serializes on the long tails of execution. In contrast, the HPCC Systems platform provides direct internode communication at all times, which many of the high-level ECL primitives leverage.
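The barrier cost described above can be made concrete with a toy, single-process sketch of an iterative algorithm expressed as repeated MapReduce rounds. This is illustrative pseudo-infrastructure, not the Hadoop API: the point is that the shuffle step must collect every mapper's output before any reducer runs, and an iterative job pays that full barrier once per round.

```python
# Toy simulation of an iterative algorithm expressed as repeated
# map/shuffle/reduce rounds. The shuffle is a global barrier: it groups
# ALL intermediate pairs before any reduce can start, so each round
# waits on the slowest mapper. (Illustrative names, not Hadoop APIs.)
from collections import defaultdict

def map_phase(records, mapper):
    # Each record is mapped independently (the embarrassingly parallel part).
    out = []
    for key, value in records:
        out.extend(mapper(key, value))
    return out

def shuffle_phase(pairs):
    # Barrier: groups every intermediate pair by key before reducing.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    return [(key, reducer(key, values)) for key, values in groups.items()]

def iterate(records, mapper, reducer, rounds):
    # An iterative algorithm re-enters the full pipeline (and, on a real
    # cluster, rewrites its output to distributed storage) every round.
    for _ in range(rounds):
        records = reduce_phase(
            shuffle_phase(map_phase(records, mapper)), reducer)
    return records

# Example: halve each value and re-sum per key, over 3 rounds.
data = [("a", 8.0), ("a", 4.0), ("b", 16.0)]
result = iterate(data,
                 mapper=lambda k, v: [(k, v / 2.0)],
                 reducer=lambda k, vs: sum(vs),
                 rounds=3)
```

With direct internode communication, as on HPCC, the same iteration could exchange only the data that changed between rounds instead of re-entering the full pipeline.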
- Performance. Another disadvantage for Hadoop is the use of Java as the programming language for the entire platform, including the HDFS distributed filesystem, which adds JVM overhead; in contrast, HPCC and ECL are compiled to C++, which executes natively on top of the operating system, leading to more predictable latencies and overall faster execution (we have seen anywhere between 3 and 10 times faster execution on HPCC compared to Hadoop on the exact same hardware).
Read the Full Story.
Over at the SQL Server Blog, David Campbell writes that this week Microsoft will disclose progress the company has made integrating with Hortonworks to broaden the adoption of Hadoop and help the community derive new insights from Big Data.
I will share some of the innovative work we’ve been doing both at Microsoft and with members of the Hadoop community to help customers unleash the value of their data by allowing more users to derive insights by combining and refining data regardless of the scale and complexity of data they are working with. We are working hard to broaden the adoption of Hadoop in the enterprise by bringing the simplicity and manageability of Windows to Hadoop based solutions, and we are expanding the reach with a Hadoop based service on Windows Azure. Hadoop is a great tool but, to fully realize the vision of a modern data platform, we also need a marketplace to search, share and use 1st and 3rd party data and services. And, to bring the power to everyone in the business, we need to connect the new big data ecosystem to business intelligence tools like PowerPivot and Power View.
Read the Full Story.
You can’t get too far in any discussion of Big Data without some mention of Hadoop, an open-source software framework that supports data-intensive distributed applications. Now IBM helps us mere mortals better understand this powerful tool with a free eBook on Hadoop for Dummies from author Robert D. Schneider.
Enterprises are using technologies such as MapReduce and Hadoop to extract value from Big Data. The results of these efforts are truly mission-critical in size and scope. Properly deploying these vital solutions requires careful planning and evaluation when selecting a supporting infrastructure. In this book, we provide you with a solid understanding of key Big Data concepts and trends, as well as related architectures, such as MapReduce and Hadoop. We also present some suggestions about how to implement high-performance Hadoop.
Today Univa announced that Archimedes is using Grid Engine distributed resource management software to operationalize a mission-critical Hadoop application and reduce operating and deployment costs by 50 percent. Archimedes is a healthcare modeling organization that takes publicly available clinical data and uses it to answer complex, vital healthcare questions for researchers, pharmaceutical companies and government agencies. Through Univa Grid Engine's unique and scalable solution, Archimedes was able to run its Hadoop application on its current compute infrastructure, without the need to add resources or hardware.
“Up until the time we had big data analytics, research using healthcare data took a significant amount of time and effort to analyze,” said Katrina Montinola, VP of Engineering at Archimedes. “The volume involved with big data is immense, and with the advancement of mathematics and computers, we are able to make analytical connections between data points, which may have otherwise been overlooked or minimized. With Univa Grid Engine, the complex analysis being completed by Archimedes’ solutions can be done quickly and made available to researchers and physicians in a convenient format that is informative and efficient.”
Read the Full Story.
In this video, Panasas Chief Scientist Garth Gibson describes how the company brings Hadoop support to bear in the world of Big Data and HPC.
“Hadoop is a great platform for taking a gigantic amount of information and reducing it down to the central core that you then want to do the second level of analysis on. And that’s what’s happening across the enterprise, data warehousing, and HPC. So the fundamental issue is that after you’re done crunching with that commodity Hadoop cluster, you now have valuable assets. You want those valuable assets on a system you trust. You want it on a good, high-quality NAS. But it has to keep up. You need the high speed of a direct-flow environment. And then, it turns out, that once you can process from off-board quickly, you can optimize Hadoop and go faster in many cases because you’re using your off-board NAS.”
“Colfax Hadoop appliance will help customers overcome the perception that the big data analytics technology is hard to deploy and achieve business value,” says Gautam Shah, CEO of Colfax International. “With an ideal balance of performance, scalability and price, the Colfax Hadoop appliance enables organizations to overcome the barriers to big data analytics, providing a perfect platform to extract the most value from their data.”
Mellanox’s Unstructured Data Accelerator (UDA) enhances the Hadoop appliance’s performance by nearly 50 percent, providing data analysts with near real-time processing power. UDA brings RDMA capabilities into the Hadoop framework, enabling efficient data transfer and lower CPU overhead. The UDA package is an open source plug-in for the Apache Hadoop MapReduce framework.
Colfax and Mellanox will demonstrate the Hadoop appliance at SC12 in the Mellanox booth #1531. Read the Full Story.
In this video, Katrina Montinola from Archimedes describes how her company migrated their big data solution to Hadoop and the big unexpected benefit of reduced hardware costs.
Managing Hadoop clusters can be painful. To make things easier, Univa is offering a whitepaper on Univa Grid Engine Integration for Hadoop.
Managing MapReduce applications in a shared infrastructure by integrating with Univa Grid Engine adds enterprise-level features and capabilities that do not exist otherwise. Hadoop assumes that all hosts are under its control and will not recognize that other workloads may already be executing on a server where it chooses to place a job. This whitepaper shows you how to create an integration that builds a shared resource pool supporting Hadoop alongside any other workload submitted to a Univa Grid Engine cluster.