When we set out to build Hadoop 2.0, we wanted to fundamentally re-architect Hadoop so that multiple types of applications could run against the same data sets, and could do so efficiently and predictably within the same cluster. This is really the reason behind Apache YARN, which is foundational to Hadoop 2.0. By managing resource requests across a cluster, YARN turns Hadoop from a single-application system into a multi-application operating system.
In this slidecast, Justin Erickson from Cloudera presents a technical overview of Cloudera Impala, an SQL-on-Hadoop solution that enables users to do real-time queries of data stored in Hadoop clusters.
To avoid latency, Impala circumvents MapReduce and accesses the data directly through a specialized distributed query engine very similar to those found in commercial parallel RDBMSs. The result is performance an order of magnitude faster than Hive's, depending on the type of query and configuration.
In this slidecast, the Radio Free HPC team interviews Fritz Ferstl, CTO of Univa. Topics include Big Data, HPC, and the continuing convergence of both.
While what we think of as traditional HPC may differ greatly from Big Data analytics, that seems to be changing. With a long history in high performance computing and customers in both worlds, Ferstl shares his unique perspective on where the two worlds overlap and where the potential is greatest for synergy in the future.
This has to be our best show yet, so be sure to check it out.
Hadoop, and the hardware on which it runs, continues to grow. It can certainly be seen as a subset of HPC: a single yet powerful algorithm optimized to run across large numbers of commodity servers, with some crossover into technical computing that could grow further as technologies like YARN give existing Hadoop clusters more HPC capabilities. Many companies are finding Hadoop to be the new corporate HPC for big data.
Today Teradata announced that the new Enterprise Access for Hadoop and Unified Data Architecture enable business analysts to reach through Teradata directly into Hadoop to find new business value from the analysis of big, diverse data.
“Today’s announcement of Teradata Enterprise Access for Hadoop is another example of our aggressive commitment to building out the Teradata Unified Data Architecture™,” said Scott Gnau, president, Teradata Labs. “Teradata Enterprise Access for Hadoop empowers organizations to dig deeply into files and data residing in Hadoop and combine the data with production business data for analyses – and action.”
Teradata Enterprise Access for Hadoop includes two new, innovative features that make access to data in Hadoop easy and secure for business analysts across the enterprise:
Teradata smart loader for Hadoop. For the first time, business analysts have point-and-click convenience to easily browse and move data between Teradata and Hadoop for analysis and self-service business intelligence.
Teradata SQL-H. The new Teradata SQL-H gives any user or application across the enterprise direct, on-the-fly access to data stored within Hadoop through standard ANSI SQL, leveraging the security, workload management, and performance of the Teradata data warehouse.
Today Xyratex announced that the company is now a strategic supplier for AMD and its SeaMicro solutions for Big Data.
AMD will use the Xyratex OneStor Modular Enclosure as one of the building blocks for its big data and storage-intensive solutions, and has optimized the SeaMicro SM15000 server to provide more than five petabytes of storage capacity in two racks for big data applications such as Hadoop and object storage.
“The SeaMicro SM15000 server with the Freedom Fabric Storage solution is known in the market for its superior computing efficiency and storage density, as well as the lowest total cost of ownership,” said Dhiraj Mallick, Corporate Vice President and General Manager of Data Center Server Solutions at AMD. “With the combination of the SM15000 and the Xyratex OneStor data storage product, we have a winning solution that is unmatched in storage density and capacity.”
The combination of Xyratex and AMD products delivers an ultra-dense, high performance platform that eliminates excess hardware costs and cabling while simplifying installation and minimizing footprint requirements.
The internet, sensors, and high performance computing are some of the top Big Data producers. Recently, there has been increased focus on extracting more value from this generated data. Analysis of Big Data sets spans a spectrum, from “looking for a needle in a haystack” at one end to “looking for relationships between hay in a stack” at the other. We will discuss the architectural platforms and tools suitable for different parts of this spectrum.
“DDN has developed a Hadoop solution that is all about time to value: it simplifies rollout so that enterprises can get up and running more quickly, provides typical DDN performance to accelerate data processing, and reduces the amount of time needed to maintain a Hadoop solution,” said Dave Vellante, Chief Research Officer, Wikibon.org. “For enterprises with a deluge of data but a limited IT budget, the DDN hScaler appliance should be on the short list of potential solutions.”