In this slidecast, Brian Christian from Zettaset describes how the company has productized Hadoop for ease of use, HPC speed, and enterprise security.
“Our mission is to analyze more data, more quickly, and in a smaller footprint,” says Jim Vogt, CEO of Zettaset, Inc. “Working with Hyve Solutions and IBM, we’ve simplified big data for deployment on enterprise-class commodity hardware.”
This week Pentaho announced that Dell will begin reselling Pentaho Business Analytics, along with its servers and Apache Hadoop from Cloudera. The partnership brings to market an easy-to-deploy, low-risk “blended” big data solution at “an extremely competitive price point.”
“The Dell, Pentaho and Cloudera solution will make it easier for companies to realize value more quickly than ever before with big data applications,” said Ed Albanese, head of business development at Cloudera. “Companies are seeking big data analytics from a single, trusted source with best-of-breed components. They want the hard work and hassle involved in installation, configuration management, and ongoing systems management taken care of for them. Adding Pentaho to the Dell Apache Hadoop Solution delivers on this market demand.”
In this video, Rafael Coss, Manager of Big Data Enablement at IBM, describes how MapReduce works in plain English.
MapReduce is a framework for processing embarrassingly parallel problems across huge datasets using a large number of computers (nodes), collectively referred to as a cluster (if all nodes are on the same local network and use similar hardware) or a grid (if the nodes are shared across geographically and administratively distributed systems, and use more heterogeneous hardware). Computational processing can occur on data stored either in a filesystem (unstructured) or in a database (structured).
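To make the map and reduce phases concrete, here is a minimal sketch of the canonical word-count job written against the Hadoop MapReduce Java API. The class names, input paths, and job settings are illustrative only, not drawn from any of the products mentioned above: the mapper emits a (word, 1) pair for every token it reads, and the reducer sums those counts per word across the cluster.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: runs in parallel on every split of the input, emitting (word, 1)
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: receives all counts for a given word and sums them
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    // Input and output HDFS paths are passed on the command line
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The same job runs unchanged whether the cluster is a handful of nodes on one rack or thousands of machines; the framework handles splitting the input, scheduling the tasks, and re-running any that fail.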
Over at Forbes, Tom Groenfeldt writes that Asian organizations lag behind the U.S. and Europe in data warehouse, business intelligence, and analytics investments, but don’t expect that to last, says Phil Carter of IDC in Singapore.
Korea Telecom replaced an Oracle database with Hadoop after a five-year project to support 17 million telephone subscribers, 7.2 million broadband subscribers, and 15 million mobile users. The company needed more complex data analysis, especially of customer behavior. Beyond data volume, a key concern was integrating structured and unstructured data. The company found that Hive was the best solution for a smooth transition from Oracle to Hadoop. From Hadoop, data was exported to an OLAP server and Oracle for business intelligence and a data warehouse.
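Part of Hive’s appeal in a migration like this is that existing SQL skills carry over: queries are written in a SQL-like language and Hive compiles them into MapReduce jobs over files in HDFS. As a rough illustration only (the HiveServer2 endpoint, table, and column names below are hypothetical, not details of the Korea Telecom deployment), a Java client can query Hive over JDBC like this:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
  public static void main(String[] args) throws Exception {
    // Register the Hive JDBC driver and connect to a (hypothetical) HiveServer2 endpoint
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://hive-host:10000/default", "analyst", "");
         Statement stmt = conn.createStatement()) {
      // A SQL-style aggregation that Hive turns into MapReduce work on HDFS;
      // the table and columns are illustrative
      ResultSet rs = stmt.executeQuery(
          "SELECT subscriber_id, COUNT(*) AS events " +
          "FROM usage_records GROUP BY subscriber_id");
      while (rs.next()) {
        System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
      }
    }
  }
}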
Alex Buttery writes that Hadoop skills are in demand, but that academia is behind the curve.
Hadoop is one of the up-and-coming technologies for analyzing big data, but to date very few universities have included it in their undergraduate or graduate curricula. In a February 2012 InfoWorld article, those already using the technology warned that “Hadoop requires extensive training along with analytics expertise not seen in many IT shops today.” A Computerworld article singled out MIT and UC Berkeley as having already added Hadoop training and experience to their curricula. Other institutions need to seek out practitioners in their area, or poll alumni, to find people who can impart that knowledge to students; where such people exist, schools should build a curriculum around them and start training the next generation of IT employees with the skills they will need to meet the challenges of the 21st century.
This week SGI announced world-record benchmark performance with full support for the newest Intel Xeon processor E5-2400 and E5-4600 product families. The E5-2400 is now the base processor in the SGI Hadoop Starter Kits and is available in the SGI Rackable product line for use in other applications.
“Big Data is characterized not just by its volume but also by its velocity and variety. Moreover, Big Data can be in either structured or unstructured forms. These dynamics give rise to a broad range of demands made on a computer system, especially for high performance and comprehensive analytics,” said SGI CTO Dr. Eng Lim Goh. “Our long design relationship with Intel and the incorporation of the more robust Intel Xeon E5 processors have enabled us to develop the next SGI coherent shared memory platform that scales up even higher, in compute, memory and IO, than our previous generation. The result is a system ideally suited to meet this broad spectrum of existing and emerging Big Data challenges.”
This week Mellanox announced the joint demonstration of PHAT-DATA40G, a high-performance Hadoop appliance running over an end-to-end Mellanox 40GbE network. As demonstrated at the Mellanox booth during Interop 2012, PHAT-DATA40G is a consolidated solution that gives Hadoop users a minimum 20 percent performance improvement over existing solutions.
“PHAT-DATA40G is a collaborative effort with industry technology leaders to build the highest-performing Hadoop solution running on optimized hardware, with ease of use and simplified deployment as a shared vision,” said Jean Shih, President, AMAX. “Data is gold if it’s organized and analyzed correctly, and in highly competitive markets, this translates into reaching more revenue faster and on a larger scale, yet with precise incision. PHAT-DATA40G powers Hadoop with the most powerful and user-friendly engine on the market for this very goal, to give companies a real competitive edge quickly and dynamically using data-driven business intelligence.”
Quentin Hardy of the New York Times writes that Google’s venture capital arm is building an internal data science team that sits at the center of its investment philosophy.
It should not be too surprising that a Google-created entity should have this bent. Google, along with Web pioneers like Yahoo and Amazon, was crucial to the creation of the emerging Big Data industry. By tracking things like consumer clicks and the behavior of thousands of computer servers working together, they amassed large volumes of data at a time when collapsing prices for data storage made it attractive to analyze. They also captured information from nontraditional sources, like e-mail, leading them to create so-called “unstructured” database software like Hadoop and MapReduce. Versions of those are now used to store and analyze other kinds of data.