Data Science 101: How to Build Big Data Pipelines

In the video presentation below from the SpringOne 2GX 2012 conference in Washington, DC, Costin Leau looks at the architecture of Big Data pipelines, the challenges ahead, and how to build manageable and robust solutions using open source software such as Apache Hadoop, Hive, Pig, Spring for Apache Hadoop, Spring Batch and Spring Integration.

Interview: Pepperdata Spices Things up in the World of Hadoop

“We give Hadoop the predictability it needs, let organizations see what it’s doing (with detailed usage metrics for every user, job, and task, in real time), and help organizations get the most out of their hardware investment. We are not for the organizations that have just entered into their first Hadoop project (because they don’t rely on it… yet). We are here for those who already rely on the business-critical data and functionality Hadoop can deliver.”

Teradata Portfolio for Hadoop 2 Announced

Teradata (NYSE: TDC), the analytic data platforms, marketing applications, and services company, today introduced Teradata Portfolio for Hadoop 2 to reduce the risk, cost, and complexity of Hadoop deployment and management. The comprehensive portfolio helps organizations address the technical and business challenges of leveraging diverse data stored in Apache™ Hadoop®.

Hadoop Summit 2014 – San Jose

Hortonworks and Yahoo! are pleased to host the 7th Annual Hadoop Summit, the leading conference for the Apache Hadoop community. This event, expanded now to three days, June 3-5, will feature many of the Apache Hadoop thought leaders who will showcase successful Hadoop use cases, share development and administration tips and tricks, and educate organizations about how best to leverage Apache Hadoop as a key component in their enterprise data architecture.

InfiniDB Demonstrates its SQL-on-Hadoop Analytic Query Engine at Hadoop Summit

InfiniDB announced it will be showcasing its SQL-on-Hadoop query engine at the Hadoop Summit, June 3-5 in San Jose.

Informatica at Hadoop Summit

Informatica Corporation (Nasdaq: INFA), a leading provider of data integration software, will debut Informatica PowerCenter Big Data Edition and Data Quality Big Data Edition running on the newly announced Hortonworks Data Platform (HDP) 2.1 at Hadoop Summit 2014.

Data Science 101: Hadoop – Just the Basics for Big Data Rookies

With the Hadoop Summit conference coming next week (June 3-5), newcomers may find it useful to get up to speed with this exciting distributed computing technology. The video presentation below introduces the basics of Hadoop, the technology that’s taking the enterprise by storm.

Interview: Concurrent Leads the Way in Application Building on Hadoop

“Concurrent is the team behind Cascading, the proven application development framework that makes it possible for enterprises to leverage their existing skill sets for building data-oriented applications on Hadoop. Cascading has built-in attributes that make data application development a reliable and repeatable process. Companies that standardize on Cascading can build data applications at any scale, integrate them with existing systems, employ test-driven development practices and simplify their applications’ operational complexity.”

Splice Machine Launches Public Beta of its Hadoop RDBMS

Splice Machine, provider of the only Hadoop RDBMS, today announced that its real-time relational database management system is now available as a public beta. The Splice Machine database enables companies to replace traditional RDBMS systems that are too costly or difficult to scale. A full-featured, transactional SQL database on Hadoop, Splice Machine moves Hadoop beyond its batch analytics heritage to power operational applications and real-time analytics.

Data Science 101: Hadoop and Object-based Dispersed Storage

Rob McCammon, Director of Product Management, makes a compelling case for using Cleversafe to combine the power of Hadoop MapReduce with a highly scalable Object-based Dispersed Storage System. This solution is designed to decrease infrastructure costs for separate servers dedicated to analytical processes, reducing required storage capacity (having a single copy of the data instead […]