Data Science 101: How to Build Big Data Pipelines

In the video presentation below from the SpringOne 2GX 2012 conference in Washington, DC, Costin Leau looks at the architecture of Big Data pipelines, the challenges ahead, and how to build manageable and robust solutions using open source software such as Apache Hadoop, Hive, Pig, Spring for Apache Hadoop, Spring Batch and Spring Integration.

Interview: Pepperdata Spices Things up in the World of Hadoop

sean

“We give Hadoop the predictability it needs, let organizations see what it’s doing (with detailed usage metrics for every user, job, and task, in real time), and help organizations get the most out of their hardware investment. We are not for the organizations that have just entered into their first Hadoop project (because they don’t rely on it… yet). We are here for those who already rely on the business-critical data and functionality Hadoop can deliver.”

Hadoop Summit 2014 – San Jose

Hortonworks and Yahoo! are pleased to host the 7th Annual Hadoop Summit, the leading conference for the Apache Hadoop community. This event, expanded now to three days, June 3-5, will feature many of the Apache Hadoop thought leaders who will showcase successful Hadoop use cases, share development and administration tips and tricks, and educate organizations about how best to leverage Apache Hadoop as a key component in their enterprise data architecture.

Informatica at Hadoop Summit

Informatica Corporation (Nasdaq: INFA), a leading provider of data integration software, will debut Informatica PowerCenter Big Data Edition and Data Quality Big Data Edition running on the newly announced Hortonworks Data Platform (HDP) 2.1 at Hadoop Summit 2014.

Interview: Tarmin Manages Data Overload with Data Defined Storage

Shahbaz Ali

“GridBank provides a comprehensive information governance framework to help organizations meet compliance regulations for retention management and disposal, and to mitigate data related risk by using end-to-end data protection. The GridBank Metabase, a distributed metadata repository, enables enterprise search and discovery and provides integration for big data analytics tools for increased data insights.”

Big Data and Sustainability

The recent Big Ideas for Sustainable Prosperity research conference brought together some of the world’s preeminent thinkers on the environment and the economy for two days to share knowledge and think big about Policy Innovation for Greening Growth. In the video presentation below, Dr. Matthew E. Kahn argues that the combination of Big Data and field experiments can sharply improve urban quality of life.

2014 Data Scientist Salary Survey

An important new research study serving the data science professional community was recently released: the Burtch Works Study: Data Science Professionals Report. The free report includes a complete overview of the data science profession.

Data Science 101: Hadoop – Just the Basics for Big Data Rookies

With the Hadoop Summit conference coming next week (June 3-5), it might be useful for newcomers to get up to speed with this exciting distributed computing technology. Below is a video presentation that introduces the basics of Hadoop, the technology that’s taking the enterprise by storm.

Interview: Spectra Logic on Big Data Storage in the Cloud

Kevin Dudak

“Big Data puts new requirements on storage with respect to scalability, data integrity and cost efficiency – and Spectra is well-positioned to serve this market. Our archive and backup data storage tape products support all aspects of secondary storage, are compatible with every major tape and disk format, enable massive scalability and provide a plethora of advanced features that ensure the data is protected, its integrity is maintained and that it will be available virtually forever. Our suite of T-Series tape libraries offer high capacity TS1140 and open standard LTO media options, have the capability to offer block, file and object storage on our tape systems; and can deliver long-term storage for under $0.10/GB LIST pricing.”

Data Science 101: Machine Learning Class at CMU

Here is a great learning resource for anyone wishing to dive into the field of machine learning – a complete class “Machine Learning” from Spring 2011 at Carnegie Mellon University. The course is taught by Tom Mitchell, Chair of the Machine Learning Department.