In the video presentation below from the SpringOne 2GX 2012 conference in Washington, DC, Costin Leau looks at the architecture of Big Data pipelines, the challenges ahead, and how to build manageable and robust solutions using open source software such as Apache Hadoop, Hive, Pig, Spring for Apache Hadoop, Spring Batch and Spring Integration.
“We give Hadoop the predictability it needs, let organizations see what it’s doing (with detailed usage metrics for every user, job, and task, in real time), and help organizations get the most out of their hardware investment. We are not for the organizations that have just entered into their first Hadoop project (because they don’t rely on it… yet). We are here for those who already rely on the business-critical data and functionality Hadoop can deliver.”
Hortonworks and Yahoo! are pleased to host the 7th Annual Hadoop Summit, the leading conference for the Apache Hadoop community. This event, expanded now to three days, June 3-5, will feature many of the Apache Hadoop thought leaders who will showcase successful Hadoop use cases, share development and administration tips and tricks, and educate organizations about how best to leverage Apache Hadoop as a key component in their enterprise data architecture.
“GridBank provides a comprehensive information governance framework to help organizations meet compliance regulations for retention management and disposal, and to mitigate data related risk by using end-to-end data protection. The GridBank Metabase, a distributed metadata repository, enables enterprise search and discovery and provides integration for big data analytics tools for increased data insights.”
The recent Big Ideas for Sustainable Prosperity research conference brought together some of the world’s preeminent thinkers on the environment and the economy for a two-day conference to share knowledge and think big about Policy Innovation for Greening Growth. In the video presentation below, Dr. Matthew E. Kahn argues that the combination of Big Data and field experiments can sharply improve urban quality of life.
“Big Data puts new requirements on storage with respect to scalability, data integrity and cost efficiency – and Spectra is well-positioned to serve this market. Our archive and backup data storage tape products support all aspects of secondary storage, are compatible with every major tape and disk format, enable massive scalability and provide a plethora of advanced features that ensure the data is protected, its integrity is maintained and that it will be available virtually forever. Our suite of T-Series tape libraries offer high capacity TS1140 and open standard LTO media options, have the capability to offer block, file and object storage on our tape systems; and can deliver long-term storage for under $0.10/GB LIST pricing.”
Here is a great learning resource for anyone wishing to dive into the field of machine learning: a complete “Machine Learning” course from Spring 2011 at Carnegie Mellon University. The course is taught by Tom Mitchell, Chair of the Machine Learning Department.