Data Science 101: Scalable Machine Learning with Apache Spark

In the presentation below, courtesy of the SF Machine Learning Meetup group in San Francisco, Xiangrui Meng introduces Spark and shows how to use it to build fast, end-to-end machine learning workflows.

Data Science 101: Mining Big Data with Apache Spark

In this presentation and interactive demo, you’ll learn about data mining workflows, the architecture and benefits of Spark, as well as practical use cases for the framework.

Data Science 101: Data Agnosticism – Feature Engineering Without Domain Expertise

From the SciPy2013 conference, here is a compelling talk, “Data Agnosticism: Feature Engineering Without Domain Expertise,” by Nicholas Kridler of Accretive Health in Chicago.

Data Science 101: Apache YARN Usage Tips and Guidelines

Hadoop YARN (Yet Another Resource Negotiator) is a resource-management platform responsible for managing compute resources in clusters and using them to schedule user applications. YARN was added as part of Hadoop 2.0. Over the past several months of going to conferences like Hadoop Summit, attending big data Meetup groups like LA Big Data Users […]

Data Science 101: Building Brains to Understand the World’s Data

For this segment of insideBIGDATA Data Science 101, we have a very compelling Google Tech Talk, “Building Brains to Understand the World’s Data,” presented by Jeff Hawkins, co-founder of Numenta, who also founded Palm and Handspring.

Data Science 101: Parallel Iterative Deep Learning on Hadoop’s Next-Gen YARN

“Introduction to Parallel Iterative Deep Learning on Hadoop’s Next-Generation YARN Framework” was presented at the recent O’Reilly OSCON (Open Source Convention) 2014 by Josh Patterson (Patterson Consulting) and Adam Gibson (Skymind.io).

Data Science 101: An Interview with Hadley Wickham

RStudio’s Chief Scientist Hadley Wickham was interviewed by DataScience.LA’s Eduardo Arino de la Rubia during the useR! 2014 conference at UCLA this past July.

John Chambers: Interfaces, Efficiency and Big Data

In the video presentation below, industry luminary John Chambers delivers his keynote, “Interfaces, Efficiency and Big Data,” at the recent useR! 2014 conference.

Databricks Cloud Announcement and Demo at Spark Summit 2014

Coming to us from the recent Spark Summit 2014, here is a compelling presentation by Databricks CEO Ion Stoica that sets the stage for Spark’s continued advance in the big data ecosystem. The Databricks Cloud provides the full power of Spark, in the cloud, plus a powerful set of features for exploring and visualizing your data, as well as writing and deploying production data products.

Data Science, Big Data and Statistics – can we all live together?

Here is a topic that draws much debate these days, as diverse fields like statistics, computer science and applied mathematics converge with newly named fields such as data science and big data. Can’t we all get along?