Data Science 101: Apache YARN Usage Tips and Guidelines

Hadoop 2.0 YARN architecture

Hadoop YARN (Yet Another Resource Negotiator) is a resource-management platform that manages compute resources in clusters and uses them to schedule user applications. YARN was added as part of Hadoop 2.0. Over the past several months of going to conferences like Hadoop Summit, attending big data Meetup groups like LA Big Data Users […]

Data Science 101: Building Brains to Understand the World’s Data

For this segment of insideBIGDATA Data Science 101, we have a compelling Google Tech Talk, “Building Brains to Understand the World’s Data,” presented by Jeff Hawkins, co-founder of Numenta, who also founded Palm and Handspring.

Data Science 101: Parallel Iterative Deep Learning on Hadoop’s Next-Gen YARN

“Introduction to Parallel Iterative Deep Learning on Hadoop’s Next-Generation YARN Framework” was presented at the recent O’Reilly OSCON (Open Source Convention) 2014 by Josh Patterson (Patterson Consulting) and Adam Gibson (Skymind.io).

Data Science 101: An Interview with Hadley Wickham

RStudio’s Chief Scientist Hadley Wickham was interviewed by DataScience.LA’s Eduardo Arino de la Rubia during the useR! 2014 conference at UCLA this past July.

John Chambers: Interfaces, Efficiency and Big Data

In the video presentation below, industry luminary John Chambers delivers his keynote, “Interfaces, Efficiency and Big Data,” at the recent useR! 2014 conference.

Databricks Cloud Announcement and Demo at Spark Summit 2014

Coming to us from the recent Spark Summit 2014, here is a compelling presentation by Databricks CEO Ion Stoica that sets the stage for Spark’s continued advance in the big data ecosystem. The Databricks Cloud provides the full power of Spark, in the cloud, plus a powerful set of features for exploring and visualizing your data, as well as writing and deploying production data products.

Data Science, Big Data and Statistics – can we all live together?

Here is a topic that receives much debate these days, as diverse fields like statistics, computer science and applied mathematics converge with newly named fields such as data science and big data. Can’t we all get along?

Data Science 101: SparkR – Interactive R Programs at Scale

R + RDD = R2D2

R is a widely used statistical programming language, but its interactive use is typically limited to a single machine. To enable large-scale data analysis from R, SparkR was announced earlier this year in a blog post. SparkR is an open source R package developed at the U.C. Berkeley AMPLab that allows data scientists to analyze large data sets and interactively run jobs on them from the R shell.

Data Science 101: Real-time Analytics using Cassandra, Spark and Shark

In the video below, Evan Chan (Software Engineer at Ooyala) describes his experience using the Spark and Shark frameworks for running real-time queries on top of Cassandra data.

Project Adam: a New Deep-Learning System

Project Adam is a new deep-learning system modeled after the human brain that has greater image classification accuracy and is 50 times faster than other systems in the industry. Project Adam is an initiative by Microsoft researchers and engineers that aims to demonstrate that large-scale, commodity distributed systems can train huge deep neural networks effectively.