R is a widely used statistical programming language but its interactive use is typically limited to a single machine. To enable large scale data analysis from R, SparkR was announced earlier this year in a blog post. SparkR is an open source R package developed at U.C. Berkeley AMPLab that allows data scientists to analyze large data sets and interactively run jobs on them from the R shell.
In the video below, Evan Chan (Software Engineer at Ooyala), describes his experience using the Spark and Shark frameworks for running real-time queries on top of Cassandra data.
Project Adam is a new deep-learning system modeled after the human brain that has greater image classification accuracy and is 50 times faster than other systems in the industry. Project Adam is an initiative by Microsoft researchers and engineers that aims to demonstrate that large-scale, commodity distributed systems can train huge deep neural networks effectively.
FIELD REPORT Last week I attended the long-anticipated useR!2014 international conference at the UCLA campus, my alma mater. The four day event had something for everyone in attendance – all the brain cycles centered around the use of the R statistical environment. Since R is a primary tool for my work in data science and […]
As the data world undergoes its Cambrian explosion phase, our data tools need to become more advanced to keep pace. Deep Learning has emerged as a key tool in the non-linear arms race of machine learning. In the video below Josh Patterson and Adam Gibson take a look at how we can parallelize Deep Belief Networks in Deep Learning on Hadoop’s next generation YARN framework with Iterative Reduce.