Data Science 101: An Interview with Hadley Wickham

RStudio’s Chief Scientist Hadley Wickham was interviewed by DataScience.LA’s Eduardo Arino de la Rubia during the useR! 2014 conference at UCLA this past July.

John Chambers: Interfaces, Efficiency and Big Data

In the video below, industry luminary John Chambers delivers his keynote, “Interfaces, Efficiency and Big Data,” at the recent useR! 2014 conference.

Databricks Cloud Announcement and Demo at Spark Summit 2014

Coming to us from the recent Spark Summit 2014, here is a compelling presentation by Databricks CEO Ion Stoica that sets the stage for Spark’s continued advance in the big data ecosystem. The Databricks Cloud provides the full power of Spark, in the cloud, plus a powerful set of features for exploring and visualizing your data, as well as writing and deploying production data products.

Data Science, Big Data and Statistics – can we all live together?

Here is a topic that is much debated these days, as diverse fields like statistics, computer science and applied mathematics converge with newly named fields such as data science and big data. Can’t we all get along?

Data Science 101: SparkR – Interactive R Programs at Scale

R + RDD = R2D2

R is a widely used statistical programming language, but its interactive use is typically limited to a single machine. To enable large-scale data analysis from R, SparkR was announced earlier this year in a blog post. SparkR is an open source R package developed at the U.C. Berkeley AMPLab that allows data scientists to analyze large data sets and interactively run jobs on them from the R shell.

Data Science 101: Real-time Analytics using Cassandra, Spark and Shark

In the video below, Evan Chan (Software Engineer at Ooyala) describes his experience using the Spark and Shark frameworks for running real-time queries on top of Cassandra data.

Project Adam: a New Deep-Learning System

Project Adam is a new deep-learning system modeled after the human brain that has greater image classification accuracy and is 50 times faster than other systems in the industry. Project Adam is an initiative by Microsoft researchers and engineers that aims to demonstrate that large-scale, commodity distributed systems can train huge deep neural networks effectively.

The Putnam Mathematical Competition’s Unsolved Problem

As a data scientist with my roots in the theoretical foundations of the field, I’m always looking for ways to challenge myself and pick up a new mathematical apparatus that could help me in my project work.

Where There’s Spark There’s Fire: The State of Apache Spark in 2014

In this special guest feature, Matei Zaharia, CTO of Databricks and Creator of Apache Spark, explores open-source Apache Spark’s status in the Hadoop community.

Book Reviews: The Bootstrap Resampling Technique

Given the importance of bootstrap methods to contemporary machine learning, I’d like to review several prominent books on the subject. Some of the titles are relatively new, while others can be considered “classics.”