Data Science 101: The Data Analytics Handbook

“The Data Analytics Handbook” is a new resource meant to inform young professionals about the field of data science. It was written by a group of UC Berkeley students: Brian Liou, Tristan Tao, and Elizabeth Lin. Edition One of the book includes in-depth interviews with data scientists and data analysts.

Extending the R Language to the Enterprise


Earlier this week, I attended a very informative event sponsored by the LA RUG (Los Angeles R User Group meetup) that featured the topic “Extending the R language to the enterprise with TERR & Spotfire.”

Data Science 101: Hadoop in the Cloud


Amazon Elastic MapReduce (Amazon EMR) makes it easy to provision and manage Hadoop in the AWS Cloud. Hadoop is available in multiple distributions and Amazon EMR gives you the option of using the Amazon Distribution or the MapR Distribution for Hadoop.

Data Science 101: Forecasting Time Series Using R

Time series forecasting is an integral tool in data science. Here is a useful instructional video on the subject from Rob J Hyndman, Professor of Statistics and one of the authors of the free eBook “Forecasting: Principles and Practice,” available on OTexts. His presentation is titled “Forecasting Time Series Using R.”
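To give a flavor of what time series forecasting involves, here is a minimal sketch of simple exponential smoothing, one of the standard baseline methods. It is written in Python rather than R, and it is not taken from the presentation. The function name `ses_forecast` is hypothetical. The sketch assumes the level is initialized to the first observation and that the smoothing parameter `alpha` is chosen by hand rather than estimated from the data.

```python
from typing import Sequence


def ses_forecast(series: Sequence[float], alpha: float) -> float:
    """Return the one-step-ahead forecast from simple exponential smoothing.

    Hypothetical helper for illustration: the level starts at the first
    observation, and alpha is supplied by the caller, not fitted.
    """
    if len(series) == 0:
        raise ValueError("series must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")

    level = float(series[0])  # simple initialization choice
    for y in series[1:]:
        # New level = weighted average of the latest observation and the old level.
        level = alpha * y + (1.0 - alpha) * level
    # For simple exponential smoothing, the forecast for every future period is the final level.
    return level


if __name__ == "__main__":
    print(ses_forecast([0.0, 10.0, 20.0], alpha=0.5))  # 12.5
```

A larger `alpha` makes the forecast react faster to recent observations. With `alpha = 1` the forecast reduces to the last observed value, which is the naive method.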

Richard Feynman Computer Heuristics Lecture

Richard Feynman, winner of the 1965 Nobel Prize in Physics and world-renowned “curious character,” gives an insightful lecture on computer heuristics. He covers how computers work, how they file and handle information, how they use that information to solve problems within a finite amount of processing time, and how they actually compute values of interest to human beings.

Data Science 101: Examining the Requests Made by the Top 100 Sites

[Figure: File types correlation plot]

For our latest installment of the insideBIGDATA Data Science 101 series, I thought I’d do something a bit different. Here is a sample analysis by data scientist and blogger Dan Goldin, who published some nice results using R to assess the web requests made by the top 100 Internet sites.

The Wolfram Programming Language for Data Science

Stephen Wolfram, founder of Wolfram Research and creator of Mathematica, just announced the new Wolfram Programming Language. This new knowledge-based language could be a game changer in data science.

Certona Predicts Consumer Behavior with Patented Technology


Certona, a leading provider of real-time omnichannel personalization for many of the largest brands and retailers, today announced that the U.S. Patent Office has issued the company a patent for representing and predicting human behavior.

Data Science 101: Deep Learning Methods and Applications


Microsoft Research, the research arm of the software giant, is a hotbed of data science and machine learning research. Microsoft has the resources to hire the best and brightest researchers from around the globe. A recent publication is available for download (PDF): “Deep Learning: Methods and Applications” by Li Deng and Dong Yu, two prominent researchers in the field.

Productionizing Hadoop: 7 Architectural Best Practices

Big Data will change the way your organization responds to business opportunities. But to reap its full benefits, you have to move from proof of concept into full production. Here is an informative 52-minute presentation that provides guidelines for successfully integrating Hadoop into your standard data center processes.