Data Science 101: Interview with John Chambers

Dr. John Chambers, the father of the S language, which ultimately became R, sits down with Professor Trevor Hastie of the Stanford University Statistics Department to discuss the long and fascinating history of the R language.

Real-time MapReduce Becomes a Reality


Data grid software leader ScaleOut Software is using in-memory computing to deliver operational analytics for real-time decision making. In-memory technology offers real benefits for fraud alerts, transportation management, and taking advantage of short-lived financial opportunities.

Unsupervised Joke Generation from Big Data

Humor is a very human phenomenon. Can a machine appreciate humor? This reminds me of a scene from the 1994 movie “Star Trek: Generations” where the android Lt. Commander Data discovers humor. After having his emotion chip activated, Data finds everything amusing.

The Future of Computer Science


I am convinced we’re at an important inflection point in the timeline of the discipline of computer science. When compared to other disciplines like mathematics, physics and biology, computer science is a very young field, starting around 1964. But something is happening now, in 2014, that is propelling the field into a new evolutionary period.

Data Science 101: Machine Learning, Part 5

The “How Machine Learning Works” lecture series concludes by developing some machine learning Python code from scratch. We use real-valued numbers sampled from two different Gaussians with different priors.
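
For readers who want a feel for the setup before watching, here is a minimal sketch of the idea, not the lecture's actual code. It draws numbers from two Gaussians with unequal priors and labels each one using Bayes' rule. The priors, means, standard deviations, and function names are illustrative assumptions.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def sample(n, priors, means, stds, rng):
    """Draw n (value, class) pairs; the class is picked according to the priors."""
    data = []
    for _ in range(n):
        k = 0 if rng.random() < priors[0] else 1
        data.append((rng.gauss(means[k], stds[k]), k))
    return data


def classify(x, priors, means, stds):
    """Pick the class with the larger prior * likelihood (Bayes' rule)."""
    scores = [priors[k] * gaussian_pdf(x, means[k], stds[k]) for k in range(2)]
    return 0 if scores[0] >= scores[1] else 1


# Assumed parameters, chosen only for illustration.
PRIORS = (0.7, 0.3)
MEANS = (0.0, 4.0)
STDS = (1.0, 1.0)

rng = random.Random(0)
data = sample(2000, PRIORS, MEANS, STDS, rng)
accuracy = sum(classify(x, PRIORS, MEANS, STDS) == k for x, k in data) / len(data)
print(f"accuracy with known parameters: {accuracy:.3f}")
```

The unequal priors shift the decision boundary toward the less likely class, which is the effect a prior is meant to have.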

Slidecast: Announcing the Altiscale Data Cloud


“Altiscale offers the first cloud service purpose-built to run Apache Hadoop. We run the latest version of Hadoop on custom infrastructure, augmented with Apache Hive, Pig, and Oozie, and with first-class support for Python, R, and Ruby. Altiscale’s infrastructure is faster, more reliable, easier to use, and more affordable than alternatives.”

The Future of Privacy in a Big Data World


Technologies that track data can make life more efficient, but can they go too far? If there’s a significant counterpoint to the big data steamroller, it is the potential for pushback out of concerns for consumer privacy.

Data Science 101: Machine Learning, Part 4

The “How Machine Learning Works” lecture series continues by building on top of the Bayesian classifier developed in Part 3 of the series. We’ll build an expectation-maximization (EM) algorithm that locally maximizes the likelihood function.
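
As a rough illustration of what such an algorithm looks like, the sketch below fits a two-component, one-dimensional Gaussian mixture with EM. It is not the code from the lecture. The synthetic data, the min/max initialization, and the function names are assumptions made for this example.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def em_two_gaussians(xs, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture to xs with EM."""
    # Crude initialization (an assumption): equal weights, means at the extremes.
    weights = [0.5, 0.5]
    means = [min(xs), max(xs)]
    stds = [1.0, 1.0]
    log_likelihoods = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for every point.
        resp = []
        ll = 0.0
        for x in xs:
            joint = [weights[k] * gaussian_pdf(x, means[k], stds[k]) for k in range(2)]
            total = joint[0] + joint[1]
            ll += math.log(total)
            resp.append((joint[0] / total, joint[1] / total))
        log_likelihoods.append(ll)
        # M-step: re-estimate parameters from the responsibility-weighted data.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            stds[k] = math.sqrt(var)
    return weights, means, stds, log_likelihoods


# Synthetic data (assumed): 70% from N(0, 1) and 30% from N(4, 1).
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) if rng.random() < 0.7 else rng.gauss(4.0, 1.0)
      for _ in range(1000)]

weights, means, stds, log_likelihoods = em_two_gaussians(xs)
print("weights:", weights)
print("means:", means)
print("stds:", stds)
```

Each iteration leaves the log-likelihood no lower than before, which is why EM is said to maximize the likelihood locally: it climbs to a nearby optimum that depends on the starting point, with no guarantee of reaching the global one.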

Stanford Statistical Learning


The new StatLearning course from Stanford University begins today. The free massive open online course (MOOC) is an excellent way to get up to speed with state-of-the-art machine learning, taught by two of the foremost experts in the field: professors Trevor Hastie and Robert Tibshirani.

Fighting Cancer with Big Data


“Big data techniques offer a way to analyze data pooled across many patients: their specific disease mutations, biological markers, the treatments, and outcomes — in order to identify unexpected ways that existing therapies can be applied and combined to create personalized treatments that dramatically improve the chances of survival.”