Heritage Provider Network (HPN) presented a $500,000 award to the winning team, POWERDOT, for the Heritage Health Prize competition powered by the Kaggle machine learning challenge platform. The competition ran from April 4, 2011 until April 4, 2013.
Reviewing the work of other professionals in the field is a good way to improve your skills as a data scientist. With Kaggle as a resource, you can closely examine the winning solutions of past machine learning competitions. Here is a lecture by Phil Brierley, a three-time Kaggle winner and the “P” in POWERDOT, the winning team of the now-completed Heritage Health Prize ($3 million grand prize).
The Python language is gaining steam in the data science and machine learning communities. But as with getting up to speed in any programming language for the first time, you’ll experience some growing pains. Here is a welcoming video tutorial called “Python Epiphanies” presented at PyCon 2012 in San Jose by Stuart Williams.
In this talk, Sean Gourley examines this world of augmented intelligence and shows how our understanding of the human brain is shaping the way we visualize and interact with big data. Gourley argues that the world we are living in is too complex for any single human mind to understand and that we need to team up with machines to make better decisions.
AWS watch out! A new gunslinger is in town with the recent general availability of the Google Compute Engine. This new platform could make a significant difference in the big data arena, affecting offerings like machine-learning-as-a-service and Hadoop-as-a-service.
From our friends at The Wall Street Journal Tech, we have a short video explaining “What is a Data Scientist.” Being interviewed by WSJ Europe Technology Editor Ben Rooney is Dr. DJ Patil, currently Data Scientist in Residence at Greylock Partners. As you’ll see, his main criteria for being a good data scientist are being […]
The Coursera Data Analysis course recently completed its latest 8-week session. I was delighted to serve as Community TA for the class, and I believe the attendees received extraordinary value for the time they spent learning this important subject.
The R statistical environment is known for struggling with large data sets. To address that limitation, HP Labs and HP Vertica have developed Distributed R, a scalable and high-performance platform for the R language. It splits tasks among multiple processing units (cores or nodes of a cluster) to vastly reduce execution time and gives users […]