Wondering what to cook for Thanksgiving? This is a question asked all across the United States this time of year, but the answer often depends on where you live. Big data analysts at AllRecipes.com and Tableau Software parsed more than 78 million recipe page views dating back to last Thanksgiving to uncover the most popular foods by state.
Kaggle competitions using machine learning techniques have become a fascination for data scientists worldwide. In the video below, Jeremy Howard presents to the Melbourne R meetup group, giving a brief overview of his Data Scientist’s Toolbox (using a few Kaggle competitions as practical examples).
One of the closely watched conundrums of online education is how best to approximate traditional on-premise teaching methods. Grading multiple-choice and short-answer exam questions is straightforward, but how do you grade student-submitted essays in a way that lets the online platform scale to tens of thousands of students? Enter machine learning.
I ran across a Tweet recently that pointed me to a discussion over on Professor Andrew Gelman’s blog, “Statistics is the least important part of data science.” Dr. Gelman is a Professor of Statistics and Political Science at Columbia University and the former Ph.D. adviser of Rachel Schutt, author of Doing Data Science, which I reviewed earlier this month.
A very timely article recently appeared in Forbes, “How Soon Will Big Data Yield Big Profits?”, focusing on several issues that may hold big data back from contributing to the bottom line in the near term. This topic is on the minds of many companies that are making, or contemplating, the move to big data technology solutions.
MLDemos is a dandy open-source visualization tool for machine learning algorithms, created to help users study and understand how various algorithms function and how their parameters affect the results in problems of classification, regression, clustering, dimensionality reduction, dynamical systems and reward maximization. It is free for personal and academic use. Much insight […]
A very important technique in unsupervised machine learning as well as dimensionality reduction is Principal Component Analysis (PCA). But PCA is difficult to understand without the fundamental mathematical underpinnings. The two instructional videos below (Part 1 and 2) demonstrate PCA at an introductory level to provide an appreciation for this powerful tool used in big data applications.
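For readers who want a feel for the mechanics, here is a minimal sketch of the core PCA computation in plain Python: center the data, build the covariance matrix, and find its dominant eigenvector. It is not taken from the videos. The function name `first_principal_component`, the use of power iteration to find the eigenvector, and the toy data set are all assumptions chosen for illustration; a real project would more likely rely on a numerical library.

```python
import math
import random


def first_principal_component(rows, iterations=200, seed=0):
    """Return (direction, variance) for the first principal component.

    rows: list of equal-length lists of floats, one list per observation.
    Hypothetical helper for illustration only.
    """
    n = len(rows)
    d = len(rows[0])

    # Step 1: center each feature on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]

    # Step 2: build the d x d sample covariance matrix.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    # Step 3: power iteration. Repeatedly multiplying a vector by the
    # covariance matrix and renormalizing converges to the eigenvector
    # with the largest eigenvalue, i.e. the direction of maximum variance.
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # the data has no variance at all
        v = [x / norm for x in w]

    # The Rayleigh quotient v^T C v is the variance along that direction.
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                   for i in range(d))
    return v, variance


# Made-up toy data: the second feature is roughly twice the first.
data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0], [5.0, 10.1]]
direction, variance = first_principal_component(data)

# Project each observation onto the component to get a 1-D representation.
col_means = [sum(r[j] for r in data) / len(data) for j in range(2)]
projected = [sum((r[j] - col_means[j]) * direction[j] for j in range(2))
             for r in data]
print(direction, variance, projected)
```

Because the two toy features are almost perfectly correlated, nearly all of the variance falls along a single direction, which is why one component is enough to summarize this data set.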
A recent announcement appearing in MIT News, “Machine learning branches out,” highlights new research in probabilistic graphical models. In a paper being presented in December at the annual conference of the Neural Information Processing Systems Foundation, MIT researchers describe a new technique that expands the class of data sets whose structure can be efficiently deduced. […]