For the next installment of the insideBIGDATA TECH TIP series, I wanted to focus on the marriage of two technologies important to the data science community: the R statistical environment and the MongoDB database.
Kaggle competitions using machine learning techniques have become a fascination for data scientists worldwide. In the video below, Jeremy Howard presents to the Melbourne R meetup group, giving a brief overview of his Data Scientist’s Toolbox (using a few Kaggle competitions as practical examples).
One of the closely watched conundrums of online education is how best to approximate traditional on-premise teaching methods. Grading multiple-choice and short-answer exam questions is straightforward, but how do you grade student-submitted essays in a way that lets the online platform scale to tens of thousands of students? Enter machine learning.
I recently ran across a tweet that pointed me to a discussion over on Professor Andrew Gelman’s blog, “Statistics is the least important part of data science.” Dr. Gelman is a Professor of Statistics and Political Science at Columbia University and the former Ph.D. adviser of Rachel Schutt, author of Doing Data Science, which I reviewed earlier this month.
MLDemos is a dandy open-source visualization tool for machine learning algorithms, created to help users study and understand how various algorithms function and how their parameters affect the results in problems of classification, regression, clustering, dimensionality reduction, dynamical systems and reward maximization. MLDemos is free for personal and academic use. Much insight […]
Principal Component Analysis (PCA) is a very important technique in unsupervised machine learning and dimensionality reduction. But PCA is difficult to understand without its mathematical underpinnings. The two instructional videos below (Parts 1 and 2) demonstrate PCA at an introductory level to provide an appreciation for this powerful tool used in big data applications.