TECH TIP: Streaming Data Analysis

For machine learning aficionados, this insideBIGDATA TECH TIP brings you a stimulating video presentation by data scientist John Myles White, courtesy of the New York Open Statistical Programming Meetup group. The talk is titled "Streaming Data Analysis and Online Learning." Traditional statistical software suites are specialized for analyzing data sets that fit in RAM, but many modern applications require analyzing much larger data sets. The talk surveys some of the basic methods for analyzing data in a streaming manner, focusing on stochastic gradient descent (SGD) as a way to fit models to data that arrives in small chunks. It also covers some basic implementation issues and demonstrates the effectiveness of SGD on problems such as linear regression, logistic regression, and matrix factorization. John also describes how these methods allow ML systems to adapt to user data in real time.
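To make the idea of fitting a model to data that arrives in small chunks more concrete, here is a minimal sketch of SGD for linear regression in plain Python. It is not taken from the talk: the function names (sgd_update, stream_chunks, fit_stream), the synthetic data source, and the learning rate are all assumptions chosen for illustration.

```python
import random

def sgd_update(weights, bias, chunk, lr):
    """Take one SGD step per (features, target) pair, using squared-error loss."""
    for x, y in chunk:
        pred = bias + sum(w * xi for w, xi in zip(weights, x))
        err = pred - y  # gradient of 0.5 * (pred - y)**2 with respect to pred
        weights = [w - lr * err * xi for w, xi in zip(weights, x)]
        bias -= lr * err
    return weights, bias

def stream_chunks(n_chunks, chunk_size, rng):
    """Hypothetical data source: y = 2*x1 - 3*x2 + 0.5 plus a little noise."""
    for _ in range(n_chunks):
        chunk = []
        for _ in range(chunk_size):
            x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
            y = 2.0 * x[0] - 3.0 * x[1] + 0.5 + rng.gauss(0, 0.01)
            chunk.append((x, y))
        yield chunk

def fit_stream(n_chunks=2000, chunk_size=10, lr=0.05, seed=0):
    """Fit the model one chunk at a time; only the current chunk is in memory."""
    rng = random.Random(seed)
    weights, bias = [0.0, 0.0], 0.0
    for chunk in stream_chunks(n_chunks, chunk_size, rng):
        weights, bias = sgd_update(weights, bias, chunk, lr)
    return weights, bias

if __name__ == "__main__":
    w, b = fit_stream()
    print("weights:", w, "bias:", b)
```

Because each update touches only the current chunk, memory use stays constant no matter how long the stream runs, and the model keeps adapting as new data arrives. The same loop would fit logistic regression if the prediction were passed through a sigmoid, since the gradient with respect to the linear score has the same (pred - y) form under log loss.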

John Myles White is one of the primary developers of Julia, a new language for technical computing. John is currently developing the statistical and machine learning infrastructure for Julia. In addition, he is one of the residents at Hacker School’s Summer 2013 program. John recently finished his Ph.D. at Princeton, where he developed models of human decision-making. During grad school, John co-wrote Machine Learning for Hackers (an excellent resource). Starting in the fall, John will be a research scientist at Facebook.

 

 

Resource Links: