
Data Science 101: Machine Learning, Part 3


The “How Machine Learning Works” lecture series continues, building on the Bayes’ rule material taught last time. We’ll define training and testing data sets and build a Bayesian classifier, defining the statistical terms prior, likelihood and posterior along the way. We’ll use titles and product descriptions from a retailer and attempt to predict the top-level category under which each product is listed. The likelihood corresponds to per-category word frequencies, and the prior corresponds to the number of products under each category. We’ll also talk about implementation concerns such as Laplace smoothing and numerical instability. This lecture builds a full classifier from scratch, covering both the design and a complete Python implementation. This lecture is presented by BloomReach engineer Srinath Sridha.
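To make the description above concrete, here is a minimal sketch of such a classifier. It is not the code from the lecture: the function names (`train`, `classify`), the `alpha` smoothing parameter, the whitespace tokenization and the toy product titles are all assumptions made for illustration. The prior comes from the number of products per category, the likelihood from per-category word counts with add-one (Laplace) smoothing, and log-probabilities are summed instead of multiplying raw probabilities to avoid numerical underflow.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count products per category (prior) and words per category (likelihood).

    `examples` is a list of (text, category) pairs; this format is an assumption.
    """
    category_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, category in examples:
        category_counts[category] += 1
        for word in text.lower().split():
            word_counts[category][word] += 1
            vocabulary.add(word)
    return category_counts, word_counts, vocabulary


def classify(text, model, alpha=1.0):
    """Return the category with the highest log-posterior score for `text`."""
    category_counts, word_counts, vocabulary = model
    total_products = sum(category_counts.values())
    best_category, best_score = None, -math.inf
    for category, count in category_counts.items():
        # Log prior: fraction of training products listed under this category.
        score = math.log(count / total_products)
        # Smoothed denominator: words seen in this category plus alpha per vocabulary word.
        denom = sum(word_counts[category].values()) + alpha * len(vocabulary)
        for word in text.lower().split():
            # Laplace smoothing keeps unseen words from producing log(0).
            score += math.log((word_counts[category][word] + alpha) / denom)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Toy training data (invented for illustration, not from the retailer data set).
training_data = [
    ("red cotton t-shirt", "Clothing"),
    ("blue denim jeans", "Clothing"),
    ("stainless steel frying pan", "Kitchen"),
    ("nonstick steel saucepan", "Kitchen"),
]
model = train(training_data)
print(classify("steel pan", model))
print(classify("cotton jeans", model))
```

In practice the labeled products would be split into a training set, used to build the counts, and a held-out testing set, used to measure how often `classify` picks the correct category.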




Resource Links: