Netflix Reveals All (well, at least a lot)

netflixlogoFIELD REPORT

Last night I had the distinct pleasure of attending a Data Science Track event sponsored by the LA Machine Learning meetup group: Data Science @ Netflix. Held at the new, much larger, Cross Campus location in Santa Monica, the event attracted 250 people with another hundred-plus on hand at a satellite location in Pasadena using a streaming video link. Presenting were Douglas Twisselmann, Ph.D., Senior Data Scientist, and Kevin Wylie, Director of Content Data Science, from the Netflix content team in Beverly Hills. Netflix has another data science group in Los Gatos, Calif.

The Netflix content team is tasked with the challenge of licensing/purchasing/developing the best TV and movies for its 44 million users in 41 countries. This talk covered an overview of what the content data science teams do for the organization towards the goals of identifying characteristics of an “ideal” content library, predicting demand for titles that Netflix does not have, determining the customer impact of adding or losing sets of content, and helping to identify the next original series. In addition, they covered some data and techniques one might use in demand prediction. Here is a slide describing the Netflix data pipeline:

Netflix_datapipeline

Netflix does it right with both a Data Science Engineering and Science & Algorithms group. They wisely have two distinct teams for engineering AND theoretical data science (mathematical statistics, probability theory, machine learning) instead of trying to hire unicorns like many other companies. The Netflix corporate culture also was discussed where “high performance” is valued above all else, i.e. you can be fired for being average. It sounds like a pressure cooker, but some people thrive on work environments like that.

One cool slide included in the presentation, and worth the price of admission in my opinion, was a list of machine learning technology Netflix uses in one form or another:

  • Regression models (logistic, linear, elastic nets)
  • GBDT/RF
  • SVD & other MF models
  • Factorization machines
  • Restricted Boltzmann machines
  • Markov Chains and other graphical models
  • Clustering (from k-means to HDP)
  • Deep ANN
  • LDA

Another slide had a group of academic books favored by the Netflix data science team and lo-and-behold I saw my favorite statistical learning book!

The Netflix data science guys were as candid as they were allowed to be with their insights into how the company maximizes their data assets, however there were a number of limitations to what they could talk about, especially how they utilize user rankings. But Netflex customers intrinsically know the company’s recommender systems are second to none. As one illustration of this, we were treated to some actual Tweets from customers after receiving a highly targeted e-mail (most companies don’t get such nice comments about an e-mail blast). One woman responded this way to an alert that another season of “The Office” was going to be available:

Netflix, you understand me better than any man has!”

I think this says it all in terms of the Netflix prowess in data science.

Daniel, Managing Editor — insideBIGDATA

 

Sign up for the free insideBIGDATA newsletter.

Comments

  1. Is there anywhere to view or download the presentation? Sounds amazing!

Resource Links: