Data Science 101: Data Agnosticism

Bits are bits. Whether you are searching for whales in audio clips or trying to predict hospitalization rates based on insurance claims, the process is the same: clean the data, generate features, build a model, and iterate.
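As a rough illustration of that four-step loop, here is a minimal Python sketch. The toy records, the function names (`clean`, `featurize`, `fit_line`, `mean_squared_error`) and the choice of a straight-line least-squares model are all assumptions made for the example; they are not taken from the article.

```python
def clean(rows):
    """Step 1: drop records that have a missing (None) field."""
    return [r for r in rows if None not in r]


def featurize(rows):
    """Step 2: turn each raw (x, y) record into a float feature and target."""
    return [(float(x), float(y)) for x, y in rows]


def fit_line(samples):
    """Step 3: closed-form least squares fit of y = intercept + slope * x."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_squared_error(samples, intercept, slope):
    """Step 4: score the model; a poor score sends you back to steps 1-3."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in samples) / len(samples)


# Hypothetical raw data with two incomplete records.
raw = [(1, 3), (2, 5), (None, 7), (3, 7), (4, None), (4, 9)]
samples = featurize(clean(raw))
intercept, slope = fit_line(samples)
error = mean_squared_error(samples, intercept, slope)
```

The same skeleton applies whether the records are audio clips or insurance claims; only the bodies of `clean` and `featurize` would change.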

Data Science 101: k-means Clustering

In this edition of insideBIGDATA’s Data Science 101 series, I’m going to offer up a short instructional video describing the use of the popular unsupervised learning algorithm, k-means clustering.
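For readers who want the idea in code before watching, the following is a minimal sketch of the standard k-means iteration (Lloyd's algorithm) in plain Python. It is not drawn from the video. The function name `kmeans`, the toy points, and the decision to seed the centroids with the first `k` points are assumptions for illustration; practical implementations usually pick random or k-means++ starting centroids.

```python
def kmeans(points, k, max_iterations=100):
    """Cluster equal-length numeric tuples into k groups (Lloyd's algorithm)."""
    # Simplifying assumption: use the first k points as starting centroids.
    centroids = list(points[:k])
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have stopped changing
            break
        centroids = new_centroids
    return centroids, clusters


# Two visually obvious groups of hypothetical 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
```

Because the algorithm is unsupervised, no labels are supplied; the two groups emerge purely from the distances between points.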

Netflix Reveals All (well, at least a lot)

Last night I had the distinct pleasure of attending a Data Science Track event sponsored by the LA Machine Learning meetup group: Data Science @ Netflix.

Extending the R Language to the Enterprise

Earlier this week, I attended a very informative event sponsored by the LA RUG (Los Angeles R User Group meetup) that featured the topic “Extending the R language to the enterprise with TERR & Spotfire.”

Experfy Launches Data Science Marketplace

Experfy, based at the Harvard Innovation Lab, announced that it has launched a paradigm-changing online marketplace that will allow industry leaders to meet their Big Data talent needs. Enterprises now have a central platform for on-demand hiring of vetted experts with algorithmic skills and domain knowledge, primarily for short-term projects related to data, analytics, and business intelligence.

Data Science 101: Forecasting Time Series Using R

Time series forecasting is an integral tool in data science. Here is a useful instructional video on the subject from one of the authors of “Forecasting: Principles and Practice,” a free eBook available on OTexts. The presentation, “Forecasting Time Series Using R,” is given by Professor of Statistics Rob J Hyndman.

An Overview of Hulu’s Data Platform

Last night I attended the Los Angeles Hadoop Users Group (LA-HUG) meeting hosted by Shopzilla. The topic for the evening was “An Overview of Hulu’s Data Platform,” presented by Prasan Samtani and Tristan Reid of Hulu. From all indications, Hulu is a significant player in the Hadoop user community, and this talk demonstrated the team’s command of big data technology.

Intel’s Boyd Davis Talks Predictive Analytics and March Madness

“Intel’s goal is to encourage more innovative and creative uses for data as well as to demonstrate how big data and analytics technologies are impacting many facets of our daily lives, including sports. For example, coaches and their staffs are using real-time statistics to adjust games on-the-fly and throughout the season. From intelligent cameras to wearable sensors, a massive amount of data is being produced that, if analyzed in real-time, can provide a significant competitive advantage. Intel is among those making big data technologies more affordable, available, and easier to use for everything from helping develop new scientific discoveries and business models to even gaining the upper hand on good-natured predictions of sporting events.”

Data Science 101: Examining the Requests Made by the Top 100 Sites

For our latest installment of the insideBIGDATA Data Science 101 series, I thought I’d do something a bit different. Here is a sample analysis by data scientist and blogger Dan Goldin, who published some nice results using R to assess the web requests made by the top 100 Internet sites.

StatAce for Cloud Statistics with R

StatAce is a start-up in cloud-based data science and statistical computing. It offers an online graphical R environment that lets you quickly and easily analyze large data sets. It facilitates collaboration with others, tracks changes to your R scripts, and can be accessed from any device.