Stanford Statistical Learning


The new StatLearning course from Stanford University begins today. The free massive open online course (MOOC) is an excellent way to get up to speed with state-of-the-art machine learning, taught by two of the foremost experts in the field: professors Trevor Hastie and Robert Tibshirani.

Fighting Cancer with Big Data


“Big data techniques offer a way to analyze data pooled across many patients: their specific disease mutations, biological markers, the treatments, and outcomes — in order to identify unexpected ways that existing therapies can be applied and combined to create personalized treatments that dramatically improve the chances of survival.”

Data Science 101: Machine Learning, Part 3

The “How Machine Learning Works” lecture series continues, building on Bayes’ rule from the previous installment. This part defines training and testing data sets and builds a Bayesian classifier.
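For readers who want a feel for those two ideas before watching, here is a minimal sketch in Python. It is not taken from the lecture: the Gaussian naive Bayes model, the synthetic two-class data, and the function names (`train_test_split`, `fit`, `predict`, `make_data`) are all assumptions chosen for illustration. It splits the data into training and testing sets, fits the classifier on the training set only, and scores it on the held-out testing set.

```python
import math
import random
from collections import defaultdict


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit(train):
    """Estimate a prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for features, label in train:
        by_class[label].append(features)
    model = {}
    for label, samples in by_class.items():
        prior = len(samples) / len(train)
        stats = []
        for column in zip(*samples):
            mean = sum(column) / len(column)
            # Small constant keeps the variance strictly positive.
            var = sum((x - mean) ** 2 for x in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (prior, stats)
    return model


def predict(model, features):
    """Apply Bayes' rule in log space and return the most probable class."""
    def log_posterior(label):
        prior, stats = model[label]
        total = math.log(prior)
        for x, (mean, var) in zip(features, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)


def make_data(n_per_class=200, seed=1):
    """Hypothetical data: two well-separated Gaussian clusters in 2D."""
    rng = random.Random(seed)
    rows = []
    for label, center in (("a", 0.0), ("b", 4.0)):
        for _ in range(n_per_class):
            rows.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return rows


rows = make_data()
train, test = train_test_split(rows)
model = fit(train)
accuracy = sum(predict(model, f) == y for f, y in test) / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Scoring on rows the model never saw during fitting is the point of the split: accuracy measured on the training rows alone would tend to overstate how well the classifier generalizes.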

Map-D: GPU-Powered Social Science Research in Real Time


“Map-D uses multiple NVIDIA GPUs to interactively query and visualize big data in real-time. Map-D is an SQL-enabled column store that generates 70-400X speedups over other in-memory databases. This talk discusses the basic architecture of the system, the advantages and challenges of running queries on the GPU, and the implications of interactive and real-time big data analysis in the social sciences and beyond.”

Slidecast: IBM Packs Flash into DIMM Slots with X6 Server Line


In this slidecast, Kevin Murray from IBM introduces the company’s new X6 line of servers. Able to accommodate flash storage in their DIMM slots, the new systems are designed to deliver significant improvements in the performance and economics of x86-based systems for analytics and the cloud.

Leadspace Updates Predictive Lead Targeting


“Predictive lead targeting enables you to tap into the social conversations going on among individuals within your targeted companies, including job listings, news and more,” said Leadspace co-founder and VP Products Amnon Mishor. “Based on your Ideal Customer Profile, our automated scoring algorithm identifies the specific organizations that are likely the most open to hearing about your solution, thereby significantly increasing conversions.”

Watson and IBM’s Big Bet on Analytics


“IBM appears to be seeing about the same results as other companies pushing big data technology, such as Hadoop. There’s some money coming in, but it’s not yet a billion-dollar business. There’s potential for really big deals, but it probably means slogging through long proofs of concept and deployment cycles.”

Data Science 101: Machine Learning, Part 2

The “How Machine Learning Works” lecture series continues by building on fundamental definitions from statistics. These definitions are needed for any rigorous analysis of models or machine learning algorithms.

Visualization of the Week: Sentence Drawing


This week’s top visualization is a new algorithmic technique for showing the aesthetic and organic beauty of language, built on an innovative use of R and the popular ggplot2 package.

ScaleOut Extends In-Memory Data Grid with C++ APIs


“With this release, C++ developers now can easily integrate ScaleOut’s IMDG into their applications to provide scalable performance, as well as parallel query and integrated real-time analytics for applications written in C++,” said Bill Bain, ScaleOut’s CEO. “The added capabilities in Version 5.1 also introduce significant enhancements to ScaleOut StateServer’s features and performance and broaden its availability in the cloud.”