
DK Panda Presents: Big Data – Hadoop and Memcached


DK Panda from Ohio State University presented this talk at the Stanford HPC & Exascale Conference. “As InfiniBand is getting used in scientific computing environments, there is a big demand to harness its benefits for enterprise environments for handling big data and analytics. This talk will focus on high-performance and scalable designs of Hadoop using native RDMA support of InfiniBand and RoCE.”

Trifacta Launches Data Transformation Platform to Clear Data Analysis Bottleneck

Data transformation platform provider Trifacta today announced the general availability of the Trifacta Data Transformation Platform.

Data Science 101: Interview with John Chambers

Dr. John Chambers, father of the S language that ultimately became R, sits down with Professor Trevor Hastie of the Stanford University Statistics Department to discuss the long and fascinating history of the R language.

Real-time MapReduce Becomes a Reality


Data grid software leader ScaleOut Software is using in-memory computing to deliver operational analytics for real-time decision making. In-memory technology offers real benefits for use cases such as fraud alerts, transportation management, and capturing short-lived financial opportunities.

Unsupervised Joke Generation from Big Data

Humor is a very human phenomenon. Can a machine appreciate humor? This reminds me of a scene from the 1994 movie “Star Trek: Generations” where the android Lt. Commander Data discovers humor. After having his emotion chip activated, Data finds everything amusing.

The Future of Computer Science


I am convinced we’re at an important inflection point in the timeline of the discipline of computer science. When compared to other disciplines like mathematics, physics and biology, computer science is a very young field, starting around 1964. But something is happening now, in 2014, that is propelling the field into a new evolutionary period.

Data Science 101: Machine Learning, Part 5

The “How Machine Learning Works” lecture series concludes by developing some machine learning Python code from scratch. We use real-valued numbers sampled from two different Gaussians with different priors.
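The lecture's own code isn't reproduced here, but the setup it describes can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the series' implementation: the function names (`sample_two_gaussians`, `fit`, `predict`) and the example parameters (means 0 and 3, unit variance, priors 0.7/0.3) are hypothetical choices. It draws labeled points from two Gaussians with unequal priors, fits each class's prior, mean, and standard deviation by maximum likelihood, and classifies new points by the larger prior × likelihood.

```python
import math
import random


def gaussian_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def sample_two_gaussians(n, prior0, mu0, sigma0, mu1, sigma1, rng):
    """Draw n (x, label) pairs: class 0 with probability prior0, otherwise class 1."""
    data = []
    for _ in range(n):
        if rng.random() < prior0:
            data.append((rng.gauss(mu0, sigma0), 0))
        else:
            data.append((rng.gauss(mu1, sigma1), 1))
    return data


def fit(data):
    """Maximum-likelihood estimates of (prior, mean, std dev) for each class label."""
    params = {}
    for label in (0, 1):
        xs = [x for x, y in data if y == label]
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / len(xs)
        params[label] = (len(xs) / len(data), mu, math.sqrt(var))
    return params


def predict(x, params):
    """Bayes decision rule: choose the class maximizing prior * class-conditional density."""
    scores = {
        label: prior * gaussian_pdf(x, mu, sigma)
        for label, (prior, mu, sigma) in params.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical example parameters: class 0 ~ N(0, 1) with prior 0.7, class 1 ~ N(3, 1).
    rng = random.Random(0)
    train = sample_two_gaussians(1000, 0.7, 0.0, 1.0, 3.0, 1.0, rng)
    params = fit(train)
    test = sample_two_gaussians(1000, 0.7, 0.0, 1.0, 3.0, 1.0, rng)
    accuracy = sum(predict(x, params) == y for x, y in test) / len(test)
    print("estimated params:", params)
    print("test accuracy:", accuracy)
```

Because the priors are unequal, the decision boundary shifts away from the midpoint between the means toward the less likely class. That shift is exactly what the prior term in `predict` contributes.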

Slidecast: Announcing the Altiscale Data Cloud


“Altiscale offers the first cloud service purpose-built to run Apache Hadoop. We run the latest version of Hadoop on custom infrastructure, augmented with Apache Hive, Pig, and Oozie, and with first-class support for Python, R, and Ruby. Altiscale’s infrastructure is faster, more reliable, easier to use, and more affordable than alternatives.”

The Future of Privacy in a Big Data World


Technologies that track data can make life more efficient, but can they go too far? If there’s a significant counterpoint to the big data steamroller, it is the potential for pushback driven by concerns over consumer privacy.

Data Science 101: Machine Learning, Part 4

The “How Machine Learning Works” lecture series continues by building on top of the Bayesian classifier developed in Part 3 of the series. We’ll build an expectation-maximization (EM) algorithm that locally maximizes the likelihood function.
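As a rough illustration of the idea (not the series' actual code, which builds on its own Part 3 classifier), here is a minimal standard-library sketch of EM for a one-dimensional mixture of two Gaussians. The function names and the initial guesses are hypothetical. The E-step computes each component's responsibility for each point, and the M-step re-estimates weights, means, and standard deviations from those soft assignments. Each iteration does not decrease the log-likelihood, but EM only reaches a local maximum, so the result depends on the starting values. The sketch is not numerically hardened: it uses plain densities rather than log-sum-exp and applies a small variance floor.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def log_likelihood(xs, weights, mus, sigmas):
    """Log-likelihood of the data under the current two-component mixture."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)))
        for x in xs
    )


def em_two_gaussians(xs, mus, sigmas, weights, iterations=100, tol=1e-8):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization."""
    mus, sigmas, weights = list(mus), list(sigmas), list(weights)
    prev = log_likelihood(xs, weights, mus, sigmas)
    current = prev
    for _ in range(iterations):
        # E-step: posterior probability (responsibility) of each component for each point.
        resp = []
        for x in xs:
            joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: weighted maximum-likelihood updates using the soft assignments.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(xs)
            mus[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigmas[k] = math.sqrt(max(var, 1e-6))  # floor keeps a component from collapsing
        current = log_likelihood(xs, weights, mus, sigmas)
        if current - prev < tol:  # stop once the likelihood no longer improves
            break
        prev = current
    return weights, mus, sigmas, current


if __name__ == "__main__":
    # Hypothetical unlabeled data: 60% from N(0, 1), 40% from N(4, 1).
    rng = random.Random(1)
    xs = [rng.gauss(0.0, 1.0) for _ in range(600)] + [rng.gauss(4.0, 1.0) for _ in range(400)]
    weights, mus, sigmas, ll = em_two_gaussians(xs, [-1.0, 5.0], [1.0, 1.0], [0.5, 0.5])
    print("weights:", weights, "means:", mus, "std devs:", sigmas, "log-lik:", ll)
```

Unlike the supervised classifier, EM never sees the labels. It recovers the mixture by alternating between guessing memberships and refitting parameters.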