The Future of Unstructured Data Workloads

In this video from the OpenFabrics International Developer Workshop 2014, Tashneem Maistry from Pivotal presents: The Future of Unstructured Data Workloads.

Scaling HBase to Handle Massive Loads at Pinterest

As the latest installment of the Big Data Use Case series here on insideBIGDATA, we offer a compelling presentation by our friend Jeremy Carroll, operations engineer at Pinterest. Jeremy talks about how Pinterest runs HBase at massive scale.

Slidecast: IBM Fellows Tackle Grand Challenges of Big Data

“A common thread for many of this year’s new IBM Fellows is their commitment to developing solutions and practical applications in the field of Big Data and Analytics. IBM is a leader in the space – with 1500 Big Data and Analytics-related patents in 2013 alone, and $24 billion in investments since 2005 through both acquisitions and R&D – and these fellows maintain the drumbeat of momentum that has made IBM number one in Big Data market share for the second year running.”

In Search of the Optimal Cheeseburger

If you’ve ever spent valuable billable hours thinking about an algorithm to seek out the optimal cheeseburger and calculate metrics like the maximal meat-to-bun ratio, then this presentation by noted data scientist Hilary Mason at the Ignite NYC event last year is for you. Hilary, a self-admitted cheeseburger lover, found some data sets in […]

Data Science 101: Hadoop for Analyzing Sentiment Data

This instructional video explores how to use Hadoop and the Hortonworks Data Platform to analyze sentiment data to understand how the public feels about a product launch – highlighted is the release of the film “Iron Man 3.”

Data Science 101: Metaprogramming Python for Big Data

The video presentation below comes from our friends at the San Francisco Python Meetup group. The talk discusses how AdRoll uses Python to squeeze every last bit of performance out of a single high-end server for the purpose of interactive analysis of terabyte-scale data sets.

Quantum Machine Learning

Ever wonder what will happen when exabyte data stores are the norm, and even the parallelism of Hadoop can no longer provide the necessary processing power to address the data deluge? Quantum computing may hold the answer.

Data Science 101: Data Agnosticism

Bits are bits. Whether you are searching for whales in audio clips or trying to predict hospitalization rates based on insurance claims, the process is the same: clean the data, generate features, build a model, and iterate.

Data Science 101: k-means Clustering

In this edition of insideBIGDATA’s Data Science 101 series, I offer a short instructional video on the popular unsupervised learning algorithm, k-means clustering.
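For readers who want to see the idea in code before watching, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not taken from the video: the function name `kmeans`, its parameters, and the sample points are all assumptions made for illustration. The loop alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster equal-length numeric tuples into k groups (illustrative sketch)."""
    rng = random.Random(seed)
    # Start from k distinct points chosen at random from the data.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (squared Euclidean distance).
        new_assignments = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Stop once no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave a centroid in place if its cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


if __name__ == "__main__":
    # Hypothetical sample data: two well-separated groups of three points.
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, labels = kmeans(sample, k=2)
    print(centers, labels)
```

The result depends on the random starting centroids, so practical implementations usually run several restarts and keep the clustering with the lowest total within-cluster distance.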

DK Panda Presents: Big Data – Hadoop and Memcached

“As InfiniBand is getting used in scientific computing environments, there is a big demand to harness its benefits for enterprise environments for handling big data and analytics. This talk will focus on high-performance and scalable designs of Hadoop using native RDMA support of InfiniBand and RoCE. Designs for various components in Hadoop (such as HDFS, MapReduce, RPC, and HBASE) and their benefits based on the RDMA package for Apache Hadoop will be presented. RDMA-based design for scalable Memcached (used in Web 2.0) and the associated benefits will be presented.”