Databricks Cloud Announcement and Demo at Spark Summit 2014

Coming to us from the recent Spark Summit 2014, here is a compelling presentation by Databricks CEO Ion Stoica that sets the stage for Spark’s continued advance in the big data ecosystem. The Databricks Cloud provides the full power of Spark, in the cloud, plus a powerful set of features for exploring and visualizing your data, as well as writing and deploying production data products.

Data Science, Big Data and Statistics – can we all live together?

Here is a topic that receives much debate these days – as diverse fields like statistics, computer science and applied mathematics converge with newly named fields such as data science and big data. Can’t we all get along?

Predictive Analytics for Big Data Using EmcienPatterns

EmcienPatterns is Emcien’s Data Analysis Platform, providing complete and automated data analysis by revealing the patterns in data, analyzing those connections, and delivering answers to the user or to downstream systems through APIs.

In-Memory Computing: Three Myths That Could Put Your Business at Risk

In this special guest feature, Eric Frenkiel, Co-founder and CEO of MemSQL, writes about the three myths surrounding in-memory computing and how companies that don’t take advantage of IMC risk being left behind.

Manifest Insights: A Single Pane of Glass for your Data

Manifest Insights is an exciting new startup from Portland. “We are a data consulting and visualization company. We help companies gather together data from all the different sources where it may be and bring it together in a both easy to use and powerful dashboard, where they can slice and dice and view the data.”

Data Science 101: SparkR – Interactive R Programs at Scale

R + RDD = R2D2

R is a widely used statistical programming language but its interactive use is typically limited to a single machine. To enable large scale data analysis from R, SparkR was announced earlier this year in a blog post. SparkR is an open source R package developed at U.C. Berkeley AMPLab that allows data scientists to analyze large data sets and interactively run jobs on them from the R shell.

MapR Partners with Tata Consultancy Services to Help Customers with Big Data

Tata Consultancy Services (TCS), (BSE: 532540, NSE: TCS), a leading IT services, consulting and business solutions organization, has announced a new partnership with MapR Technologies, Inc., provider of the highly ranked distribution for Apache™ Hadoop®, to help enterprise customers easily and rapidly capture critical big data insights.

Data Science 101: Real-time Analytics using Cassandra, Spark and Shark

In the video below, Evan Chan (Software Engineer at Ooyala) describes his experience using the Spark and Shark frameworks for running real-time queries on top of Cassandra data.

Project Adam: a New Deep-Learning System

Project Adam is a new deep-learning system modeled after the human brain that has greater image classification accuracy and is 50 times faster than other systems in the industry. Project Adam is an initiative by Microsoft researchers and engineers that aims to demonstrate that large-scale, commodity distributed systems can train huge deep neural networks effectively.

The Putnam Mathematical Competition’s Unsolved Problem

As a data scientist with my roots in the theoretical foundations of the field, I’m always looking for ways to challenge myself and pick up a new mathematical apparatus that could help me in my project work.