Data Quality, It’s Everybody’s Problem

Andrew Hermann, President & Director, CorSource Technology Group

In this special guest feature, Andrew Hermann, President of CorSource, addresses data quality, a challenge facing all companies in the age of mass data collection. “Successfully tackling data quality is imperative, and achievable with a progressive, methodical approach. Your competitors are struggling with this very issue, and the question is whether this is going to remain your problem, or just theirs.”

Algorithmia: An Open Marketplace for Algorithms

Seattle-based start-up Algorithmia is a new marketplace for algorithms. Developers can turn algorithms into scalable web services with a single click. Companies and application developers can integrate algorithms into their own applications in fewer than 10 lines of code by accessing the Algorithmia universal API.

Gartner’s 2014 Hype Cycle for Emerging Technologies

The Gartner Hype Cycle for Emerging Technologies just hit the streets! This guide to the industry’s pulse is a good way to balance the hype with reality. As you’ll note in the chart below, big data is entering the Trough of Disillusionment.

Interview: Nick Elprin, Co-founder, Domino

Domino Data Lab, Inc. is a new company that started out with a focus on making cloud computation much easier and on providing “version control for data science.” I caught up with co-founder Nick Elprin at the recent useR! 2014 conference to get the high-level view of his company.

Sponsored Post: Intel Cloud Edition Available for Lustre Software

Intel has collaborated with AWS to offer a Cloud Edition for Lustre Software that allows customers to use the power of the world’s most popular HPC storage system. It provides fast, scalable storage, optimizing servers for the workload they support.

Basho Introduces Riak CS 1.5

Basho, the creator and developer of Riak, the industry-leading distributed NoSQL database, today introduced Riak CS 1.5 and Riak CS 1.5 Enterprise, Basho’s distributed object storage software.

Data Science 101: Real-time Analytics using Cassandra, Spark and Shark

In the video below, Evan Chan (Software Engineer at Ooyala) describes his experience using the Spark and Shark frameworks for running real-time queries on top of Cassandra data.

Where There’s Spark There’s Fire: The State of Apache Spark in 2014

Matei Zaharia, CTO of Databricks and Creator of Apache Spark

In this special guest feature, Matei Zaharia, CTO of Databricks and Creator of Apache Spark, explores open-source Apache Spark’s status in the Hadoop community.

Book Reviews: The Bootstrap Resampling Technique

In the spirit of the importance of bootstrap methods to contemporary machine learning, I’d like to review several prominent books on the subject. Some of the titles are relatively new, while others can be considered “classics.”

Guavus Enhances its Reflex Operational Intelligence Platform with Apache Spark and Hadoop YARN

Guavus, a leading provider of big data analytics solutions for operational intelligence, has unveiled Reflex 2.0 with support for Apache Spark and Hadoop YARN. The Guavus Reflex™ Operational Intelligence Platform provides real-time analysis across business and operations for better-quality decision-making.