Data Science 101: SparkR – Interactive R Programs at Scale

R + RDD = R2D2

R is a widely used statistical programming language, but its interactive use is typically limited to a single machine. To enable large-scale data analysis from R, SparkR was announced earlier this year in a blog post. SparkR is an open source R package developed at the U.C. Berkeley AMPLab that allows data scientists to analyze large data sets and run jobs on them interactively from the R shell.

HP Invests $50 Million in Hortonworks

HP and Hortonworks announced a strategic partnership to address the critical big data needs of enterprise customers. The joint commitment will help accelerate the adoption of Enterprise Apache Hadoop by deeply integrating the Hortonworks Data Platform with the HP HAVEn big data platform. The partnership is also supported by a $50 million equity investment by HP.

Zettaset Simplifies Big Data Security and Management for the Enterprise

Zettaset, a leader in Big Data security, today announced it is making Zettaset Orchestrator’s cornerstone Big Data security capabilities available as stand-alone product offerings.

MaxCDN Builds Real Time Analytics Platform on Tokutek TokuMX

Tokutek®, a company focused on delivering database performance at scale, today announced that MaxCDN, a low-cost, high-reliability content delivery network (CDN), has deployed TokuMX™ as the foundation of its real-time analytics platform.

MapR Partners with Tata Consultancy Services to Help Customers with Big Data

Tata Consultancy Services (TCS), (BSE: 532540, NSE: TCS), a leading IT services, consulting and business solutions organization, has announced a new partnership with MapR Technologies, Inc., provider of the highly ranked distribution for Apache™ Hadoop®, to help enterprise customers easily and rapidly capture critical big data insights.

Walking Then Running

In this special guest feature, Jesse Anderson from Cloudera writes about how many new companies, like those we see popping up in the Hadoop ecosystem, move too quickly from crawling to running, which sometimes leads to failure.

FoundationDB and DataRPM Partner to Simplify Analytics & Visualization on Next Generation Database

FoundationDB, the company behind next-generation database software that uniquely combines scalability and consistency, today announced a technology partnership with DataRPM, the industry pioneer in cognitive big data discovery and analytics platforms. The integration allows users to perform data analysis, discovery and natural language analytics on data stored in FoundationDB's distributed, transactional database (via the SQL Layer).

Teradata Acquires Revelytix and Hadapt

Teradata (NYSE: TDC), the analytic data platforms, marketing applications, and services company, today announced two acquisitions that accelerate the growth of its big data capabilities. On July 16th, Teradata acquired assets of Revelytix, a leader in information management products for big data with unique metadata management technology and deep expertise in integrating information across the enterprise. On July 17th, Teradata acquired assets of Hadapt, including experienced big data technologists and intellectual property.

Data Science 101: Real-time Analytics using Cassandra, Spark and Shark

In the video below, Evan Chan, a software engineer at Ooyala, describes his experience using the Spark and Shark frameworks to run real-time queries on top of Cassandra data.

Project Adam: a New Deep-Learning System

Project Adam is a new deep-learning system modeled after the human brain that has greater image classification accuracy and is 50 times faster than other systems in the industry. Project Adam is an initiative by Microsoft researchers and engineers that aims to demonstrate that large-scale, commodity distributed systems can train huge deep neural networks effectively.