A statistical tribute to our waning group of brave Apollo astronauts!
Scientific research in the life sciences is often akin to searching for needles in haystacks. Finding the one protein, chemical, or genome that behaves or responds in the way the scientist is looking for is the key to the discovery process. For decades, high-performance computing (HPC) systems have accelerated this process, often by helping to identify and eliminate infeasible targets sooner.
In this edition of insideBIGDATA’s Data Science 101 series, I’m going to offer up a short instructional video describing the use of the popular unsupervised learning algorithm, k-means clustering.
“Data Analytics Handbook” is a new resource meant to inform young professionals about the field of data science. It was written by a group of students at UC Berkeley: Brian Liou, Tristan Tao, and Elizabeth Lin. Edition One of the book includes in-depth interviews with data scientists and data analysts.
MapR Technologies, Inc., provider of a leading distribution for Apache Hadoop, today announced a strategic partnership with Databricks and the addition of the complete Apache Spark technology stack to the MapR Distribution.
“The speed and flexibility of our core replicator solution and the companion Continuent Tungsten clustering solution offer advanced functionality in a simple and easily usable format. Tungsten Replicator supports high-speed replication between MySQL and Oracle databases in an open source product. Continuent Tungsten supports billions of transactions a day, with our largest single installation managing over 700 million transactions a day and over 225 terabytes of data. Key to all this is the ease of deployment and use, and the flexible nature of the solution, enabling cross-database replication, and advanced filtering not found in other products.”