Discovering Gold with Big Data Analytics and Data-Intensive Computing

Video: MapReduce for the Masses using Common Crawl

In this video, Steve Salevan from the Common Crawl Foundation demonstrates how to go from having no prior experience with large-scale data analysis to being able to play with 40TB of web crawl information in just five minutes.

Common Crawl aims to change the big data game with our repository of over 40 terabytes of high-quality web crawl information in the Amazon cloud, a net total of 5 billion crawled pages. In this blog post, we’ll show you how you can harness the power of MapReduce data analysis against the Common Crawl dataset with nothing more than five minutes of your time, a bit of local configuration, and 25 cents.

Read the Full Story.



 


inside-bigdata.com is a production of insideHPC, LLC. © 2011-2013