Running Hadoop Clusters on Supercomputers

Over at the San Diego Supercomputer Center, Glenn K. Lockwood writes that users of the Gordon supercomputer can use the myHadoop framework to dynamically provision Hadoop clusters within a traditional HPC cluster and run quick jobs.

For the purposes of testing mappers and reducers, running many smaller analyses, and debugging issues, I found being able to establish a semi-persistent Hadoop cluster on a traditional HPC resource to be very useful in its own right. While one can feasibly do this on Amazon EC2, doing so is annoying and costs money (unlike XSEDE and FutureGrid, which are free). I wanted to get a Hadoop cluster running so that I could prototype code and learn features, and the process is quite simple. This page describes how to create a semi-persistent Hadoop cluster on a traditional HPC resource (supercomputer); by "semi-persistent," I mean that the Hadoop cluster will run for as long as you tell it to rather than only for the lifetime of a single map/reduce job.
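The workflow described above amounts to submitting a batch job that configures and starts Hadoop on the nodes the scheduler hands you, then keeps the cluster alive until the job's walltime expires. A minimal sketch of such a job script is below; the module names, paths, and myHadoop flag are assumptions that will vary by site, so treat it as illustrative rather than a drop-in script.

```shell
#!/bin/bash
#PBS -N hadoop-cluster
#PBS -l nodes=4:ppn=16
#PBS -l walltime=04:00:00

# Illustrative sketch: module names, paths, and flags are assumptions
# and will differ between HPC sites.
module load hadoop myhadoop

# Per-job Hadoop configuration directory (hypothetical location)
export HADOOP_CONF_DIR=$HOME/hadoop-conf.$PBS_JOBID

# Generate Hadoop configs targeting the nodes this job was allocated
myhadoop-configure.sh -c "$HADOOP_CONF_DIR"

# Bring up HDFS and the map/reduce daemons across those nodes
start-all.sh

# Keep the cluster alive for interactive use until walltime runs out;
# this is the "semi-persistent" part -- ssh in and run jobs by hand.
sleep $((4 * 3600))

# Shut down cleanly before the scheduler kills the job
stop-all.sh
myhadoop-cleanup.sh
```

While the `sleep` is running, you can log in to the job's head node and submit map/reduce jobs against the live cluster, which is what makes this setup convenient for prototyping and debugging.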

Read the Full Story.
