<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Inside-BigData &#187; Hadoop</title>
	<atom:link href="http://inside-bigdata.com/category/hadoop/feed/" rel="self" type="application/rss+xml" />
	<link>http://inside-bigdata.com</link>
	<description>Discovering Gold with Big Data Analytics and Data-Intensive Computing</description>
	<lastBuildDate>Wed, 19 Jun 2013 12:00:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.1</generator>
		<item>
		<title>Cray Rolls Out Hadoop Cluster Solution</title>
		<link>http://inside-bigdata.com/cray-rolls-out-hadoop-cluster-solution/</link>
		<comments>http://inside-bigdata.com/cray-rolls-out-hadoop-cluster-solution/#comments</comments>
		<pubDate>Thu, 13 Jun 2013 14:55:42 +0000</pubDate>
		<dc:creator>Rich</dc:creator>
				<category><![CDATA[Business of Big Data]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hardware]]></category>

		<guid isPermaLink="false">http://inside-bigdata.com/?p=3155</guid>
		<description><![CDATA[<p>Today Cray announced a new Hadoop solution that combines supercomputing technologies with an &#8220;enterprise-strength&#8221; approach to Big Data analytics. Available later this month, Cray cluster supercomputers for Hadoop will pair Cray CS300 systems with the Intel Distribution for Apache Hadoop. More and more organizations are expanding their usage of Hadoop software beyond just basic storage [...]</p><p>The post <a href="http://inside-bigdata.com/cray-rolls-out-hadoop-cluster-solution/">Cray Rolls Out Hadoop Cluster Solution</a> appeared first on <a href="http://inside-bigdata.com">Inside-BigData</a>.</p>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cray.com/Products/BigData/CS300-Hadoop.aspx"><img alt="" src="http://www.cray.com/Assets/Images/products/cs300-ac.jpg" title="Cray CS300" class="alignright" width="178" height="129" /></a>Today Cray announced a new Hadoop solution that combines supercomputing technologies with an &#8220;enterprise-strength&#8221; approach to Big Data analytics. Available later this month, <a href="http://www.cray.com/Products/BigData/CS300-Hadoop.aspx">Cray cluster supercomputers for Hadoop</a> will pair Cray CS300 systems with the <a href="http://hadoop.intel.com/">Intel Distribution for Apache Hadoop</a>.</p>
<blockquote><p>“More and more organizations are expanding their usage of Hadoop software beyond just basic storage and reporting. But while they’re developing increasingly complex algorithms and becoming more dependent on getting value out of Hadoop systems, they are also pushing the limits of their architectures,” said Bill Blake, senior vice president and CTO of Cray. “We are combining the supercomputing technologies of the Cray CS300 series with the performance and security of the Intel Distribution to provide customers with a turnkey, reliable Hadoop solution that is purpose-built for high-value Hadoop environments. Organizations can now focus on scaling their use of platform-independent Hadoop software, while gaining the benefits of important underlying architectural advantages from Cray and Intel.”</p></blockquote>
<p>As you may recall, Cray acquired the CS300 cluster technology from Appro last year. This gives the company a more affordable cluster offering for markets that don&#8217;t require Cray&#8217;s low latency interconnect technology. Read the <a href="http://investors.cray.com/phoenix.zhtml?c=98390&#038;p=irol-newsArticle&#038;ID=1829474&#038;highlight=">Full Story</a>.</p>
<p>The post <a href="http://inside-bigdata.com/cray-rolls-out-hadoop-cluster-solution/">Cray Rolls Out Hadoop Cluster Solution</a> appeared first on <a href="http://inside-bigdata.com">Inside-BigData</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://inside-bigdata.com/cray-rolls-out-hadoop-cluster-solution/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Slidecast: MapR &#8211; Enterprise Grade NoSQL and Hadoop</title>
		<link>http://inside-bigdata.com/slidecast-mapr-enterprise-grade-nosql-and-hadoop/</link>
		<comments>http://inside-bigdata.com/slidecast-mapr-enterprise-grade-nosql-and-hadoop/#comments</comments>
		<pubDate>Thu, 13 Jun 2013 12:00:30 +0000</pubDate>
		<dc:creator>Rich</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Podcasts]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Video]]></category>

		<guid isPermaLink="false">http://inside-bigdata.com/?p=3147</guid>
		<description><![CDATA[<p>In this slidecast, Jack Norris from MapR Technologies presents: Enterprise Grade NoSQL and Hadoop. &#8220;MapR M7 provides an enterprise-grade NoSQL solution that has tremendous scale advantages,&#8221; said John Schroeder, CEO and co-founder, MapR Technologies. &#8220;The fact that it is built-in with Hadoop is a game changer for organizations looking for the best platform to leverage [...]</p><p>The post <a href="http://inside-bigdata.com/slidecast-mapr-enterprise-grade-nosql-and-hadoop/">Slidecast: MapR &#8211; Enterprise Grade NoSQL and Hadoop</a> appeared first on <a href="http://inside-bigdata.com">Inside-BigData</a>.</p>]]></description>
			<content:encoded><![CDATA[<p><iframe width="511" height="383" src="http://www.youtube.com/embed/0xZ1kF6RukY?rel=0" frameborder="0" allowfullscreen></iframe></p>
<p>In this slidecast, Jack Norris from <a href="http://mapr.com">MapR Technologies</a> presents: <em>Enterprise Grade NoSQL and Hadoop</em>.</p>
<blockquote><p>&#8220;MapR M7 provides an enterprise-grade NoSQL solution that has tremendous scale advantages,&#8221; said John Schroeder, CEO and co-founder, MapR Technologies. &#8220;The fact that it is built-in with Hadoop is a game changer for organizations looking for the best platform to leverage Big Data and support the broadest set of mission-critical applications.&#8221;</p></blockquote>
<p><a href="https://archive.org/download/MapRPodcast/MapR%20Podcast.mp3">Download the MP3</a> * <a href="http://www.slideshare.net/insideHPC/mapr">View the slides</a> * <a href="http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=275928198">Subscribe on iTunes</a> * <a href="http://feeds.feedburner.com/SunRadioHpcPodcast">Subscribe to RSS</a></p>
<p>The post <a href="http://inside-bigdata.com/slidecast-mapr-enterprise-grade-nosql-and-hadoop/">Slidecast: MapR &#8211; Enterprise Grade NoSQL and Hadoop</a> appeared first on <a href="http://inside-bigdata.com">Inside-BigData</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://inside-bigdata.com/slidecast-mapr-enterprise-grade-nosql-and-hadoop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hadoop Meets Lustre &#8211; Intel Rolls out Big Data Distribution for the Enterprise</title>
		<link>http://inside-bigdata.com/hadoop-meets-lustre-intel-rolls-out-big-data-distribution-for-the-enterprise/</link>
		<comments>http://inside-bigdata.com/hadoop-meets-lustre-intel-rolls-out-big-data-distribution-for-the-enterprise/#comments</comments>
		<pubDate>Wed, 12 Jun 2013 18:36:06 +0000</pubDate>
		<dc:creator>Rich</dc:creator>
				<category><![CDATA[Business of Big Data]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[HPC]]></category>
		<category><![CDATA[Lustre]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://inside-bigdata.com/?p=3142</guid>
		<description><![CDATA[<p>Today Intel announced the &#8220;first converged HPC and Big Data platform&#8221; with the new Intel Enterprise Edition for Lustre. Paired with Chroma storage management tools from Whamcloud as well as a new adaptor for the Intel Distribution for Apache Hadoop, the new offering provides enterprise-class reliability combined with HPC performance for Big Data applications. Enterprise [...]</p><p>The post <a href="http://inside-bigdata.com/hadoop-meets-lustre-intel-rolls-out-big-data-distribution-for-the-enterprise/">Hadoop Meets Lustre &#8211; Intel Rolls out Big Data Distribution for the Enterprise</a> appeared first on <a href="http://inside-bigdata.com">Inside-BigData</a>.</p>]]></description>
			<content:encoded><![CDATA[<p><a href="http://newsroom.intel.com/community/intel_newsroom/blog/2013/06/12/intel-expands-software-portfolio-for-big-data-solutions"><img alt="" src="https://dl.dropboxusercontent.com/u/5192443/lustrehadoop.jpg" title="Lustre and Hadoop" class="alignright" width="231" height="117" /></a>Today Intel announced the &#8220;first converged HPC and Big Data platform&#8221; with the new <a href="http://newsroom.intel.com/community/intel_newsroom/blog/2013/06/12/intel-expands-software-portfolio-for-big-data-solutions">Intel Enterprise Edition for Lustre</a>. Paired with Chroma storage management tools from Whamcloud as well as a new adaptor for the <a href="http://hadoop.intel.com/">Intel Distribution for Apache Hadoop</a>, the new offering provides enterprise-class reliability combined with HPC performance for Big Data applications.</p>
<blockquote><p>“Enterprise users are looking for cost-effective and scalable tools to efficiently manage and quickly access large volumes of data to turn valuable information into actionable insight,” said Boyd Davis, vice president and general manager of Intel’s Datacenter Software Division. “The addition of the Intel Enterprise Edition for Lustre to our big data software portfolio will help make it easier and more affordable for businesses to move, store and process data quickly and efficiently.”</p></blockquote>
<p>When paired with the Intel Distribution for Apache Hadoop, the Intel Enterprise Edition for Lustre software allows Hadoop to be run on top of Lustre, significantly improving the speed at which data can be accessed and analyzed. This allows users to access data files directly from the global file system at faster rates and speeds up analytics, providing more productive use of storage assets as well as simpler storage management.</p>
<p>The Intel Enterprise Edition for Lustre will be available early in the third quarter of this year. Read the <a href="http://newsroom.intel.com/community/intel_newsroom/blog/2013/06/12/intel-expands-software-portfolio-for-big-data-solutions">Full Story</a>.</p>
<p>The post <a href="http://inside-bigdata.com/hadoop-meets-lustre-intel-rolls-out-big-data-distribution-for-the-enterprise/">Hadoop Meets Lustre &#8211; Intel Rolls out Big Data Distribution for the Enterprise</a> appeared first on <a href="http://inside-bigdata.com">Inside-BigData</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://inside-bigdata.com/hadoop-meets-lustre-intel-rolls-out-big-data-distribution-for-the-enterprise/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Slidecast: Postgres &#8211; A Tipping Point?</title>
		<link>http://inside-bigdata.com/slidecast-postgres-a-tipping-point/</link>
		<comments>http://inside-bigdata.com/slidecast-postgres-a-tipping-point/#comments</comments>
		<pubDate>Tue, 11 Jun 2013 12:00:51 +0000</pubDate>
		<dc:creator>Rich</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Podcasts]]></category>
		<category><![CDATA[Video]]></category>

		<guid isPermaLink="false">http://inside-bigdata.com/?p=3132</guid>
		<description><![CDATA[<p>In this slidecast, Ed Boyajian from EnterpriseDB presents: Postgres &#8211; A Tipping Point? PostgreSQL is the #1 enterprise-class open source database with a feature set comparable to the major proprietary RDBMS vendors and a customer list that spans every industry. EnterpriseDB&#8217;s Postgres Plus solutions let you confidently develop and deploy PostgreSQL-backed applications that scale all [...]</p><p>The post <a href="http://inside-bigdata.com/slidecast-postgres-a-tipping-point/">Slidecast: Postgres &#8211; A Tipping Point?</a> appeared first on <a href="http://inside-bigdata.com">Inside-BigData</a>.</p>]]></description>
			<content:encoded><![CDATA[<p><iframe width="511" height="383" src="http://www.youtube.com/embed/9ItlX0f2uL4?rel=0" frameborder="0" allowfullscreen></iframe></p>
<p>In this slidecast, Ed Boyajian from <a href="http://enterprisedb.com">EnterpriseDB</a> presents: <em>Postgres &#8211; A Tipping Point?</em></p>
<blockquote><p>&#8220;PostgreSQL is the #1 enterprise-class open source database with a feature set comparable to the major proprietary RDBMS vendors and a customer list that spans every industry. EnterpriseDB&#8217;s Postgres Plus solutions let you confidently develop and deploy PostgreSQL-backed applications that scale all the way from embedded solutions to massive OLTP and data warehouse systems that serve thousands of users.&#8221;</p></blockquote>
<p><a href="http://www.slideshare.net/insideHPC/postgres-survey-podcast">View the slides</a> * <a href="http://bit.ly/19VOuL1">Download the MP3</a> * <a href="http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=275928198">Subscribe on iTunes</a> * <a href="http://feeds.feedburner.com/SunRadioHpcPodcast">Subscribe to RSS</a></p>
<p>The post <a href="http://inside-bigdata.com/slidecast-postgres-a-tipping-point/">Slidecast: Postgres &#8211; A Tipping Point?</a> appeared first on <a href="http://inside-bigdata.com">Inside-BigData</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://inside-bigdata.com/slidecast-postgres-a-tipping-point/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Optimizing Hadoop for Intel Architecture</title>
		<link>http://inside-bigdata.com/optimizing-hadoop-for-intel-architecture/</link>
		<comments>http://inside-bigdata.com/optimizing-hadoop-for-intel-architecture/#comments</comments>
		<pubDate>Sat, 08 Jun 2013 16:49:21 +0000</pubDate>
		<dc:creator>Rich</dc:creator>
				<category><![CDATA[Hadoop]]></category>

		<guid isPermaLink="false">http://inside-bigdata.com/?p=3122</guid>
		<description><![CDATA[<p>Over at The Data Stack, Intel&#8217;s Tim Allen writes that the key to optimizing Hadoop on x86 is to tune the underlying Java so that it takes advantage of capabilities in Intel hardware. When you do that, you can expect to see up to 70 percent faster performance on Hadoop sort operations. Hadoop spawns a [...]</p><p>The post <a href="http://inside-bigdata.com/optimizing-hadoop-for-intel-architecture/">Optimizing Hadoop for Intel Architecture</a> appeared first on <a href="http://inside-bigdata.com">Inside-BigData</a>.</p>]]></description>
			<content:encoded><![CDATA[<p><a href="http://communities.intel.com/community/datastack/blog/2013/06/05/how-to-optimize-hadoop-performance-on-intel%C3%A2-architecture"><img alt="" src="http://i4.ytimg.com/vi/cWceZeC9ONo/mqdefault.jpg" title="Tim Allen" class="alignright" width="160" height="90" /></a>Over at <em>The Data Stack</em>, Intel&#8217;s <a href="http://communities.intel.com/people/timallen1234">Tim Allen</a> writes that the key to optimizing Hadoop on x86 is to tune the underlying Java so that it takes advantage of capabilities in Intel hardware. When you do that, you can expect to see up to 70 percent faster performance on Hadoop sort operations.</p>
<blockquote><p>Hadoop spawns a new Java Virtual Machine* (JVM) for each MapReduce function on each slave node. This means that a large analytics job can result in the creation of thousands of individual JVMs. Because Hadoop does not share memory resources across nodes, each JVM and Java service must perform optimally. Reduced performance on any single node can hamper data analytics performance across the cluster.</p></blockquote>
<p>Read the <a href="http://communities.intel.com/community/datastack/blog/2013/06/05/how-to-optimize-hadoop-performance-on-intel%C3%A2-architecture">Full Story</a>.</p>
<p>The post <a href="http://inside-bigdata.com/optimizing-hadoop-for-intel-architecture/">Optimizing Hadoop for Intel Architecture</a> appeared first on <a href="http://inside-bigdata.com">Inside-BigData</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://inside-bigdata.com/optimizing-hadoop-for-intel-architecture/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>TACC&#8217;s Hadoop Cluster Makes Big Data Research More Accessible</title>
		<link>http://inside-bigdata.com/taccs-hadoop-cluster-makes-big-data-research-more-accessible/</link>
		<comments>http://inside-bigdata.com/taccs-hadoop-cluster-makes-big-data-research-more-accessible/#comments</comments>
		<pubDate>Wed, 05 Jun 2013 12:00:57 +0000</pubDate>
		<dc:creator>Rich</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://inside-bigdata.com/?p=3114</guid>
		<description><![CDATA[<p>Over at the Texas Advanced Computing Center, Aaron Dubrow writes that researchers are using a specialized cluster at TACC to do experimental Hadoop-style studies on a current production system. This system offers researchers a total of 48, eight-processor nodes on TACC&#8217;s Longhorn cluster to run Hadoop in a coordinated way with accompanying large-memory processors. A [...]</p><p>The post <a href="http://inside-bigdata.com/taccs-hadoop-cluster-makes-big-data-research-more-accessible/">TACC&#8217;s Hadoop Cluster Makes Big Data Research More Accessible</a> appeared first on <a href="http://inside-bigdata.com">Inside-BigData</a>.</p>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.tacc.utexas.edu/image/image_gallery?img_id=775570&#038;t=1369176346399"><img alt="" src="http://www.tacc.utexas.edu/image/image_gallery?img_id=775570&#038;t=1369176346399" title="Longhorn supercomputer" class="alignright" width="175" height="262" /></a>Over at the <a href="http://www.tacc.utexas.edu/news/feature-stories/2013/hip-hip-hadoop">Texas Advanced Computing Center</a>, Aaron Dubrow writes that researchers are using a specialized cluster at TACC to do experimental Hadoop-style studies on a current production system.</p>
<blockquote><p>This system offers researchers a total of 48, eight-processor nodes on TACC&#8217;s Longhorn cluster to run Hadoop in a coordinated way with accompanying large-memory processors. A user on the system can request all 48 nodes for a maximum of 96 terabytes (TB) of distributed storage. What&#8217;s special about the Longhorn cluster at TACC isn&#8217;t simply the beefed-up hardware for running Hadoop; rather it&#8217;s the ability for researchers to leverage the vast compute capabilities of the center, including powerful visualization and data analysis systems, to further their investigations. The end-to-end research workflow enabled by TACC could not be done anywhere else, and as a bonus, researchers get access to the full suite of tools available at the center to do computational research. </p></blockquote>
<p>According to TACC Research Associate Weijia Xu, the best part is that Hadoop is easy to use without requiring users to be experts. It handles a lot of the low-level computing behavior, so people don’t need to have a lot of knowledge about I/O or memory structures to get started.</p>
<p>Read the <a href="http://www.tacc.utexas.edu/news/feature-stories/2013/hip-hip-hadoop">Full Story</a>.</p>
<p>The post <a href="http://inside-bigdata.com/taccs-hadoop-cluster-makes-big-data-research-more-accessible/">TACC&#8217;s Hadoop Cluster Makes Big Data Research More Accessible</a> appeared first on <a href="http://inside-bigdata.com">Inside-BigData</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://inside-bigdata.com/taccs-hadoop-cluster-makes-big-data-research-more-accessible/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Slidecast: A Practical Introduction to Hadoop</title>
		<link>http://inside-bigdata.com/slidecast-a-practical-introduction-to-hadoop/</link>
		<comments>http://inside-bigdata.com/slidecast-a-practical-introduction-to-hadoop/#comments</comments>
		<pubDate>Mon, 03 Jun 2013 12:00:38 +0000</pubDate>
		<dc:creator>Rich</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Podcasts]]></category>
		<category><![CDATA[Video]]></category>

		<guid isPermaLink="false">http://inside-bigdata.com/?p=3101</guid>
		<description><![CDATA[<p>In this slidecast, Alex Gorbachev from Pythian presents a Practical Introduction to Hadoop. This is a great primer for viewers who want to get the big picture on how Hadoop works with Big Data and how this approach differs from relational databases. View the slides * Download the MP3 * Subscribe on iTunes * Subscribe to RSS &#160;</p><p>The post <a href="http://inside-bigdata.com/slidecast-a-practical-introduction-to-hadoop/">Slidecast: A Practical Introduction to Hadoop</a> appeared first on <a href="http://inside-bigdata.com">Inside-BigData</a>.</p>]]></description>
			<content:encoded><![CDATA[<p><iframe width="511" height="383" src="http://www.youtube.com/embed/Zr5s9P_w7CI?rel=0" frameborder="0" allowfullscreen></iframe></p>
<p>In this slidecast, <a href="http://www.pythian.com/about/team/leadership/alex-gorbachev/">Alex Gorbachev</a> from <a href="http://www.pythian.com">Pythian</a> presents a <em>Practical Introduction to Hadoop</em>. This is a great primer for viewers who want to get the big picture on how Hadoop works with Big Data and how this approach differs from relational databases.</p>
<p><a href="http://www.slideshare.net/insideHPC/practical-introduction-to-hadoop">View the slides</a> * <a href="https://archive.org/download/PracticalIntroToHadoop/Practical%20Intro%20to%20Hadoop.mp3">Download the MP3</a> * <a href="http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=275928198">Subscribe on iTunes</a> * <a href="http://feeds.feedburner.com/SunRadioHpcPodcast">Subscribe to RSS</a></p>
<p>The post <a href="http://inside-bigdata.com/slidecast-a-practical-introduction-to-hadoop/">Slidecast: A Practical Introduction to Hadoop</a> appeared first on <a href="http://inside-bigdata.com">Inside-BigData</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://inside-bigdata.com/slidecast-a-practical-introduction-to-hadoop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Interview: Jim Vogt from Zettaset on Enterprise Security for Hadoop</title>
		<link>http://inside-bigdata.com/interview-zettasett/</link>
		<comments>http://inside-bigdata.com/interview-zettasett/#comments</comments>
		<pubDate>Mon, 27 May 2013 16:59:37 +0000</pubDate>
		<dc:creator>Rich</dc:creator>
				<category><![CDATA[Business of Big Data]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Security]]></category>

		<guid isPermaLink="false">http://inside-bigdata.com/?p=3065</guid>
		<description><![CDATA[<p>When Hadoop was originally conceived, it was all about sharing and as a result, security was not built in. But in today&#8217;s enterprise, Big Data represents the company&#8217;s jewels&#8211;assets that must be protected. To learn more about these issues, I caught up with Jim Vogt, CEO of Zettaset. inside Big Data: What are the main security [...]</p><p>The post <a href="http://inside-bigdata.com/interview-zettasett/">Interview: Jim Vogt from Zettaset on Enterprise Security for Hadoop</a> appeared first on <a href="http://inside-bigdata.com">Inside-BigData</a>.</p>]]></description>
			<content:encoded><![CDATA[<p>When Hadoop was originally conceived, it was all about sharing and as a result, security was not built in. But in today&#8217;s enterprise, Big Data represents the company&#8217;s jewels&#8211;assets that must be protected. To learn more about these issues, I caught up with Jim Vogt, CEO of <a href="http://www.zettaset.com/">Zettaset</a>.</p>
<p><em><strong>inside Big Data:</strong> What are the main security issues with Hadoop distributions?</em></p>
<p><a href="http://www.zettaset.com/company/management.php"><img class="alignright" title="Jim Vogt" src="http://assets.bizjournals.com/sanjose/news/JimVogt_Headshot*304.jpg?v=1" alt="" width="152" height="170" /></a><strong>Jim Vogt: </strong>Hadoop, like many open source technologies such as UNIX and TCP/IP, was not created with security in mind. While the open source Hadoop community supports some security features – like Kerberos, the use of firewalls and basic HDFS permissions – these security features aren&#8217;t a mandatory requirement for a Hadoop cluster, making it possible for an organization to run entire clusters without deploying any security. At the same time, the distributed computing nature of Hadoop also presents a unique challenge – data is fluid in this type of environment, moving to and from different nodes, and sometimes data is sliced into fragments and shared across multiple servers. This makes it incredibly difficult to secure with traditional security approaches. Finally, popular distributions of Hadoop – like Cloudera, Hortonworks, etc. &#8211; have little incentive to build out their security functionality. The business model for most distributions is built around the sales of professional services and support – not software. The open source distribution model also requires that any new feature developments obtain the blessing of the open source community, and this means that open source solutions will always lag behind commercial software solutions from companies like Zettaset.</p>
<p><em><strong>inside Big Data:</strong> To what degree is security a barrier to broader enterprise adoption of Hadoop and other big data technologies?</em></p>
<p><strong>Jim Vogt: </strong>Enterprises are obviously moving to adopt Hadoop and Big Data technologies, but right now they&#8217;re limiting their deployment of the technology throughout their enterprise, primarily due to complexity, management challenges and security. Any organization that stores or transacts sensitive information is going to be subject to the same compliance mandates and data security regulations that apply to their traditional data stores. This can be a gating factor for organizations looking to make broader use of Hadoop and Big Data technologies.</p>
<p><em><strong>inside Big Data:</strong> Can&#8217;t Hadoop security be addressed with traditional perimeter security solutions and by compartmentalizing parts of the system behind firewalls?</em></p>
<p><strong>Jim Vogt: </strong>In short, no. Perimeter security solutions and compartmentalization are not designed to address Hadoop’s unique distributed architecture. However, this does not stop incumbent data security vendors from believing firewalls are the best option for Hadoop and distributed cluster security. Some firewalls attempt to map IP to actual AD credentials, yet this requires specific network design. Even with special network configuration, a firewall can only restrict access on an IP/port basis, while knowing nothing when it comes to the Hadoop File System or Hadoop itself. In order to control access, data administrators would have to segregate sensitive data on separate servers. This approach is not only inefficient but also fundamentally incompatible with distributed file systems like Hadoop, since files are constantly being shifted from server to server. It would require the creation of a second Hadoop cluster to contain sensitive data, and even then would only provide two levels of security for the data.</p>
<p><em><strong>inside Big Data:</strong> Does this mean that Big Data challenges can&#8217;t be met in a way that still meets with robust enterprise security requirements?</em></p>
<p><strong>Jim Vogt: </strong>The solution is to bring the security closer to the data, and apply it within the cluster itself. This can be done using fine-grained access control such as RBAC and running it on every Hadoop node. Using commercial software to automate the installation and management of RBAC simplifies deployment, and eliminates much of the complexity that security professionals currently face with open source Hadoop products.</p>
<p><em><strong>inside Big Data:</strong> How is Zettaset taking a different approach to security for Big Data?</em></p>
<p><strong>Jim Vogt: </strong>Unlike the dominant BigData players in the market &#8211; Cloudera, MapR, Hortonworks &#8211; Zettaset is an enterprise software company. We sell software that enables enterprises to quickly deploy, secure and scale Hadoop clusters. Our Orchestrator software is distribution-agnostic, which means we can work with the leading Apache Hadoop-based distributions available today. We harden Hadoop to address policy enforcement, regulatory compliance, access control, and risk management within the cluster environment, delivering the security capabilities that IT security professionals expect in any enterprise.</p>
<p>The post <a href="http://inside-bigdata.com/interview-zettasett/">Interview: Jim Vogt from Zettaset on Enterprise Security for Hadoop</a> appeared first on <a href="http://inside-bigdata.com">Inside-BigData</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://inside-bigdata.com/interview-zettasett/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>YARN to Spawn Big Data Breakthrough</title>
		<link>http://inside-bigdata.com/yarn-to-spawn-big-data-breakthrough/</link>
		<comments>http://inside-bigdata.com/yarn-to-spawn-big-data-breakthrough/#comments</comments>
		<pubDate>Sat, 25 May 2013 15:13:58 +0000</pubDate>
		<dc:creator>Rich</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[YARN]]></category>

		<guid isPermaLink="false">http://inside-bigdata.com/?p=3054</guid>
		<description><![CDATA[<p>Over at ReadWrite, Brian Proffitt writes that the coming release of Hadoop 2.0 will make information found within data warehouses and unstructured &#8220;data lakes&#8221; more accessible than ever. For Arun Murthy, the release manager for Hadoop 2.0, the most important change will be upgrading the MapReduce framework to Apache YARN, which will expand what software [...]</p><p>The post <a href="http://inside-bigdata.com/yarn-to-spawn-big-data-breakthrough/">YARN to Spawn Big Data Breakthrough</a> appeared first on <a href="http://inside-bigdata.com">Inside-BigData</a>.</p>]]></description>
			<content:encoded><![CDATA[<p>Over at <em><a href="http://readwrite.com/2013/05/24/hadoop-20-yarn-bid-data-mapreduce">ReadWrite</a></em>, Brian Proffitt writes that the coming release of Hadoop 2.0 will make information found within data warehouses and unstructured &#8220;data lakes&#8221; more accessible than ever.</p>
<p><a href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/yarn_architecture.gif"><img alt="" src="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/yarn_architecture.gif" title="Scheduler and Applications Manager" class="alignnone" width="510" height="316" /></a></p>
<blockquote><p>For Arun Murthy, the release manager for Hadoop 2.0, the most important change will be upgrading the MapReduce framework to Apache YARN, which will expand what software can be used in Hadoop and how much. Murthy, who is also YARN project lead and co-founder of Hortonworks, explained that &#8220;In Hadoop 1.0, everything was batch-oriented. In 2.0, you will now have multiple apps hitting the data inside all at once.&#8221; What YARN does, essentially, is divide the functionality of MapReduce even further, breaking the two major responsibilities of the MapReduce JobTracker component &#8211; resource management and job scheduling/monitoring &#8211; into separate daemons: a global ResourceManager and per-application ApplicationMaster.</p></blockquote>
<p>Read the <a href="http://readwrite.com/2013/05/24/hadoop-20-yarn-bid-data-mapreduce">Full Story</a>.</p>
<p>The post <a href="http://inside-bigdata.com/yarn-to-spawn-big-data-breakthrough/">YARN to Spawn Big Data Breakthrough</a> appeared first on <a href="http://inside-bigdata.com">Inside-BigData</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://inside-bigdata.com/yarn-to-spawn-big-data-breakthrough/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Beyond Batch &#8211; Moving Hadoop to Multi App with Apache YARN</title>
		<link>http://inside-bigdata.com/beyond-batch-moving-hadoop-to-multi-app-with-apache-yarn/</link>
		<comments>http://inside-bigdata.com/beyond-batch-moving-hadoop-to-multi-app-with-apache-yarn/#comments</comments>
		<pubDate>Thu, 16 May 2013 21:32:56 +0000</pubDate>
		<dc:creator>Rich</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://inside-bigdata.com/?p=3013</guid>
		<description><![CDATA[<p>Over at the Hortonworks Blog, Arun Murthy writes that Apache Hadoop NextGen MapReduce (YARN) provides the highly sought-after ability to run SQL in Hadoop. When we set out to build Hadoop 2.0, we wanted to fundamentally re-architect Hadoop to be able to run multiple applications against relevant data sets. And do so in a way where [...]</p><p>The post <a href="http://inside-bigdata.com/beyond-batch-moving-hadoop-to-multi-app-with-apache-yarn/">Beyond Batch &#8211; Moving Hadoop to Multi App with Apache YARN</a> appeared first on <a href="http://inside-bigdata.com">Inside-BigData</a>.</p>]]></description>
			<content:encoded><![CDATA[<p><a href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html"><img alt="" src="http://hadoop.apache.org/images/hadoop-logo.jpg" title="Hadoop logo" class="alignright" width="300" height="71" /></a>Over at the <em><a href="http://hortonworks.com/blog/moving-hadoop-beyond-batch-with-apache-yarn/">Hortonworks Blog</a></em>, Arun Murthy writes that <a href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html">Apache Hadoop NextGen MapReduce (YARN)</a> provides the highly sought-after ability to run SQL in Hadoop.</p>
<blockquote><p>When we set out to build Hadoop 2.0, we wanted to fundamentally re-architect Hadoop to be able to run multiple applications against relevant data sets. And do so in a way where multiple types of applications can operate efficiently and predictably within the same cluster – this is really the reason behind Apache YARN, which is foundational to Hadoop 2.0.  By managing the resource requests across a cluster, YARN turns Hadoop from a single application system to a multi-application operating system.</p></blockquote>
<p>Read the <a href="http://hortonworks.com/blog/moving-hadoop-beyond-batch-with-apache-yarn/">Full Story</a>.</p>
<p>The post <a href="http://inside-bigdata.com/beyond-batch-moving-hadoop-to-multi-app-with-apache-yarn/">Beyond Batch &#8211; Moving Hadoop to Multi App with Apache YARN</a> appeared first on <a href="http://inside-bigdata.com">Inside-BigData</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://inside-bigdata.com/beyond-batch-moving-hadoop-to-multi-app-with-apache-yarn/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
