Discovering Gold with Big Data Analytics and Data-Intensive Computing


Entries filed under “Software”

Slidecast: MapR – Enterprise Grade NoSQL and Hadoop

In this slidecast, Jack Norris from MapR Technologies presents: Enterprise Grade NoSQL and Hadoop.

“MapR M7 provides an enterprise-grade NoSQL solution that has tremendous scale advantages,” said John Schroeder, CEO and co-founder, MapR Technologies. “The fact that it is built-in with Hadoop is a game changer for organizations looking for the best platform to leverage Big Data and support the broadest set of mission-critical applications.”

Download the MP3 * View the slides * Subscribe on iTunes * Subscribe to RSS


Also posted in Analytics, Hadoop, Podcasts, Video

Hadoop Meets Lustre – Intel Rolls out Big Data Distribution for the Enterprise

Today Intel announced the “first converged HPC and Big Data platform” with the new Intel Enterprise Edition for Lustre. Paired with Chroma storage management tools from Whamcloud as well as a new adaptor for the Intel Distribution for Apache Hadoop, the new offering provides enterprise-class reliability combined with HPC performance for Big Data applications.

“Enterprise users are looking for cost-effective and scalable tools to efficiently manage and quickly access large volumes of data to turn valuable information into actionable insight,” said Boyd Davis, vice president and general manager of Intel’s Datacenter Software Division. “The addition of the Intel Enterprise Edition for Lustre to our big data software portfolio will help make it easier and more affordable for businesses to move, store and process data quickly and efficiently.”

When paired with the Intel Distribution for Apache Hadoop, the Intel Enterprise Edition for Lustre software allows Hadoop to be run on top of Lustre, significantly improving the speed at which data can be accessed and analyzed. This allows users to access data files directly from the global file system at faster rates and speeds up analytics time, providing more productive use of storage assets as well as simpler storage management.

The Intel Enterprise Edition for Lustre will be available early in the third quarter of this year. Read the Full Story.


Also posted in Business of Big Data, Hadoop, HPC, Lustre

Intel to Announce New Lustre Solution for the Enterprise

 
Today OpenSFS announced that Intel will roll out a new product on Wednesday called the Intel Enterprise Edition for Lustre.

“We’re pleased to see Intel pushing Lustre in new directions and growing the Lustre ecosystem,” said Galen Shipman, OpenSFS Chairman. “This is a great example of the continued advancement and maturation of Lustre technology in the marketplace.”

Lustre has long been used in many of the world’s fastest supercomputing systems and is reportedly gaining much interest in areas such as banking, where speed and scalability are keys to success. If Intel can reduce the complexity of Lustre deployments and administration with the Chroma storage management technology it acquired with Whamcloud last year, the company may be able to sell Lustre into a whole new market.

Intel’s Moving Lustre Forward event starts at 11 am on June 12 at the Merchants Exchange Building in San Francisco. The event will also be live-streamed (registration required).

Read the Full Story and Register now.


Also posted in Business of Big Data, Events, Impala

Video: Xyratex Launches ClusterStor Professional Services for Lustre

In this video, Simon Johnson from Xyratex describes the company’s new suite of professional services for Lustre data storage.

“Our mission for ClusterStor Professional Services is to accelerate the time to results for our users and partners across verticals and geographies, and help them realize value sooner from their investment in HPC storage technology,” said Simon Johnson, senior director of ClusterStor Professional Services at Xyratex. “Our team has a deep pedigree in value-added services, Lustre® file systems and data storage, and our ClusterStor solutions have achieved tremendous market adoption in a short time. Combined, this experience enables us to bring additional integrated solution value and efficiency to solve our clients’ challenges and further exploit the best-in-class performance that our technology delivers.”

Read the Full Story.


Also posted in Business of Big Data, HPC, Lustre, Storage, Video

Canonical and Inktank Offering Full Support for Ceph with OpenStack

This week Canonical and Inktank announced a collaboration to provide a fully integrated and supported implementation of Ceph storage technology with OpenStack on Ubuntu. While Ceph has been packaged as part of Ubuntu for some time, this new agreement enables Canonical customers who have purchased Ubuntu Advantage Cloud Infrastructure to get commercial-grade production support for both OpenStack and Ceph from Canonical. Ubuntu Advantage subscribers will also get Ceph support backed by Inktank as part of their subscription.

“We are working with many OpenStack users who are looking for comprehensive, sophisticated, and cost effective storage solution options,” said Mark Shuttleworth, Founder of Ubuntu and Canonical. “Ceph is fast becoming the favoured choice for next-generation storage solutions with OpenStack because it is the only software-defined storage software that provides both object and block storage fully integrated with OpenStack. This partnership with Inktank enables Canonical to provide support for the leading cloud and storage technologies to our customers as a single, integrated solution.”

Ceph will be delivered alongside OpenStack releases via package archives and will be installed and managed using Juju, Canonical’s innovative service orchestration and deployment tool. Adding Ceph to an existing OpenStack cloud is made simple using the Juju GUI or command line interfaces. Read the Full Story.


Also posted in Ceph, Cloud, Storage

Interview: Jim Vogt from Zettaset on Enterprise Security for Hadoop

When Hadoop was originally conceived, it was all about sharing and as a result, security was not built in. But in today’s enterprise, Big Data represents the company’s jewels – assets that must be protected. To learn more about these issues, I caught up with Jim Vogt, CEO of Zettaset.

inside Big Data: What are the main security issues with Hadoop distributions?

Jim Vogt: Hadoop, like many open source technologies such as UNIX and TCP/IP, was not created with security in mind. While the open source Hadoop community supports some security features – like Kerberos, the use of firewalls and basic HDFS permissions – these security features aren’t a mandatory requirement for a Hadoop cluster, making it possible for an organization to run entire clusters without deploying any security. At the same time, the distributed computing nature of Hadoop also presents a unique challenge – data is fluid in this type of environment, moving to and from different nodes, and sometimes data is sliced into fragments and shared across multiple servers. This makes it incredibly difficult to secure with traditional security approaches. Finally, popular distributions of Hadoop – like Cloudera, Hortonworks, etc. – have little incentive to build out their security functionality. The business model for most distributions is built around the sales of professional services and support – not software. The open source distribution model also requires that any new feature developments obtain the blessing of the open source community, and this means that open source solutions will always lag behind commercial software solutions from companies like Zettaset.

inside Big Data: To what degree is security a barrier to broader enterprise adoption of Hadoop and other big data technologies?

Jim Vogt: Enterprises are obviously moving to adopt Hadoop and Big Data technologies, but right now they’re limiting their deployment of the technology throughout their enterprise, primarily due to complexity, management challenges and security. Any organization that stores or transacts sensitive information is going to be subject to the same compliance mandates and data security regulations that apply to their traditional data stores. This can be a gating factor for organizations looking to make broader use of Hadoop and Big Data technologies.

inside Big Data: Can’t Hadoop security be addressed with traditional perimeter security solutions and by compartmentalizing parts of the system behind firewalls?

Jim Vogt: In short, no. Perimeter security solutions and compartmentalization are not designed to address Hadoop’s unique distributed architecture. However, this does not stop incumbent data security vendors from believing firewalls are the best option for Hadoop and distributed cluster security. Some firewalls attempt to map IP to actual AD credentials, yet this requires specific network design. Even with special network configuration, a firewall can only restrict access on an IP/port basis, while knowing nothing when it comes to the Hadoop File System or Hadoop itself. In order to control access, data administrators would have to segregate sensitive data on separate servers. This approach is not only inefficient but also fundamentally incompatible with distributed file systems like Hadoop, since files are constantly being shifted from server to server. It would require the creation of a second Hadoop cluster to contain sensitive data, and even then would only provide two levels of security for the data.

inside Big Data: Does this mean that Big Data challenges can’t be met in a way that still meets with robust enterprise security requirements?

Jim Vogt: The solution is to bring the security closer to the data, and apply it within the cluster itself. This can be done using fine-grained access control such as RBAC and running it on every Hadoop node. Using commercial software to automate the installation and management of RBAC simplifies deployment, and eliminates much of the complexity that security professionals currently face with open source Hadoop products.
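As a rough illustration of the role-based access control (RBAC) idea mentioned above, here is a minimal Python sketch. It is not Zettaset's implementation or any Hadoop API: the role names, users, and path prefixes are hypothetical, and a real deployment would enforce such checks inside the cluster on every node rather than in a standalone script.

```python
# Hypothetical role -> set of (action, path prefix) grants; illustrative only.
ROLE_PERMISSIONS = {
    "analyst": {("read", "/data/public")},
    "finance": {
        ("read", "/data/public"),
        ("read", "/data/finance"),
        ("write", "/data/finance"),
    },
}

# Hypothetical user -> roles mapping.
USER_ROLES = {
    "alice": {"analyst"},
    "bob": {"finance"},
}


def is_allowed(user, action, path):
    """Return True if any role held by the user grants the action on the path."""
    for role in USER_ROLES.get(user, set()):
        for allowed_action, prefix in ROLE_PERMISSIONS.get(role, set()):
            # Match the prefix itself or anything beneath it, but not
            # sibling paths that merely share leading characters.
            under_prefix = path == prefix or path.startswith(prefix + "/")
            if action == allowed_action and under_prefix:
                return True
    return False
```

Unlike a firewall rule keyed on IP address and port, the decision here depends on who is asking and which data they are asking for, which is the distinction the answer above draws.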

inside Big Data: How is Zettaset taking a different approach to security for Big Data?

Jim Vogt: Unlike the dominant Big Data players in the market – Cloudera, MapR, Hortonworks – Zettaset is an enterprise software company. We sell software that enables enterprises to quickly deploy, secure and scale Hadoop clusters. Our Orchestrator software is distribution-agnostic, which means we can work with the leading Apache Hadoop-based distributions available today. We harden Hadoop to address policy enforcement, regulatory compliance, access control, and risk management within the cluster environment, delivering the security capabilities that IT security professionals expect in any enterprise.


Also posted in Business of Big Data, Hadoop, Security

Video: Big Data Security with DataStax and Gazzang

In this slidecast, panelists describe how software-based data security can help you protect the sensitive and confidential information in your online transactional business applications while remaining in compliance.

Speakers

  • Greg Greenstreet, VP of Engineering for Gnip
  • Robin Schumacher, VP of Products for DataStax
  • Sam Heywood, VP of Products for Gazzang

Also posted in Business of Big Data, Security, Video

YARN to Spawn Big Data Breakthrough

Over at ReadWrite, Brian Proffitt writes that the coming release of Hadoop 2.0 will make information found within data warehouses and unstructured “data lakes” more accessible than ever.

For Arun Murthy, the release manager for Hadoop 2.0, the most important change will be upgrading the MapReduce framework to Apache YARN, which will expand what software can be used in Hadoop and how much. Murthy, who is also YARN project lead and co-founder of Hortonworks, explained that “In Hadoop 1.0, everything was batch-oriented. In 2.0, you will now have multiple apps hitting the data inside all at once.” What YARN does, essentially, is divide the functionality of MapReduce even further, breaking the two major responsibilities of the MapReduce JobTracker component – resource management and job scheduling/monitoring – into separate daemons: a global ResourceManager and a per-application ApplicationMaster.
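To make that split concrete, here is a toy Python model of the two roles. It is only a sketch of the division of responsibilities, not the YARN API: the class and method names are invented for illustration, "containers" are reduced to a simple counter, and real YARN adds node-level agents, queues, and fault handling that are left out here.

```python
class ResourceManager:
    """Toy global resource manager: only tracks free cluster capacity."""

    def __init__(self, total_containers):
        self.free = total_containers

    def allocate(self, requested):
        # Grant as many containers as are available, up to the request.
        granted = min(requested, self.free)
        self.free -= granted
        return granted

    def release(self, count):
        self.free += count


class ApplicationMaster:
    """Toy per-application master: schedules and tracks its own tasks."""

    def __init__(self, name, tasks, resource_manager):
        self.name = name
        self.pending = list(tasks)
        self.running = []
        self.done = []
        self.rm = resource_manager

    def start_tasks(self):
        # Ask the shared resource manager for capacity, then start
        # as many pending tasks as were granted.
        granted = self.rm.allocate(len(self.pending))
        started = self.pending[:granted]
        self.pending = self.pending[granted:]
        self.running.extend(started)
        return started

    def finish_tasks(self):
        # Mark running tasks complete and hand their containers back.
        self.rm.release(len(self.running))
        self.done.extend(self.running)
        self.running = []
```

With a four-container cluster, a batch application with three tasks and a query application with two can both call `start_tasks()` against the same `ResourceManager`; the second application simply waits for capacity for its remaining task. Neither application needs to know how the other schedules its work.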

Read the Full Story.


Also posted in Analytics, Hadoop, YARN

Video: EIOW Exascale I/O Working Group

In this video from the Lustre User Group 2013, Meghan McClelland from Xyratex presents: EIOW – Exascale I/O Working Group.

There is a fierce competition on the storage market to offer the best performing devices, with great management at a low price. The EIOW group, from the outset, decided that it would not attempt to offer an end-to-end solution, which would necessarily involve competing instead of working with storage providers. The focus of EIOW is on middleware to provide, for example, schemas describing data structure and layout, novel access methods to data for applications, a uniform data management infrastructure and a framework for the implementation of layered I/O software, similar in spirit to HDF5 as a specialized use of a parallel file system. We decided EIOW should be open, and have interfaces to layer on lower level storage infrastructure such as object stores, databases and file systems as provided by storage providers, to allow their expertise and leadership in this area to continue to benefit the HPC community.

Download the EIOW whitepaper and slides, or check out our LUG 2013 Video Gallery.


Also posted in Hardware, HPC, I/O, Lustre, Storage, Video

Beyond Batch – Moving Hadoop to Multi App with Apache YARN

Over at the Hortonworks Blog, Arun Murthy writes that Apache Hadoop NextGen MapReduce (YARN) provides the highly-sought-after ability to run SQL in Hadoop.

When we set out to build Hadoop 2.0, we wanted to fundamentally re-architect Hadoop to be able to run multiple applications against relevant data sets. And do so in a way where multiple types of applications can operate efficiently and predictably within the same cluster – this is really the reason behind Apache YARN, which is foundational to Hadoop 2.0. By managing the resource requests across a cluster, YARN turns Hadoop from a single application system to a multi-application operating system.

Read the Full Story.


Also posted in Analytics, Hadoop


inside-bigdata.com is a production of insideHPC, LLC. © 2011-2013 Sitemap