Discovering Gold with Big Data Analytics and Data-Intensive Computing

Entries filed under “I/O”

Big Data and the Evolving Internet

Big Data is exposing glaring weaknesses in today’s Internet, and this, in turn, is driving the next stage of network evolution, much as the introduction of Ethernet and the Internet Protocol did back in the 1970s, when IBM’s Systems Network Architecture was the only game in town.

This is according to a story in Computerworld by Stephen Lawson, based on a presentation by David Lambert, president and CEO of Internet2. Lambert spoke at the Open Networking Summit conference held this week in Santa Clara, CA.

Lambert pointed out that the current Internet began as a tool to help researchers from many different locations share data and insights.  It still fulfills that function, but in this new era of Big Data brought on by rapid advances in computing and storage, current Internet technology can’t support the massive amounts of data that engineers and scientists work with on a daily basis. Specifically, Lambert said that the technology used on the Internet today isn’t flexible enough to support new requirements, such as large file transfers, massive data sets, and content caching and distribution.

Software-defined networking (SDN) is one of the technologies underpinning the next steps in the evolution of the Internet. Lambert commented that SDN in universities today resembles the early Internet decades ago; practitioners in data-intensive fields such as genomics need a new, flexible, open networking approach to advance their research.

“Internet2 runs a nationwide network linking research institutions, and it’s already using elements of SDN on its production infrastructure,” reports Lawson. “SDN, a closely watched set of technologies at various stages of development, is intended to shift the control of networks from specialized devices such as switches and routers to software that can run on standard computing platforms and be virtualized. It promises a range of benefits that could include lower costs, faster service deployment and more network innovation.”
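
To make the shift of control from devices to software concrete, here is a minimal Python sketch of the match-action idea behind OpenFlow-style SDN: switches only look packets up in a rule table, and a separate controller running as ordinary software decides what goes into that table. The class names, header fields and the "steer bulk transfers" policy are hypothetical illustrations, not the OpenFlow protocol or any real controller's API; real switches match on many more header fields and are programmed over the network.

```python
from dataclasses import dataclass, field


@dataclass
class FlowRule:
    priority: int
    match: dict   # header field -> required value; fields left out act as wildcards
    action: str   # e.g. "output:3" or "drop" (hypothetical action strings)


@dataclass
class Switch:
    """Forwarding element: it only looks packets up in a table the controller fills."""
    table: list = field(default_factory=list)

    def install(self, rule: FlowRule) -> None:
        self.table.append(rule)
        # Highest priority first, so the first matching rule is the best match.
        self.table.sort(key=lambda r: r.priority, reverse=True)

    def forward(self, packet: dict) -> str:
        for rule in self.table:
            if all(packet.get(k) == v for k, v in rule.match.items()):
                return rule.action
        return "send_to_controller"  # table miss: defer to the software controller


class Controller:
    """Control logic running as ordinary software on a standard server."""

    def __init__(self, switches):
        self.switches = switches
        for sw in switches:
            # Lowest-priority catch-all rule: everything else uses the default uplink.
            sw.install(FlowRule(priority=0, match={}, action="output:1"))

    def steer_bulk_transfers(self, dst_ip: str, port: int) -> None:
        # Hypothetical policy: send traffic for a data-transfer host over a high-bandwidth port.
        for sw in self.switches:
            sw.install(FlowRule(priority=100, match={"ip_dst": dst_ip}, action=f"output:{port}"))


edge = Switch()
ctl = Controller([edge])
ctl.steer_bulk_transfers("10.0.0.5", port=3)
print(edge.forward({"ip_dst": "10.0.0.5"}))  # output:3
print(edge.forward({"ip_dst": "10.0.0.9"}))  # output:1
```

The point of the split is that changing network behavior means changing controller code, not reconfiguring each box by hand.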

Internet2 is running a production pilot for SDN and a new high-speed backbone to provide users with the bandwidth needed to handle Big Data. The pilot includes OpenFlow-enabled routers on a 100 Gigabit Ethernet network. Twenty-nine major universities have committed to deploying the 100 GbE network and using Internet2’s OpenFlow-based services.
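
Why a 100 GbE backbone matters for large file transfers is easiest to see with back-of-envelope arithmetic. This sketch assumes decimal units and an idealized, fully utilized link; the `efficiency` parameter is an illustrative knob, not a measured figure, and real transfers also lose time to protocol overhead, disk speed and congestion.

```python
def transfer_hours(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours to move size_tb terabytes (10**12 bytes) over a link_gbps (10**9 bit/s) link.

    efficiency is the fraction of line rate actually achieved (an assumption; 1.0 = ideal).
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


# A hypothetical 100 TB data set over 1, 10 and 100 Gb/s links:
for gbps in (1, 10, 100):
    print(f"{gbps:>3} Gb/s: {transfer_hours(100, gbps):6.1f} h")
```

In the ideal case the 100 TB example takes roughly 222, 22 and 2.2 hours respectively: each tenfold step in line rate turns days into hours.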

“The thing that excites me most about the development of OpenFlow and SDN … is the opportunity to have a network stack that’s open again, that people can actually get their hands on, and use it and do disruptive things,” Lambert said.

Read the Full Story.



Free Ebook: NAS Optimization for Dummies

Our friends at Avere are offering a free copy of NAS Optimization for Dummies.

Big NAS performance comes from your ability to scale, eliminate sources of latency, and gain the advantages of the cloud. Get started with Avere Systems’ Special Edition of NAS Optimization for Dummies by Allen G. Taylor.

In this book, you’ll find:

  • How to configure NAS storage for optimal performance
  • Ways to reduce the cost of upgrades as your storage needs grow
  • How to minimize the impact of multiple users hitting the storage systems at the same time

Read the Full Story.



DSSD is Andy Bechtolsheim’s Secret Chip Startup for Big Data

Over at GigaOm, GigaStacey writes that the solution for better and faster storage may lie in DSSD, a stealthy chip startup backed by Andy Bechtolsheim. Founded in 2010 by Sun alums Jeff Bonwick and Bill Moore, DSSD is trying to build a chip that would improve the performance and reliability of flash memory for high-performance computing, newer data analytics, and networking.

My sources tell me the startup is building a new type of chip (they said it’s really a module, not a chip) that combines a small amount of processing power with a lot of densely packed memory. The module runs a pared-down version of Linux designed for storing information on flash memory, and is aimed at big data and other workloads where reading and writing information to disk bogs down the application. This fits with the expertise of the team, but this is a problem that others are trying to solve as well, with faster and cheaper SSDs and targeted software to optimize the flow of bits to a database. But the proposal here appears to be about designing an operating system that takes advantage of the differences between flash memory and hard drives to boost I/O.

Read the Full Story.



Big Data Freeway Under Construction in San Diego

If you’ve ever driven the freeways of Southern California, you might wonder about the metaphor chosen to describe the new high speed, Big Data network announced this week by the University of California, San Diego.

Through a project known as Prism@UCSD, the university is building a high-performance cyberinfrastructure to support bursts of Big Data between campus facilities housing diverse disciplines, such as science, engineering, medicine and the arts, without killing the main campus network.

With $500,000 in funding from the National Science Foundation (NSF), the UCSD division of the California Institute for Telecommunications and Information Technology (Calit2) is developing Prism specifically to support researchers in such data-intensive scientific areas as genomic sequencing, climate science, electron microscopy, oceanography and physics.

“We’ve identified a variety of big data users on this campus who need ten gigabit/s and faster bandwidth to deal with the avalanche of data coming from scientific instruments such as sequencers, microscopes and computing clusters,” said Philip Papadopoulos, principal investigator on the Prism@UCSD project, who splits his time between Calit2 and the university’s San Diego Supercomputer Center (SDSC). “We’re starting at 1 Terabit/s of connected capacity through our next-generation modular switch, which is at the center of the Prism network. It can carry 20 times the traffic of our current research network, and it’s 100 times the bandwidth of the main campus network.”

Adds Papadopoulos, “You can think of Prism as the HOV lane, whereas our very capable campus network represents the slower lanes on the freeway.” Let’s hope he’s talking about the freeway at three in the morning.

“Prism@UCSD is a response to the growing challenge of Big Data,” said Calit2 Director Larry Smarr. “The key innovation in Prism@UCSD is to provide end-to-end dedicated large bandwidth to the end-users on campus.”

And he too invokes the freeway metaphor: “The Prism Big Data network also creates a high-capacity ‘data freeway’ to campus, national or international networks,” adds Smarr.

A roadway that has an aggregate bandwidth equivalent to over one terabit per second could go a long way to clearing up Southern California’s traffic problems.
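
The ratios Papadopoulos quotes can be turned into rough figures for the existing networks. This back-of-envelope sketch assumes decimal units (1 Tb/s = 1,000 Gb/s) and treats both ratios as capacity ratios; the derived numbers are estimates, not figures published by UCSD.

```python
PRISM_GBPS = 1000  # "1 Terabit/s of connected capacity", in decimal Gb/s

# Ratios as quoted; the derived capacities are back-of-envelope estimates only.
implied_research_net_gbps = PRISM_GBPS / 20   # "20 times the traffic of our current research network"
implied_campus_net_gbps = PRISM_GBPS / 100    # "100 times the bandwidth of the main campus network"

print(f"research network ~{implied_research_net_gbps:.0f} Gb/s, "
      f"campus network ~{implied_campus_net_gbps:.0f} Gb/s")
# research network ~50 Gb/s, campus network ~10 Gb/s
```

On that reading, the main campus network sits right at the ten gigabit/s that the big data users say they need as a floor, which is why a separate "HOV lane" makes sense.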

Read the Full Story.



Video: Accelerating Big Data with Hadoop (HDFS, MapReduce and HBase) and Memcached

In this video from the HPC Advisory Council Switzerland Conference, D.K. Panda from Ohio State University presents: Accelerating Big Data with Hadoop (HDFS, MapReduce and HBase) and Memcached. Download the slides (PDF).



Video: Architecting High Availability Lustre Storage with ClusterStor 6000

In this video, John Fragalla from Xyratex presents: Architecting High Availability Lustre Storage with ClusterStor 6000.

ClusterStor 6000 is designed to support installations with linear performance scalability in less space, scaling from 6 gigabytes per second up to 1 terabyte per second of file system throughput, as well as linear growth in data storage capacity from terabytes up to tens of petabytes.
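
Linear scalability means throughput grows in proportion to the number of storage building blocks. A minimal sketch, assuming for illustration only that each block contributes a fixed 6 GB/s (the quoted entry point) and that nothing else, such as the network or metadata servers, becomes a bottleneck; the function names are hypothetical.

```python
import math


def units_for_throughput(target_gb_s: float, per_unit_gb_s: float = 6.0) -> int:
    """Building blocks needed if throughput scales linearly with unit count.

    per_unit_gb_s = 6.0 is an illustrative assumption, not a vendor specification.
    """
    return math.ceil(target_gb_s / per_unit_gb_s)


def throughput_gb_s(units: int, per_unit_gb_s: float = 6.0) -> float:
    """Aggregate throughput under the same linear-scaling assumption."""
    return units * per_unit_gb_s


n = units_for_throughput(1000)  # 1 TB/s expressed as 1,000 GB/s
print(n, throughput_gb_s(n))    # 167 1002.0
```

In practice, scaling is "linear" only until a shared component saturates, which is what designs like this aim to avoid.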

The presentation was recorded at the HPC Advisory Council Stanford Conference 2013. Download the slides (PDF).



Henry Newman on File System Interface Futures

Over at Enterprise Storage Forum, Henry Newman looks at the future of file systems and examines whether REST will overtake POSIX as the interface of choice for all applications.

We do not have a lot of POSIX file systems that scale today to 10s of PB and billions of files. There are four file systems in production with a parallel namespace (Gluster, PanFS, Lustre, and GPFS) and a new entry called Ceph. Ceph, GPFS, Lustre and PanFS support parallel I/O, which is I/O from multiple threads (these threads could be running on multiple nodes) to a single file, but Gluster does not. On the other side, there are dozens of vendors developing REST- and SOAP-based object management interfaces. Vendors are trying to create systems that support billions of objects in a single namespace. Given that the vendors are not constrained by the POSIX atomicity requirements and support for parallel I/O, this is far easier than developing this support inside a POSIX file system.
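
Parallel I/O in Newman's sense means many writers updating one file at once. As a single-node sketch, assuming a Unix system where Python's `os.pwrite` is available, the threads below each write a disjoint, fixed-size region of a shared file at an explicit offset. Parallel file systems such as Lustre or GPFS extend the same idea across nodes, usually through MPI-IO rather than raw threads; `BLOCK` and the one-region-per-writer layout are illustrative choices.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK = 1 << 20  # 1 MiB per writer (illustrative size)


def write_region(fd: int, rank: int) -> int:
    """Writer `rank` owns bytes [rank*BLOCK, (rank+1)*BLOCK) of the shared file."""
    data = memoryview(bytes([rank % 256]) * BLOCK)
    offset = rank * BLOCK
    done = 0
    # pwrite takes an explicit offset and does not move a shared file pointer,
    # so concurrent writers never race on a seek-then-write sequence.
    # Loop in case the kernel accepts fewer bytes than requested.
    while done < len(data):
        done += os.pwrite(fd, data[done:], offset + done)
    return done


def parallel_write(path: str, writers: int) -> int:
    """Open one shared file and let `writers` threads fill their own regions."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        with ThreadPoolExecutor(max_workers=writers) as pool:
            return sum(pool.map(lambda r: write_region(fd, r), range(writers)))
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    shared = os.path.join(tmp, "shared.dat")
    print(parallel_write(shared, writers=4), os.path.getsize(shared))
```

Because the regions never overlap, no locking is needed here; the hard part for a POSIX file system is guaranteeing sensible results when writes do overlap, which is exactly the atomicity burden that object interfaces sidestep.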

Read the Full Story.



Georgia Tech Wins DARPA XDATA Research Funding

A team at the Georgia Institute of Technology has received a $2.7 million award from the Defense Advanced Research Projects Agency (DARPA) to develop technology to help address the challenges of Big Data – data sets that are both massive and complex.

The contract is part of DARPA’s XDATA program, a four-year research effort to develop computational techniques and open-source software tools for processing and analysing data, motivated by defence needs. Georgia Tech was selected to perform research in the area of scalable analytics and data-processing technology.

The team will focus on producing new machine-learning approaches capable of analyzing very large-scale data. Team members will also pursue development of distributed computing methods that can process data-analytics algorithms very rapidly with a variety of systems, including supercomputers, parallel-processing environments and networked, distributed computing systems.
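
A common building block for distributed analytics is a summary that each partition can compute independently and that can then be merged. The sketch below uses Welford's single-pass update within each partition and the pairwise merge formula of Chan, Golub and LeVeque to compute a mean and population variance. It is a generic illustration, not the Georgia Tech team's method; the partitions are processed sequentially here, whereas a cluster would summarize them on separate nodes.

```python
from functools import reduce
from typing import Iterable, List, Tuple

Stats = Tuple[int, float, float]  # (count, mean, M2 = sum of squared deviations)


def summarize(chunk: Iterable[float]) -> Stats:
    """Welford's single-pass update over one partition."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in chunk:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return n, mean, m2


def merge(a: Stats, b: Stats) -> Stats:
    """Combine two partial summaries without revisiting the raw data."""
    na, ma, m2a = a
    nb, mb, m2b = b
    n = na + nb
    if n == 0:
        return 0, 0.0, 0.0
    delta = mb - ma
    mean = ma + delta * nb / n
    m2 = m2a + m2b + delta * delta * na * nb / n
    return n, mean, m2


def distributed_mean_var(partitions: List[List[float]]) -> Tuple[float, float]:
    # In a real cluster each summarize() call would run where its partition lives;
    # only the three-number summaries would travel over the network.
    partials = [summarize(p) for p in partitions]
    n, mean, m2 = reduce(merge, partials, (0, 0.0, 0.0))
    return mean, (m2 / n if n else float("nan"))


print(distributed_mean_var([[1, 2, 3], [4, 5, 6, 7], [8, 9, 10]]))  # (5.5, 8.25)
```

The merge is associative, so partial results can be combined in any tree shape, which is what lets this pattern scale from a laptop to a supercomputer.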

‘This award allows us to build on the foundations we’ve already established in large-scale data analytics and visualisation,’ said Richard Fujimoto, leader of the Georgia Tech team. ‘The algorithms, tools and other technologies that we develop will all be open source, to allow them to be customised to address new problems arising in defence and other applications.’

The award is part of a $200 million multi-agency federal initiative for big-data research and development. It aims to improve the ability to extract knowledge and insights from the nation’s fast-growing volumes of digital data.

This story appears here as part of a cross-publishing agreement with Scientific Computing World.



Mellanox Accelerates Teradata Unified Big Analytics Appliance


Clipped from http://www.teradata.com/Aster-Big-Analytics-Appliance/

Today Mellanox announced that Teradata has chosen its InfiniBand interconnect solution to accelerate the Teradata Aster Big Analytics Appliance. Designed for demanding analytics which require high computational power and the fastest data movement, the Teradata Aster Big Analytics Appliance offers up to 19 times better data throughput and performs analytics up to 35 times faster than typical off-the-shelf commodity bundles.

“Teradata Aster Big Analytics Appliance is part of a truly unified, high-performance big data analytics architecture for the enterprise and will help customers achieve business value,” said Carson Schmidt, vice president of platform engineering at Teradata. “The Mellanox InfiniBand interconnect is the best choice to enable the performance that is needed to analyze big data at a speed that no other analytic platform can deliver. This performance maximizes the return on investment and accelerates time to value.”

Read the Full Story.



Steve Simms on the Data Capacitor II at Indiana University

In this video from SC12, Steve Simms from Indiana University describes a recent upgrade to the Data Capacitor project, a high-speed, high-capacity storage facility for very large data sets. With 5 PB of storage, Data Capacitor II will support big data applications used in computational research. IU partnered with DataDirect Networks to develop Data Capacitor II, which is scheduled to be installed in the IU Data Center in spring 2013.




inside-bigdata.com is a production of insideHPC, LLC. © 2011-2013