Discovering Gold with Big Data Analytics and Data-Intensive Computing

Big Data and the Evolving Internet

 

Big Data is exposing glaring weaknesses in today’s Internet. This, in turn, is driving the next stage of network evolution, much as the introduction of Ethernet and the Internet Protocol did back in the 1970s, when IBM’s Systems Network Architecture was the only game in town.

This is according to a story in Computerworld written by Stephen Lawson, based on a presentation by David Lambert, president and CEO of Internet2. Lambert spoke at the Open Networking Summit conference held this week in Santa Clara, CA.

Lambert pointed out that the current Internet began as a tool to help researchers from many different locations share data and insights.  It still fulfills that function, but in this new era of Big Data brought on by rapid advances in computing and storage, current Internet technology can’t support the massive amounts of data that engineers and scientists work with on a daily basis. Specifically, Lambert said that the technology used on the Internet today isn’t flexible enough to support new requirements, such as large file transfers, massive data sets, and content caching and distribution.

Software-defined networking (SDN) is one of the technologies underpinning the next steps in the evolution of the Internet. Lambert commented that SDN in universities today resembles the early Internet decades ago; practitioners in such data-intensive fields as genomics need a new, flexible, open networking approach to advance their research.

“Internet2 runs a nationwide network linking research institutions, and it’s already using elements of SDN on its production infrastructure,” reports Lawson. “SDN, a closely watched set of technologies at various stages of development, is intended to shift the control of networks from specialized devices such as switches and routers to software that can run on standard computing platforms and be virtualized. It promises a range of benefits that could include lower costs, faster service deployment and more network innovation.”

Internet2 is running a production pilot for SDN and a new high-speed backbone to provide users with the bandwidth needed to handle Big Data. The pilot includes OpenFlow-enabled routers on a 100 Gigabit Ethernet network. Twenty-nine major universities have committed to deploying the 100 GbE network and using Internet2’s OpenFlow-based services.

“The thing that excites me most about the development of OpenFlow and SDN … is the opportunity to have a network stack that’s open again, that people can actually get their hands on, and use it and do disruptive things,” Lambert said.

Read the Full Story.



Big Data Renaissance in North Carolina

 

In Chapel Hill, there are big doings in Big Data these days.

A new collaboration known as the National Consortium for Data Science (NCDS) has been launched at RENCI (the Renaissance Computing Institute) at the University of North Carolina.

The consortium has ambitious goals, according to a recent RENCI press release, which states that NCDS “…aims to make North Carolina a national hub for data-intensive business and data science research and education, a move that will help develop a national strategy to ensure U.S. leadership in the data-driven global economy.”

NCDS founding members include industry leaders, top research universities, and nonprofit and government organizations.

Representing the private sector are Cisco, GE, IBM, NetApp and SAS. The founding academic members are UNC Chapel Hill, RENCI, North Carolina State University, UNC Charlotte, UNC General Administration, Duke University and Drexel University. Nonprofit and government sector members are The Hamner Institutes for Health Sciences, MCNC, the National Institute of Environmental Health Sciences, RTI International and the U.S. Environmental Protection Agency. All founding members have major facilities in North Carolina except Drexel, which is located in Philadelphia.

“Those who harness the power of big data and use it to develop new data-intensive business sectors will be the winners in the 21st century economy,” said Stanley C. Ahalt, Ph.D., director of RENCI and a chief organizer of the NCDS, and professor of computer science at UNC-Chapel Hill. “Our members understand that, want to find solutions to big data problems and put North Carolina on the map as a center of data science innovation.”

The consortium is seeking to quickly consolidate its position as a major player in the evolving world of Big Data through a series of ongoing initiatives.

For example, NCDS will be competing for federal research funding, including the $200 million Big Data Research and Development Initiative announced by the White House last year, the National Science Foundation’s BigData program, and the National Institutes of Health’s Big Data to Knowledge initiative.

It will convene its first invitation-only NCDS Leadership Summit April 23-24 in Chapel Hill. This year the summit will focus on genomics. Other initiatives include public lectures, an internship program for students at member companies, and visiting-scientist positions at member universities for industry employees.

“It is no longer enough for businesses to be big in order to be successful, now success is driven by the amount of knowledge a company possesses,” said David Turek, vice president, exascale computing, IBM. “Ninety percent of our planet’s data has been created within the past two years, and the demand will grow as businesses look to optimize big data analytics to improve decision making and expand their business operations into cloud, social and mobile environments.”

Read the Full Story.



Podcast: Radio Free HPC Looks at FPGAs

 

In this podcast, the Radio Free HPC team discusses the recent buzz surrounding FPGAs. After being sidelined by accelerators, they’re increasingly being used in appliances.

Big vendors are talking about FPGAs not only for appliances but for general-purpose systems as performance assists. Are we headed back to the future? The guys discuss the ins and outs of FPGAs and why, in some cases, they could be a huge win for the organizations that implement them. But is the architecture flexible enough? For enterprise and Big Data, perhaps it is. If you need to perform the same algorithms over and over again, FPGAs could be a perfect fit. As with all things tech, there are a few cautionary notes to be sounded. Amassing more and more appliances can lead down a tricky road. Will their use in workload-optimized systems lead to vendor lock-in? Can you really teach an old FPGA new tricks? And can they be weaponized?

Most importantly: how are servers like cattle? Tune in to find out…

Download the MP3 * Subscribe on iTunes * RSS Feed



You Can’t Keep a Good god Down

 

In Greek mythology the Titans, known as the elder gods, ruled the earth until the Olympians tossed them out of power.

At Oak Ridge National Laboratory, one of the Titans is not only still with us and doing well, but this modern-day deity just got a boost that will help keep it king of the hill for some time to come.

Titan, the world’s fastest supercomputer, delivers peak performance of over 27 petaflops, making it ten times more powerful than previous generations of ORNL computers such as Jaguar. This week the Lab announced that it has selected DataDirect Networks to build the world’s fastest file storage system to ensure Titan’s ascendancy in the world of HPC.

Built around DDN’s SFA 12K-40, the system is being designed with 40 petabytes of raw capacity capable of “ingesting, storing, processing and distributing research data at unprecedented speed,” according to a DDN press release. The DDN system will work with the Lab’s Lustre parallel file system, delivering Lustre performance of over one terabyte per second to handle the demands of Titan’s 299,008 CPU cores.

“The world’s toughest questions demand the toughest storage and the fastest technology to drive new levels of scientific insight. DDN has spent the better part of a decade engineering a platform that is built precisely and efficiently for today’s Big Data challenges,” comments Jean-Luc Chatelain, chief technology officer at DDN. “As applications everywhere – from energy exploration to climate modeling to energy efficient car manufacturing – continue to drive extreme levels of computational simulation and data analytics, we’re proud to provide the data storage technology that makes such innovation and economic competitiveness possible. We’re honored to continue our long-standing partnership with ORNL today and to be part of the future of Big Data and exascale computing tomorrow.”

ORNL points out that Titan is architecturally unique in a variety of ways, and is a showcase for tomorrow’s computational requirements as Big Data continues to make inroads into the enterprise.

“When building the world’s fastest system for data intensive computing, we carefully considered all aspects of high-throughput I/O infrastructure and how efficient storage platforms can complement our supercomputer’s efficiency,” says Buddy Bland, project director for the Oak Ridge Leadership Computing Facility. “The ORNL and DDN teams have worked together to architect a file system designed to enhance the performance of our Titan supercomputer and enable our users to achieve unprecedented simulations and big data insights through massively scalable computing.”

The Olympians may have thought they dethroned the Titans, but as Yogi Berra once said, “It ain’t over till it’s over.” Titan at ORNL, with some help from DDN, reigns supreme…at least for now.

Read the Full Story.



ScaleOut hServer Opens Up Hadoop Analysis for Live Data

 

Today ScaleOut Software announced its new hServer, the first in a series of products designed to fill the need for real-time analytics with Hadoop.

“While it’s a powerful platform for analyzing large, static data sets, Hadoop has always been limited by its inability to perform analytics on live data,” said Bill Bain, ScaleOut Software CEO. “There is an increasing drumbeat for real-time analytics using Hadoop, and we’re excited to take an important step towards meeting that need with this release.”

ScaleOut hServer will be available in both a free community edition and in several commercial editions. The community edition enables up to a four-server combined Hadoop/hServer grid for analyzing memory-based data sets of up to 256GB. Read the Full Story or check out our podcast interview with Bill Bain.



Arkansas Dodges the Big Data Trough of Disillusionment

 

With Gartner claiming that Big Data is heading for its fabled “Trough of Disillusionment,” it’s nice to run across a story that shows the technology being warmly embraced, with positive outcomes forecast.

Such is the case with the Arkansas Department of Human Services (DHS), which will be using Big Data to modernize the delivery of social and healthcare services.

IBM announced last week that DHS will be implementing its IBM Smarter Cities solution, which provides Big Data analytics, social program management, and advanced security. One of the major benefits of the rollout will be to transform an IT infrastructure that for some time has been mired in the Slough of Despond, to borrow a term from John Bunyan’s The Pilgrim’s Progress.

“This will be DHS’ first step in transforming an IT infrastructure that is composed of more than 30 discrete system silos in an aging architecture,” said Dick Wyatt, chief information officer for Arkansas DHS. “Having a total view of our clients in one application — using the latest technology — will provide DHS with the ability to better manage the services provided. In addition, it will give DHS the ability to react more timely and efficiently to the many changes that are occurring and will continue to occur in the human services and healthcare arena.”

DHS will modernize its infrastructure with a service-oriented architecture that fully integrates its many different programs into, according to the IBM press release, “one re-usable and scalable platform.”  The new system will support such state social programs as Medicaid, the Supplemental Nutrition Assistance Program, and the State Children’s Health Insurance Program. The state’s adoption of Big Data analytics is expected to make it much easier for citizens to not only access government services, but achieve a satisfactory outcome as well.

According to the release, the heart of the new system is the IBM Cúram Social Program Management Platform. DHS is also deploying Cognos business intelligence software, Tivoli security solutions, DB2, InfoSphere and Rational capabilities. All the software will run on IBM Power Systems.

So, the expectation is that the ability to crunch and analyze Big Data from DHS and other relevant departments will allow Arkansas to better deliver social programs to its citizens. IBM, of course, is on hand to help. Said Craig Hayman, general manager, Industry Solutions at IBM: “…the state can benefit from our deep healthcare industry expertise combined with an ability to apply that knowledge with Big Data analytics solutions that are secure and maximize existing technology investments.”

Read the Full Story.



Data Scientists: The New Sex Symbol

 

Last October an article in the Harvard Business Review appeared with the headline, “Data Scientist: The Sexiest Job of the 21st Century.”

Last week Claire Cain Miller, a technology reporter for The New York Times, used the headline to introduce her own story on the fast-growing field of data science and the efforts on the part of the academic community to create a new crop of data scientists – whom she calls “…the magicians of the Big Data era.”

Magicians? Maybe. Sexy? We’ll leave that one for you to decide. Especially given the comment by Rachel Schutt, a researcher at Johnson Research Labs who taught a course on data science last semester at Columbia. Miller quotes Schutt as describing a data scientist as “a hybrid computer scientist software engineer statistician.”

Whether we’re dealing with sex symbols or supergeeks, the fact is that data scientists are in great demand. Miller reports that some recent graduates in the nascent field are pulling down six-figure salaries. And our educational system is taking note.

“Universities can hardly turn out data scientists fast enough,” she writes. “To meet demand from employers, the United States will need to increase the number of graduates with skills handling large amounts of data by as much as 60 percent, according to a report by McKinsey Global Institute. There will be almost half a million jobs in five years, and a shortage of up to 190,000 qualified data scientists, plus a need for 1.5 million executives and support staff who have an understanding of data. North Carolina State University introduced a master’s in analytics in 2007. All 84 of last year’s graduates in the field had job offers, according to Michael Rappa, who conceived and directs the university’s Institute for Advanced Analytics. The average salary was $89,100, and more than $100,000 for those with prior work experience.”

Miller describes data science programs at a number of educational institutions across the country and goes on to say, “Because data science is so new, universities are scrambling to define it and develop curriculums. As an academic field, it cuts across disciplines, with courses in statistics, analytics, computer science and math, coupled with the specialty a student wants to analyze, from patterns in marine life to historical texts.”

Pretty sexy stuff, wouldn’t you agree?

Read the Full Story.



Teradata Streamlines Hadoop Access for the Enterprise

 

Today Teradata announced that the new Enterprise Access for Hadoop and Unified Data Architecture enable business analysts to reach through Teradata directly into Hadoop to find new business value from the analysis of big, diverse data.

“Today’s announcement of Teradata Enterprise Access for Hadoop is another example of our aggressive commitment to building out the Teradata Unified Data Architecture™,” said Scott Gnau, president, Teradata Labs. “Teradata Enterprise Access for Hadoop empowers organizations to dig deeply into files and data residing in Hadoop and combine the data with production business data for analyses – and action.”

Teradata Enterprise Access for Hadoop includes two new, innovative features that make access to data in Hadoop easy and secure for business analysts across the enterprise:

  • Teradata smart loader for Hadoop. For the first time, business analysts have point-and-click convenience to easily browse and move data between Teradata and Hadoop for analysis and self-service business intelligence.
  • Teradata SQL-H. The new Teradata SQL-H gives any user or application across the enterprise direct, on-the-fly access to data stored within Hadoop through standard ANSI SQL, leveraging the security, workload management, and performance of the Teradata data warehouse.

Read the Full Story or check out our RichReport Podcast interview with Scott Gnau.



Google Targets Human Traffickers with Help from Big Data

 

Refugees from Myanmar wait for registration in Thailand after being rescued from a human-trafficking gang along the Malaysian border

Google has announced that it is taking steps to help combat human trafficking – a $32 billion illegal enterprise that exploits 20.9 million people worldwide, according to research from the United Nations Office on Drugs and Crime.

Google’s involvement includes a $3 million grant through its Global Impact Award program to three anti-trafficking organizations – Polaris Project, Liberty Asia and La Strada International.

Google will also leverage its Big Data technical expertise through its Google Ideas task force to build “…the first data-sharing platform to identify global patterns on how the human-trafficking trade operates and how to better protect the victims,” according to a story written by Bernhard Warner in Bloomberg Businessweek.

The story quotes Jared Cohen, director of Google Ideas, who said, “Nine months ago, starting with the Google Ideas Summit, we set out to map, expose, and disrupt the workings of illicit networks. This includes organized crime, narco-trafficking, organ harvesting. Every single one of these networks involved human trafficking.”

The Google Ideas task force will team with Palantir Technologies and Salesforce.com to build the data-sharing platform.

“The alliance announced on Tuesday means the three anti-trafficking networks, which operate emergency hotlines in North America, Europe, and Southeast Asia, will share data on where the emergency phone calls are originating, the ages of the victims, their home countries, and the types of criminal activities they have been forced into,” writes Warner. “With the help of Salesforce.com, Palantir, and Google, the agencies will be able to crunch data like this in real time to detect crime trends that they can then share with police and policymakers to help protect victims.”

Read the Full Story.



Video: The CIA’s Grand Challenges with Big Data

 

In this video from the Structure: Data 2013 conference, Central Intelligence Agency CTO Ira “Gus” Hunt presents: The CIA’s Grand Challenges with Big Data.

Sensors, agents and an Internet of Things are all producing data, all of the time. It would be a vast understatement to say that the CIA has experience in acquiring, handling and analyzing big quantities of data. In this talk, the CTO of the CIA will talk about the scale of the problems his team deals with now, the coming inflection point in the increase in data, the grand challenges we face and why an emphasis on analytics is critical for the future. This is a talk not to be missed.



inside-bigdata.com is a production of insideHPC, LLC. © 2011-2013