Discovering Gold with Big Data Analytics and Data-Intensive Computing


Entries filed under “Graph Computing”

Radio Free HPC Fireside Chat – HPC Embraces Big Data

In this slidecast, the Radio Free HPC team interviews Fritz Ferstl, CTO of Univa. Topics include Big Data, HPC, and the continuing convergence of both.

What we think of as traditional HPC has long differed greatly from Big Data analytics, but that seems to be changing. With a long history in high performance computing and customers in both worlds, Ferstl shares his unique perspective on where the two worlds overlap and where the potential for future synergy is greatest.

This has to be our best show yet, so be sure to check it out.




Green Graph 500 Launches to Boost Energy Efficient Big Data Computing

In this special guest feature, Torsten Hoefler from ETH Zurich writes that the new Green Graph500 aims to boost energy-efficient Big Data Computing.

“Big Data” can be analyzed in various ways. The most successful and prevalent programming model, MapReduce, wins users over with its flexibility in adapting to hardware performance variations and faults. However, even though MapReduce covers the vast majority of use cases, it has its limits for graph computations. Complex graph algorithms become more important as our analysis capabilities grow. For example, problems such as finding hubs in social network graphs are routinely answered today. The underlying algorithm, betweenness centrality, uses a graph traversal similar to breadth-first search or shortest-path search. Systems such as Google’s Pregel, Apache Giraph, the (Parallel) Boost Graph Library, and Stanford’s GPS are just some examples of emerging frameworks for large-scale graph computations. To compare architectures, and possibly programming frameworks, efficiently, the Graph 500 benchmark strives to establish a database of performance results for a standardized breadth-first search on various platforms.
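For readers who have not looked at the benchmark, its core kernel is a breadth-first search that returns a BFS tree, usually recorded as a parent array. Below is a minimal single-threaded sketch in Python, assuming an undirected graph held as an adjacency-list dict; the function name `bfs_parents` is hypothetical, and real submissions use heavily optimized, often distributed, implementations.

```python
from collections import deque


def bfs_parents(adj, root):
    """Breadth-first search from `root` over an adjacency-list graph.

    Returns (parent, edges_examined): `parent` maps each reached vertex to its
    BFS-tree parent (the root is recorded as its own parent), and
    `edges_examined` counts adjacency entries scanned during the traversal.
    """
    parent = {root: root}
    frontier = deque([root])
    edges_examined = 0
    while frontier:
        u = frontier.popleft()
        for v in adj.get(u, ()):
            edges_examined += 1
            if v not in parent:  # first visit fixes v's parent in the tree
                parent[v] = u
                frontier.append(v)
    return parent, edges_examined


if __name__ == "__main__":
    # Small undirected graph: each edge appears in both endpoints' lists.
    adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
    parent, scanned = bfs_parents(adj, 0)
    print(parent)   # {0: 0, 1: 0, 2: 0, 3: 1, 4: 3}
    print(scanned)  # 10
```

The benchmark's own rules define the traversed-edge count for its TEPS rate in terms of the input edges of the searched component, so the `edges_examined` counter here is only a rough proxy for that figure.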

As energy becomes a bigger concern than hardware purchasing costs in large-scale data centers and supercomputing centers, it becomes necessary to consider not only the performance of such computations but also their exact energy consumption. In fact, if current cost trends continue, energy consumption will soon be more important than absolute performance. Such discussions are highly relevant for operators of large data centers such as Google, Amazon, and Yahoo, as well as large supercomputing centers operated by the DOE (e.g., LLNL, Sandia, LANL, ORNL) and the NSF (e.g., NCSA, SDSC, PSC). We are thus looking forward to interesting future developments targeting exascale as well as Big Data architectures and programming frameworks.

We introduce the Green Graph 500 list, which serves a variety of purposes. First and foremost, it establishes the practice of competing not only for the highest performance but also for the highest energy efficiency, which directly benefits society. It also sets out to collect historical data that may allow us to predict future trends, much as the TOP500 list has done in the past (who doesn’t like to put up a TOP500 slide to project FLOP rates for the next 10 years?). The list will also allow us to compare the energy efficiency of a specific computer on different tasks, e.g., dense linear algebra (a problem mainly limited by memory size and CPU peak floating-point performance) versus graph search (a problem mainly limited by memory access rates and global system bandwidth). Together, those two metrics may guide the design of more efficient balanced systems as well as special-purpose systems for one of those tasks.

Finally, the new Green Graph 500 list is not meant to compete with any of the existing lists. It is complementary, filling an important gap in the field. In fact, the rules are designed to be similar to the established Green 500 rules (similar, not identical, for example with regard to the network) so that comparisons can easily be made in the future. It also integrates directly with the Graph 500 list and submission system to guarantee one-to-one comparisons (a submission record may appear in both the Green Graph 500 and the Graph 500, even though the lists are ranked by different indices).

The Green Graph 500 list is soliciting submissions from everyone through the Graph 500 submission system. To submit to the list, simply start a normal Graph 500 submission and select “Submit to Green Graph 500” or “Submit to both lists”. The only additional data you need for a Green Graph 500 submission is the actual power draw of your system during the benchmark.
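To see how a power reading turns into an energy-efficiency index, here is a small sketch. It assumes the index is traversed edges per second per watt (TEPS/W) computed from the average power draw; this post does not spell the formula out, and the function names are hypothetical.

```python
def teps(edges_traversed: int, seconds: float) -> float:
    """Traversed edges per second for one BFS run."""
    if seconds <= 0:
        raise ValueError("BFS time must be positive")
    return edges_traversed / seconds


def teps_per_watt(edges_traversed: int, seconds: float, avg_watts: float) -> float:
    """Assumed efficiency index: TEPS divided by average power draw.

    Since 1 W = 1 J/s, TEPS/W is the same as edges traversed per joule.
    """
    if avg_watts <= 0:
        raise ValueError("average power must be positive")
    return teps(edges_traversed, seconds) / avg_watts


if __name__ == "__main__":
    # Hypothetical run: 1e9 edges traversed in 2 s at an average of 500 W.
    print(teps(1_000_000_000, 2.0))                  # 500000000.0
    print(teps_per_watt(1_000_000_000, 2.0, 500.0))  # 1000000.0, i.e. 1 MTEPS/W
```

Because the index divides by power, a slower machine can still rank higher on the green list if it draws proportionally less power, which is exactly the trade-off the list is meant to expose.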

Another small difference between the Graph 500 and its Green peer is the measurement methodology. Since most power meters are not accurate enough to measure the rather short BFS run itself (excluding the post-check, etc.), we offer a slightly modified version of the reference benchmark that runs the BFS in a tight loop long enough for an energy meter with low time resolution to measure the exact energy consumption. This benchmark also reports a Graph 500 number valid for submission. For runs with a custom implementation, this needs to be ensured manually (4-5 lines of C code suffice). Submission opens together with the official Graph 500 submission.
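The tight-loop idea can be sketched as follows: call the kernel back to back until the total wall-clock time exceeds the meter's measurement window, then divide the metered energy by the number of iterations. This is an illustration only, not the organizers' modified reference code; it assumes the meter reports a single average power over the whole interval, and `run_for_meter`, `min_seconds`, and `energy_per_run` are hypothetical names.

```python
import time
from typing import Callable, Tuple


def run_for_meter(kernel: Callable[[], object], min_seconds: float) -> Tuple[int, float]:
    """Call `kernel` repeatedly until at least `min_seconds` have elapsed.

    Returns (iterations, total_seconds). A long, steady loop gives a meter
    with coarse time resolution something it can actually measure.
    """
    iterations = 0
    start = time.perf_counter()
    while True:
        kernel()
        iterations += 1
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds:
            return iterations, elapsed


def energy_per_run(avg_watts: float, total_seconds: float, iterations: int) -> float:
    """Joules attributed to one kernel run: (average power * total time) / runs."""
    return avg_watts * total_seconds / iterations


if __name__ == "__main__":
    # Stand-in kernel; a real harness would call the BFS here.
    n, total = run_for_meter(lambda: sum(range(10_000)), min_seconds=0.05)
    # 400 W is a hypothetical meter reading for the looped interval.
    print(n, "runs,", round(energy_per_run(400.0, total, n), 4), "J per run")
```

Dividing by the iteration count assumes every run does the same work at the same power; the per-run BFS time from the same loop is what keeps the reported performance number comparable with an ordinary Graph 500 run.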

As a sneak peek, we prepared a sample list from the March 2013 energy submissions (these may not have followed all the official rules, so the list is not official).

The Green Graph 500 list is maintained by Torsten Hoefler from ETH Zurich in collaboration with the Graph 500 executive committee. For questions or comments please contact [email protected]



Applications Due: Global Big Data Conference 2013 Startups Pitch

The Global Big Data Conference is hosting a Startups Pitch on January 28, 2013 in Santa Clara. This event is a great opportunity for startup firms to showcase their products to the world in front of prominent technology experts and potential investors.

Are you working on the next big thing that will take the world by storm? The event allows you to showcase your products in front of potential investors, users, and partners. The judging panel, which consists of investors, entrepreneurs, and industry analysts, will nominate seven startups from the submissions. The shortlisted startups will have a chance to present their products and technologies at the Startups Pitch on Monday, January 28, and the panel will choose a winner based on those presentations.

Hurry! Pitch applications are due January 20, 2013.



YarcData Announces Finalists for $100K Graph Analytics Challenge

This week YarcData announced the six finalists for the company’s Graph Analytics Challenge. The contest features $100,000 in prizes, including a $70,000 grand prize for the first-place winner, and the six finalists were judged to have entered the best submissions for un-partitionable Big Data graph problems.

“The quality and diversity of the many entries we received as part of the Graph Analytics Challenge is very exciting to us,” said Arvind Parthasarathi, president of YarcData. “This contest is about increasing awareness and interest in graph analytics, and it is encouraging to see that the Big Data community is embracing new ways to apply graph analytics towards solving complex problems with incredibly large data sets.”

The contest is intended to promote the use and development of RDF and SPARQL (both standards developed by the World Wide Web Consortium) as the industry standard for graph analytics. Winners will be announced in January. Read the Full Story.
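RDF represents a graph as subject-predicate-object triples, and a SPARQL query finds every way a set of triple patterns (a "basic graph pattern") can be matched against the data. To make that concrete without a triple store, here is a toy matcher in plain Python. The `ex:` names, the data, and the `match` function are hypothetical; this implements only conjunctive pattern matching, not SPARQL itself.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def _unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):              # variable term
            if p in new and new[p] != t:
                return None                # conflicts with an earlier binding
            new[p] = t
        elif p != t:                       # constant term must match exactly
            return None
    return new


def match(patterns: List[Triple], triples: Iterable[Triple]) -> List[Binding]:
    """Return all variable bindings satisfying every pattern (a join over patterns)."""
    data = list(triples)
    bindings: List[Binding] = [{}]
    for pat in patterns:
        next_bindings = []
        for b in bindings:
            for t in data:
                extended = _unify(pat, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


if __name__ == "__main__":
    triples = [
        ("ex:alice", "ex:knows", "ex:bob"),
        ("ex:alice", "ex:knows", "ex:carol"),
        ("ex:bob", "ex:worksFor", "ex:Acme"),
        ("ex:carol", "ex:worksFor", "ex:Initech"),
    ]
    # Roughly: SELECT ?person ?friend WHERE { ?person ex:knows ?friend .
    #                                         ?friend ex:worksFor ex:Acme . }
    query = [("?person", "ex:knows", "?friend"), ("?friend", "ex:worksFor", "ex:Acme")]
    print(match(query, triples))  # [{'?person': 'ex:alice', '?friend': 'ex:bob'}]
```

Each additional pattern is effectively a join on shared variables, and on large graphs those joins jump unpredictably across the data, which is the access pattern the contest's "un-partitionable" problems stress.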



YarcData Upgrades Big Data Appliance, Goes with Subscription-based Pricing

Today Cray’s YarcData division announced a major upgrade to its uRiKA Big Data appliance for graph analytics. With the uRiKA Fall 2012 Release, the company is adding substantial new standards-based capabilities to boost functionality and performance on complex graph analytics inquiries.

“Big Data graph-analytics are increasingly used to reveal unknown, unexpected or hidden relationships in a wide range of markets, including financial services, health sciences, energy, transportation, Internet commerce and others,” said Arvind Parthasarathi, president of YarcData. “Our uRiKA appliance combined with the enhanced capabilities in this release gives our current and future customers the world’s most powerful, easy-to-use platform for exploiting the power of graph analytics.”

What I find interesting here is that the uRiKA appliance is available via subscription-based pricing. The large memory of the uRiKA appliances may make the sticker price a bit too spendy for this market, so YarcData is letting you pay-as-you-go. Will this be a business model for future Cray supercomputers? We’ll have to wait and see. Read the Full Story.



YarcData Shows Why Big Data is More than Hadoop

Gartner’s Carl Claunch writes that Big Data involves a much wider and more varied set of needs, practices, and technologies than just Hadoop or data warehouses. To meet the needs of graph computing, Cray’s YarcData systems feature a single system-wide memory space that does not need to be partitioned.

YarcData has designed uRiKA with three technologies to minimize or eliminate the costs of the irregular, unpredictable leaps in graph processing. First, a unique approach to processor design, YarcData’s Threadstorm chip, shows no slowdown under the characteristic zigs and zags of graph-oriented processing. Second, the data is held in memory in very large system memory configurations, slashing the rate of file accesses. Finally, a global shared-memory architecture gives every server in the uRiKA system access to all data.
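Why does a single shared memory matter? When a graph is split across k separate memories, every edge whose endpoints land in different parts becomes a remote access during traversal. The sketch below measures that cut fraction for a uniformly random graph under simple modulo (hash) partitioning, where about 1 - 1/k of the edges cross parts. The generator and sizes are assumptions for illustration, not YarcData workloads; smarter partitioners can do better on graphs with strong community structure, and the point here is only the arithmetic of remote accesses.

```python
import random
from typing import List, Tuple

Edge = Tuple[int, int]


def random_edges(n_vertices: int, n_edges: int, seed: int = 0) -> List[Edge]:
    """Uniformly random undirected edges (self-loops allowed, for brevity)."""
    rng = random.Random(seed)
    return [(rng.randrange(n_vertices), rng.randrange(n_vertices))
            for _ in range(n_edges)]


def cut_fraction(edges: List[Edge], k: int) -> float:
    """Fraction of edges whose endpoints fall in different parts under v % k."""
    if not edges:
        return 0.0
    cut = sum(1 for u, v in edges if u % k != v % k)
    return cut / len(edges)


if __name__ == "__main__":
    edges = random_edges(100_000, 200_000)
    for k in (2, 8, 64):
        print(f"k={k}: cut {cut_fraction(edges, k):.3f}, "
              f"expected about {1 - 1 / k:.3f}")
```

With 64 partitions, nearly every edge crosses a boundary, so almost every step of a traversal would leave the local memory; a global address space turns those steps into ordinary memory references instead of messages.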

Read the Full Story.



Slidecast: Making the Biggest Big Data Easy

In this slidecast, 1010data CEO Sandy Steier presents: Making the Biggest Big Data Easy. The company has just re-launched its web site with a new look and a live demo.

“For well over a decade, 1010data has pushed the limits of analytics on large amounts of data, including ‘Big Data’. From routine reporting to advanced analytics, the 1010data system allows businesses like yours to hone their tactics and strategy, while reducing technology overhead, costs and risk.”




YarcData Rolls Out $100K Big Data Graph Analytics Challenge

Today Cray’s YarcData subsidiary announced the YarcData Graph Analytics Challenge. With $100,000 in prizes, the contest will recognize the best submissions for solutions of un-partitionable, Big Data graph problems.

“Graph databases have a significant role to play in analytic environments, and they can solve problems like relationship discovery that other traditional technologies do not handle easily,” said Philip Howard, Research Director, Bloor Research. “YarcData driving thought leadership in this area will be positive for the overall graph database market, and this contest could help expand the use of RDF and SPARQL as valuable tools for solving Big Data problems.”

 
The YarcData Graph Analytics Challenge will officially begin on Tuesday, June 26, 2012 and winners will be announced during a live web event on Dec. 4, 2012. Read the Full Story.



YarcData Appliance Cranks through High Performance Analytics

Philip Howard writes that uRiKA from Cray is to graph databases what Netezza was to data warehousing when it first appeared on the market: appliance-based, scalable (uRiKA more so), and focused solely on high-performance analytics.

In one use case, 10 million patient records are being compared with one another, including anonymised historical data spanning all events, symptoms, diseases, treatments, prescriptions, genetics and family history, in order to identify “similar patients”. Similarity, its degree, and the conditions under which it holds form the basis of the graph stored within uRiKA. This graph is then used to support doctors by providing real-time information about which treatments were most successful on similar patients. While consulting with patients, doctors can query the uRiKA system on iPads or other devices for guidance and tweak their search parameters on the fly during the visit.
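A toy version of the "similar patients" graph helps show the shape of the idea. The article does not say how uRiKA defines similarity, so this sketch simply assumes each patient is reduced to a set of coded attributes, links two patients when their Jaccard similarity meets a threshold, and stores the score as the edge weight. The record format, measure, threshold, and function names are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Dict[str, float]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_similarity_graph(records: Dict[str, Set[str]], threshold: float) -> Graph:
    """Weighted adjacency map linking patients whose similarity >= threshold."""
    graph: Graph = {pid: {} for pid in records}
    for p, q in combinations(records, 2):  # all pairs: O(n^2), toy scale only
        s = jaccard(records[p], records[q])
        if s >= threshold:
            graph[p][q] = s
            graph[q][p] = s
    return graph


def most_similar(graph: Graph, patient: str, top: int = 3) -> List[Tuple[str, float]]:
    """Neighbors of `patient`, highest similarity first (ties keep insertion order)."""
    return sorted(graph[patient].items(), key=lambda kv: kv[1], reverse=True)[:top]


if __name__ == "__main__":
    records = {
        "p1": {"diabetes", "metformin", "age50s"},
        "p2": {"diabetes", "metformin", "age60s"},
        "p3": {"asthma", "age50s"},
        "p4": {"diabetes", "insulin", "age50s"},
    }
    g = build_similarity_graph(records, threshold=0.4)
    print(most_similar(g, "p1"))  # [('p2', 0.5), ('p4', 0.5)]
```

Comparing every pair is quadratic, which is fine for four records and hopeless for ten million; at that scale the graph has to be built with smarter candidate generation and then queried in memory, which is the workload the appliance targets.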

Read the Full Story.



New “Research HPC” Podcast from Intel Labs Looks at Graph Computing

Mike Bernhardt from The Exascale Report has launched a new podcast series called “Research HPC.” The podcast will feature key voices from the Intel Labs think tank.

In his first program, Mike discusses the Graph 500 with Pradeep Dubey, a senior principal engineer and Director of the Parallel Computing Lab (PCL) within Intel Labs.







inside-bigdata.com is a production of insideHPC, LLC. © 2011-2013 Sitemap