Discovering Gold with Big Data Analytics and Data-Intensive Computing


Entries filed under “Analytics”

GPUs Power Big Data for Frock Finding

In this special guest feature, Dan Olds from Gabriel Consulting writes that a demo at this week’s GPU Technology Conference showed how Big Data powered by accelerated computing could change the face of retail.

NVIDIA CEO Jen-Hsun Huang’s GTC 2013 keynote was a typical whirlwind tour (with real wind, but that’s a different article) through all the various GPU-related worlds that NVIDIA is touching these days. These addresses are usually chock-full of demonstrations showing where we are in terms of state-of-the-art graphics, scientific and technical computing, entertainment, and now: finding dresses.

In this demonstration, Jen-Hsun leafed through the latest edition of In Style magazine. While the models are svelte (or starved), the magazine definitely isn’t, weighing in with 594 pages of ads. A dress from one of those ads was chosen, its picture was taken, and it was sent off for image matching. What came back was a set of likely matches that the image-matching tool found via eBay. (This can be seen in the semi-blurry picture taken from my third-row perch.)

Hmm… now that I think about it, this technology probably isn’t confined only to dresses. With some minor technical tweaks (like checking different boxes), I imagine it would be quite possible to match many other items. I’m thinking handbags, blouses, shoes, skorts, and even jorts for those needing to feed their denim demons.

They also demonstrated that it’s possible to capture a particular pattern and then search for clothing that has the same, or a similar, look. To my untrained eye, it appeared to do a pretty good job. It didn’t find exact matches, but the selection shown came pretty close to the mark.

The impressive thing about this tool is its accuracy and speed. In each demo it not only returned the correct type of garment, but the results were surprisingly close to the original image in terms of look and general configuration. And it took only a few seconds – not much longer than the loading time for a web page.

There are already a fair number of images on the Internet, and users of Facebook add something like 300 million more per day. On the video side, there’s something like 72 hours of video added to YouTube per minute. Over time, this is going to add up. There will be an acute need for more sophisticated image searching/matching technology.

So – aside from everyone who likes to shop for clothes, who will use this technology? Companies that want to make it quicker and easier for potential customers to comb through their vast inventories of goods. With our increasing reliance on communicating via images, the ability to search, sort, and match is going to become more important over time.


Also posted in Business of Big Data, Events

Taking Steps to Make Better Decisions with Big Data

If only it were that easy. A recent article in Forbes presents “4 Steps to Turning Big Data into Business Impact.” Written in staccato fashion by Piyanka Jain, a consultant specializing in analytics, the piece falls into the general “How to” category characterized by articles such as “Five Steps to Flatter Abs” or that wikiHow classic, “How to Write a How to Article: 10 Steps.”

Jain is addressing executives who have been tasked with meeting ambitious growth targets mandated by their board of directors. Big Data holds the key to this growth. But, she asks the reader, despite an abundance of data in your organization, are you still at a loss as to how to use that data to understand the business drivers?

Enter Step 1 where she asks, “Introspection on your self and your leadership Team: Are you making evidence based decisions or are you gut-happy decision-maker?” (sic).

“Your organization will not progress towards being data-driven, unless, you and your leadership team are asking the Three key questions of your data and your team,” Jain adds. They are: “(1) ‘How do we define our success?’ (2) ‘What drives our success?’ and (3) ‘Who are our customers, and how do we engage them?’ Whether you use zero-sum budgeting or other ways to hold your leadership team accountable to the decisions they make, there needs to be some accountability structure, because you can only manage what you measure. And as soon as you start looking back at decisions which were made, you start finding ways to optimize those decisions. And inarguably, there is no better way to optimize decisions, than basing it on data and facts.”

Good advice, but Step 1 also harbors the land mine that can destroy the whole four-step process in an instant. As data scientist Thomas Thurston pointed out in a recent inside-BigData post, “… a lot of business decisions have to be made quickly. There isn’t time to build a predictive model or to even glance around for patterns…Relying on your wits is part of doing business. However if there are big problems that keep resurfacing, it’s a lot slower to go on guessing. If you don’t bring data science or some other form of rigor to the table, you may never get a grip on what the underlying problem is.”

So the underlying problem may really be the fact that as an executive you are more comfortable with a “gut-happy decision-maker” style of management, making snap judgements based on intuition and years of experience rather than a slow perusal of analytical data.

If you do happen to make it to Jain’s Step 2, you’ll find she recommends an investment in employees with well-honed problem-solving, analytical and managerial skills. Creating a robust data infrastructure is the call to action in Step 3; and Step 4 urges the reader to set up a transparent, formal decision-making process.

If Jain’s how-to article doesn’t solve your Big Data problems, she invites you to download a white paper, attend a half-day data roundtable or, for a really immersive experience, attend her company’s Business and Testing workshop week in April.

Yes, the Forbes piece is blatantly a bit of marketing collateral for Aryng, Jain’s analytics training and consulting company. But if it helps move you from being a gut-happy decision maker to a manager who, without losing the benefits of intuitive thinking, knows when to make decisions using all the tools that analytics and data science provide, it’s well worth the read.

Read the Full Story.


Also posted in Business of Big Data

Python for CUDA to Bolster Next Wave of GPU-powered HPC and Big Data Analytics

Today Nvidia announced that growing ranks of Python users can now take full advantage of GPU acceleration for HPC and Big Data analytics applications by using the CUDA parallel programming model. As a popular, easy-to-use language, Python enables users to write high-level software code that captures their algorithmic ideas without delving deep into programming details. Python’s extensive libraries and advanced features make it ideal for a broad range of HPC science, engineering and big data analytics applications.

“Our research group typically prototypes and iterates new ideas and algorithms in Python and then rewrites the algorithm in C or C++ once the algorithm is proven effective,” said Vijay Pande, professor of Chemistry and of Structural Biology and Computer Science at Stanford University. “CUDA support in Python enables us to write performance code while maintaining the productivity offered by Python.”

Support for CUDA parallel programming comes from NumbaPro, a Python compiler in the new Anaconda Accelerate product from Continuum Analytics. This support was made possible by Nvidia’s contribution of the CUDA compiler source code into the core and parallel thread execution backend of LLVM, a widely used open source compiler infrastructure. Read the Full Story.


Also posted in Python, Software

Big Data Maturity Model – Making Big Data Work for You

For business executives who want to make the most of Big Data but find the whole process somewhat daunting, the research house International Data Corporation (IDC) has published a Big Data and analytics (BDA) maturity model designed to bring order out of chaos.

The BDA model provides an extensive framework of stages, critical measures, outcomes, and actions. The idea is that organizations can use this framework to move through the successive stages of BDA competency and maturity in an orderly fashion.

For example, they can:

  • Assess their BDA competency and maturity
  • Define short- and long-term goals and plan for improvements
  • Prioritize BDA technology, staffing, etc.
  • Uncover maturity gaps among business units and between business and IT groups

In a related story in Computerworld (IDC and Computerworld are both part of the International Data Group), IDC analyst Dan Vesset comments, “We’re at this period now where there’s certainly a lot of hype, a lot of promise. The question is what’s reality, and what can and should companies do in the near- and short-term?”

“Big Data and analytics have become top agenda items for many executives, but with the opportunity to unlock the value of Big Data to accelerate innovation, drive optimization, and improve compliance comes the need to demonstrate value, navigate expanding technology alternatives, re-create business processes, and ensure the availability of appropriately skilled staff,” said Vesset in a press release announcing the BDA model. “The ability to manage and analyze Big Data and derive value from these activities will increasingly define an organization’s ability to compete or service its constituents.”

The BDA model is described in the IDC report: IDC Maturity Model: Big Data and Analytics – A Guide to Unlocking Information Assets.

Read the Full Story.



New MIT Software Targets Data-Intensive Cloud Computing

When data-intensive applications meet the cloud, there may be stormy weather ahead.

Cloud computing services undeniably offer a long list of benefits: economies of scale, responsiveness to fluctuating job requirements, in-depth technical support, and the pay-as-you-go model come to mind. But researchers at MIT are also aware that applications built around large-scale database queries can cause havoc in the cloud.

Cloud services often partition their servers into virtual machines. Each of these machines is constrained in a number of ways: for example, it may be assigned a finite number of operations per second on the server’s CPU, or allocated a limited amount of space in memory. According to MIT, that makes for easier management of the cloud servers, but it also can result in an allocation of about 20 times more hardware than is necessary to do the job. Naturally the cost of this overprovisioning gets passed on to the customer.

This has prompted MIT researchers to begin work on a new system called DBSeer. According to a recent press release, the software uses machine-learning techniques to build accurate models of performance and resource demands of database-driven applications.

The new algorithm at the heart of DBSeer has been released under an open-source license. Teradata, one of the leaders in the Big Data revolution, is already in the process of importing the algorithm into its solutions.

“With virtual machines, server resources must be allocated according to an application’s peak demand,” explains Barzan Mozafari, one of the MIT researchers. “You’re not going to hit your peak load all the time. So that means that these resources are going to be underutilized most of the time. Provisioning for peak demand is largely guesswork. It’s very counterintuitive, but you might take on certain types of extra load that might help your overall performance. Increased demand means that a database server will store more of its frequently used data in its high-speed memory, which can help it process requests more quickly.”

However, a slight increase in demand could cause the system to slow down precipitously – if, for instance, too many requests require modification of the same pieces of data, which need to be updated on multiple servers. “It’s extremely nonlinear,” Mozafari says.
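To make the underutilization point concrete, here is a minimal Python sketch. It is not DBSeer and uses none of its models; the hourly load figures and the function name are hypothetical, invented only for illustration. It simply shows that when capacity is sized for the peak, average utilization is the mean load divided by the peak load.

```python
# Hypothetical hourly load (queries per second) for one database over a day.
# These numbers are made up for illustration; they do not come from MIT.
hourly_load = [120, 90, 80, 75, 80, 110, 200, 450, 700, 820, 900, 950,
               1000, 960, 900, 850, 800, 720, 600, 480, 350, 260, 180, 140]


def utilization_under_peak_provisioning(load):
    """Return mean load divided by peak load.

    If capacity is provisioned to match the peak, this ratio is the
    fraction of that capacity actually used on average.
    """
    peak = max(load)
    mean = sum(load) / len(load)
    return mean / peak


util = utilization_under_peak_provisioning(hourly_load)
print(f"Average utilization of peak-sized capacity: {util:.0%}")
```

A simple ratio like this captures only the idle capacity. It says nothing about the nonlinear effects Mozafari describes, which is the part that needs a learned performance model.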

The MIT team has built a DBSeer model of MySQL and they are currently working on a new model for PostgreSQL – both widely used database systems.

Read the Full Story.


Also posted in Cloud, Research

Thomas Thurston on Data Science vs Intuition: Which is Better?

Over at the Growth Science Blog, Thomas Thurston describes how the question of whether data science is better than human intuition is really a false choice.

Relying on your wits is part of doing business. However if there are big problems that keep resurfacing, it’s a lot slower to go on guessing. If you don’t bring data science or some other form of rigor to the table you may never get a grip on what the underlying problem is. In the long term it’s a lot slower to treat the symptoms (keep guessing) than to cure the heart of the disease (ex. use data science). In contrast, data science is often slower at first, but faster later on. It can take hours, months or even generations for data science to model, predict or solve some of business’s toughest riddles. However once you’ve built a robust tool, it’s relatively fast and easy to handle or even prevent those problems if they threaten to pop up again.

Read the Full Story.



Think of This: Most of the World’s Data is Unanalyzed

The idea that we use only 10 percent or less of our brain is one of those persistent myths that stubbornly refuse to go away. In reality, that bit of gelatinous grey matter between our ears has been extensively mapped, and it appears that most of it has a function. So the notion that we could develop superhuman mental abilities if only we could harness that underused 90 percent of our brain needs to be relegated to the urban legend scrap heap.

But what about the idea that the world of Big Data is in even worse shape – that only one percent of what’s actually out there is being analyzed?

That’s the contention of Gurjeet Singh, a guest contributor to a recent issue of GigaOM.

Singh is the co-founder and CEO of Ayasdi, a discovery platform built on topological data analysis technology. He says that to make use of the untouched 99 percent of available data – some one quintillion bytes that are collected every day (or, according to IBM, a daily harvest of 2.5 quintillion bytes) – we have to “fundamentally change the way we gain knowledge from data.”
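For a sense of scale, the short Python sketch below works through the arithmetic using the figures quoted above. It takes IBM's 2.5 quintillion bytes per day and Singh's 1 percent as given, and it assumes decimal (SI) units, which the post does not specify.

```python
# Figures quoted in the post; decimal (SI) units are an assumption made here.
BYTES_PER_DAY = 25 * 10**17      # 2.5 quintillion bytes per day (IBM's figure)
PETABYTE = 10**15                # 1 PB = 10^15 bytes in SI units

analyzed = BYTES_PER_DAY // 100  # the 1 percent Singh says is analyzed
untouched = BYTES_PER_DAY - analyzed

analyzed_pb = analyzed // PETABYTE
untouched_pb = untouched // PETABYTE

print(f"Analyzed per day:  {analyzed_pb} PB")
print(f"Untouched per day: {untouched_pb} PB")
```

On these assumptions, roughly 25 petabytes a day get analyzed and about 2,475 petabytes do not.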

Singh calls on the next wave of Big Data solutions to: empower domain experts such as biologists, geologists, etc. – not just the data scientists; accelerate the pace of discovery so we can get insights faster; create new levels of machine intelligence – human thought is too slow for many Big Data operations; and design systems that are able to efficiently analyze both structured and unstructured data, especially the latter in all its myriad forms.

“When it comes to the evolution of big data, we’ve only begun to scratch the surface,” Singh writes. “It stands to reason that if we continue to analyze 1 percent of data, then we’ll only tap into 1 percent of its potential. If we’re able to analyze the other 99 percent, then think about all of the ways that we can change the world. We can accelerate economic growth, cure cancer and other diseases, reduce the risk of terrorist attacks, and many other big ticket challenges that we’re faced with. That’s something that we can all rally around.”

Read the Full Story.


Also posted in Business of Big Data, Storage

Video: Game Changing Use Cases for Big Data

In this video from the Smarter Analytics Leadership Summit, Inhi Cho Suh and Jason Verlen from IBM describe five game-changing use cases for Big Data.

These game changing use cases will help you translate the promise of Big Data into tangible outcomes for your organization. This session will help you get started in deploying real solutions that drive big impact with Big Data.


Also posted in Business of Big Data, Events, Video

Podcast: The Big Data Revolution

In this podcast from the Leonard Lopate Show, author Viktor Mayer-Schönberger explores how Big Data will affect the economy, science, and society at large.

“Big data” refers to our burgeoning ability to crunch vast collections of information, analyze it instantly, and draw sometimes profoundly surprising conclusions from it. Big Data: A Revolution that Will Transform How We Live, Work, and Think shows how this emerging science can translate myriad phenomena – from the price of airline tickets to the text of millions of books – into searchable form, and uses our increasing computing power to reach epiphanies that we never could have seen before.

Download the MP3.


Also posted in Podcasts

Big Data and the Efficacy of that Treasured “Gut Feeling”

If you Google “benefits, Big Data” you’re rewarded with a landslide of links pointing to various companies, consultants and vendors extolling the virtues of this new technology trend.

One of the more restrained assessments comes from McKinsey & Company research, which indicates that we are now at an inflection point where the ability to capture, store and analyze the massive amounts of data available to business, government and consumers has become an economic necessity. Referencing the McKinsey research, an article in the Ivey Business Journal states that, like other past trends in IT innovation, Big Data has the power to “transform our lives.”

But before throwing caution to the winds and proclaiming a new era of enlightened decision making, it might be wise to heed the words of Accenture Managing Director Jeanne Harris as reported in a recent Wall Street Journal blog authored by Deborah Gage.

“Using the latest storage, databases and analytical tools, corporate managers can now, at least in theory, make better, faster and more informed decisions, she (Harris) says, but she doesn’t see that happening,” Gage writes. “A quarter of managers still say they make decisions not with data, but with their ‘gut’ – and of the ones that do claim to use data, 40% used their gut to make their last big decision. Older managers grew up in a time when data, if you could find it, wasn’t reliable, she says. ‘They grew to trust their gut, because they had no other option.’ The problem now, Harris has decided, is no longer technology – it’s a shortage of people who understand enough about math and data science to evaluate their companies’ options and speak to their managers in a language that managers understand.”

In the parlance of Daniel Kahneman, the Nobel Prize-winning author of Thinking, Fast and Slow, this is a problem involving System 1 and System 2 thinking. System 1 is fast, intuitive, emotional – the gut feeling mentioned above. System 2 is slower, deliberate, logical. System 1, which functions very well most of the time, can lead to spectacular mistakes.

Kahneman is recommended reading for anyone involved in the decision-making process, whether Big Data is involved or not. And that, of course, means all of us.

Read the Full Story.




inside-bigdata.com is a production of insideHPC, LLC. © 2011-2013