New MIT Software Targets Data-Intensive Cloud Computing

When data-intensive applications meet the cloud, there may be stormy weather ahead.

Cloud computing services undeniably offer a long list of benefits, including economies of scale, responsiveness to fluctuating job requirements, in-depth technical support, and pay-as-you-go pricing. But researchers at MIT are also aware that applications built around large-scale database queries can wreak havoc in the cloud.

Cloud services often partition their servers into virtual machines, each of which is constrained in a number of ways. For example, a virtual machine may be assigned a fixed number of operations per second on the server's CPU, or a limited amount of memory. According to MIT, this makes the cloud servers easier to manage, but for database-intensive applications it can also result in allocating about 20 times more hardware than the job requires. Naturally, the cost of this overprovisioning is passed on to the customer.

This has prompted MIT researchers to begin work on a new system called DBSeer. According to a recent press release, the software uses machine-learning techniques to build accurate models of the performance and resource demands of database-driven applications.
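The press release does not describe DBSeer's models in detail, so the following is only a minimal sketch of the general idea, not DBSeer's actual algorithm. It assumes that per-interval counts of each transaction type (here two hypothetical types, "new_order" and "payment") are a useful input, and it fits an ordinary least-squares linear model that predicts CPU use from that transaction mix. The function names `fit_linear` and `predict` and the training log are invented for illustration.

```python
"""Hypothetical sketch (not DBSeer's code): predict CPU use from the
number of transactions of each type executed in a sampling interval."""


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination
    with partial pivoting. `a` is a list of rows, `b` a list of values."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.
    A constant 1.0 column is prepended, so w[0] is the intercept
    (baseline CPU use) and w[i] is the cost per transaction of type i."""
    x = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(x, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(w, counts):
    """Predicted CPU use for one interval's transaction counts."""
    return w[0] + sum(wi * c for wi, c in zip(w[1:], counts))


# Invented training log: (new_order, payment) counts per interval, with CPU
# generated as 5 + 0.02 * new_order + 0.01 * payment so the fit is checkable.
log = [(100, 200), (300, 100), (500, 400), (200, 600), (400, 300)]
cpu = [5 + 0.02 * a + 0.01 * b for a, b in log]
w = fit_linear(log, cpu)
print([round(v, 4) for v in w])          # recovers the generating coefficients
print(round(predict(w, (1000, 1000)), 2))  # CPU estimate for an unseen, heavier mix
```

Because the model is fitted on the observed mix of requests, it can be queried for a mix the application has not yet experienced, which is the kind of what-if question that sizing a virtual machine requires. A real system would need far more than a linear CPU term; the article itself notes that database behavior under load is highly nonlinear.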

The algorithm at the heart of DBSeer has been released under an open-source license. Teradata, one of the leaders in the Big Data revolution, is already incorporating the algorithm into its products.

“With virtual machines, server resources must be allocated according to an application’s peak demand,” explains Barzan Mozafari, one of the MIT researchers. “You’re not going to hit your peak load all the time. So that means that these resources are going to be underutilized most of the time. Provisioning for peak demand is largely guesswork. It’s very counterintuitive, but you might take on certain types of extra load that might help your overall performance.”

Increased demand means that a database server will store more of its frequently used data in its high-speed memory, which can help it process requests more quickly.
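To make the peak-provisioning point concrete, here is a small illustrative calculation, not taken from the MIT work. It assumes a made-up load trace in requests per second and a virtual machine sized exactly to the trace's peak, then reports what fraction of that capacity is used on average. The function name `peak_utilization` and the trace values are hypothetical.

```python
"""Illustrative only: average utilization of a machine sized for peak load.
The load trace below is invented, not measured data."""


def peak_utilization(load):
    """Return (capacity, average utilization) when capacity equals the peak."""
    capacity = max(load)
    average = sum(load) / len(load)
    return capacity, average / capacity


# Hypothetical requests/second over eight sampling intervals.
trace = [120, 90, 60, 50, 80, 200, 400, 150]
cap, util = peak_utilization(trace)
print(cap, round(util, 3))  # prints: 400 0.359
```

With a single short spike, roughly two thirds of the peak-sized allocation sits idle on average, which is the underutilization Mozafari describes.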

However, a slight increase in demand could also cause the system to slow down precipitously, for instance if too many requests need to modify the same pieces of data, which must then be updated on multiple servers. “It’s extremely nonlinear,” Mozafari says.
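One way to see how sharply response time can climb near saturation is the textbook M/M/1 queue, whose mean time in system is 1 / (mu - lambda) for service rate mu and arrival rate lambda. This is a swapped-in illustration of nonlinearity in general, not DBSeer's model of data contention, and the 100 requests/second service rate is an assumption.

```python
"""Illustration using the textbook M/M/1 queue (not DBSeer's contention model)."""


def mm1_response_time(arrival_rate, service_rate):
    """Mean time a request spends in an M/M/1 system: 1 / (mu - lambda).
    Only valid while arrival_rate < service_rate (utilization below 100%)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival_rate must be < service_rate")
    return 1.0 / (service_rate - arrival_rate)


mu = 100.0  # hypothetical: the server completes 100 requests per second
for lam in (50, 80, 90, 95, 99):
    ms = mm1_response_time(lam, mu) * 1000
    print(f"load {lam / mu:.0%}: {ms:.0f} ms")
```

In this model, raising load from 90% to 99% multiplies mean response time tenfold (100 ms to 1000 ms), even though demand grew by only a tenth.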

The MIT team has built a DBSeer model of MySQL and is currently working on one for PostgreSQL; both are widely used database systems.
