In-Memory Database vs. In-Memory Data Grid

Nikita Ivanov, CTO of GridGain


In-memory computing comprises two main categories: In-Memory Databases and In-Memory Data Grids. I’d like to delve into the differences between the two.

Nomenclature

While In-Memory Database (IMDB) is a well-recognized term that vendors typically use explicitly, it’s important to highlight the new crop of traditional databases with serious in-memory “options”: Microsoft SQL Server 2014, Oracle’s Exalytics and Exadata, and IBM DB2 with BLU. For simplicity’s sake, I’ll generalize them as “in-memory databases.”

On the other hand, products occasionally described as in-memory NoSQL/NewSQL databases will fall under the umbrella of “in-memory data grids” (IMDGs) for this article.

For clarity and consistency, we’ll focus on these two categories: IMDB and IMDG.

Tiered Storage

Before moving forward, let’s clarify what we mean by “in-memory”. Although some vendors refer to SSDs, Flash-on-PCI, Memory Channel Storage, and DRAM as “in-memory”, in reality, most vendors support a tiered storage model where part of the data is stored in DRAM and the rest overflows to a variety of flash or disk devices. A product is therefore rarely DRAM-only, flash-only, or disk-only. That said, most products in both categories are architecturally biased towards either mostly DRAM or mostly flash/disk storage.

The main point to take away is that “in-memory” products are not confined to one fixed definition, but in the end they all have a significant “in-memory” component.
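The tiered model described above can be illustrated with a minimal sketch. It assumes a fixed DRAM capacity and least-recently-used spilling; the class name `TieredStore`, its methods, and the plain dictionary standing in for flash or disk are all hypothetical and do not reflect any particular vendor's design.

```python
from collections import OrderedDict


class TieredStore:
    """Toy two-tier store: a bounded fast tier that spills to a slow tier."""

    def __init__(self, dram_capacity: int):
        self.dram = OrderedDict()  # fast tier, least recently used entry first
        self.overflow = {}         # stand-in for a flash or disk tier (assumption)
        self.dram_capacity = dram_capacity

    def put(self, key, value):
        # New and updated entries always land in the fast tier.
        self.overflow.pop(key, None)
        self.dram[key] = value
        self.dram.move_to_end(key)
        self._spill()

    def get(self, key):
        if key in self.dram:
            self.dram.move_to_end(key)  # mark as recently used
            return self.dram[key]
        if key in self.overflow:
            # Promote the entry back into the fast tier on access.
            value = self.overflow.pop(key)
            self.put(key, value)
            return value
        return None

    def _spill(self):
        # Move the least recently used entries to the slow tier when over capacity.
        while len(self.dram) > self.dram_capacity:
            old_key, old_value = self.dram.popitem(last=False)
            self.overflow[old_key] = old_value


store = TieredStore(dram_capacity=2)
for name in ("a", "b", "c"):
    store.put(name, name.upper())
```

After the three writes, the oldest entry has spilled to the slow tier; reading it again promotes it back and pushes out whichever entry is then least recently used.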

Technical Differences

The majority of IMDBs are RDBMSs that store data “in memory” instead of on disk. With only a modest list of unsupported SQL features, IMDBs generally provide good SQL support. IMDBs also ship with ODBC/JDBC drivers and can often be used in place of an existing RDBMS without significant changes.

Conversely, In-Memory Data Grids typically lack full ANSI SQL support but instead provide MPP-based (Massively Parallel Processing) capabilities, where data is spread across a large cluster of commodity servers and processed in parallel. The main access patterns are key/value access, MapReduce, several forms of HPC-like processing, and restricted distributed SQL querying and indexing capabilities.
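To make the key/value and MapReduce access patterns concrete, here is a minimal single-process sketch. It assumes a hypothetical four-node cluster in which each node is simulated by a local dictionary; `node_for`, `put`, `get`, and `map_reduce` are illustrative names, not the API of GridGain or any other product.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 4  # hypothetical cluster size

# Each "node" is a local dict standing in for one server's memory.
nodes = [dict() for _ in range(NUM_NODES)]


def node_for(key: str) -> int:
    """Pick the owning node with a simple stable hash (illustrative only)."""
    return sum(key.encode("utf-8")) % NUM_NODES


def put(key: str, value) -> None:
    nodes[node_for(key)][key] = value


def get(key: str):
    return nodes[node_for(key)].get(key)


def map_reduce(map_fn, reduce_fn):
    """Run map_fn against every node's local partition in parallel, then combine."""
    with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
        partials = list(pool.map(map_fn, nodes))
    return reduce_fn(partials)


for i in range(100):
    put(f"order:{i}", {"amount": i})

# Map: sum the amounts held locally on each node. Reduce: add the partial sums.
total = map_reduce(
    lambda partition: sum(v["amount"] for v in partition.values()),
    sum,
)
```

The point of the pattern is that the map step touches only data local to each node, so only small partial results cross the network before the reduce step.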

Please note that IMDGs and IMDBs have significant crossover in terms of SQL support. For example, GridGain provides dependable, ever-growing support for SQL: pluggable indexing, distributed join optimization, custom SQL functions, etc.

Speed Only vs. Speed + Scalability

One of the key differences between IMDGs and IMDBs is the level of scalability each can offer. Thanks to its MPP (Massively Parallel Processing) architecture, an IMDG has an inherent capability to scale to hundreds and thousands of servers. IMDBs, on the other hand, are inherently unable to scale horizontally, because SQL joins cannot be performed efficiently in a distributed context.

This is the dilemma with in-memory databases: SQL joins, one of an IMDB’s most useful features, are also what limits its scalability. This accounts for why most existing SQL databases (disk or memory based) are built on the vertically scalable Symmetric Multiprocessing (SMP) architecture, unlike IMDGs, which use the much more horizontally scalable MPP approach.
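A small sketch can illustrate why joins become expensive once data is spread across servers. It assumes a hypothetical four-node cluster with modulo partitioning and simply counts how many join lookups would have to leave the node holding the order row; the table contents and function names are made up for illustration.

```python
NUM_NODES = 4  # hypothetical cluster size


def node_for(key: int) -> int:
    """Modulo partitioning, used here only for illustration."""
    return key % NUM_NODES


# Toy tables: 8 customers, 24 orders, three orders per customer.
customers = {cid: f"customer-{cid}" for cid in range(8)}
orders = [(oid, oid // 3) for oid in range(24)]  # (order_id, customer_id)


def join_orders_to_customers(partition_orders_by):
    """Join each order to its customer and count lookups that cross nodes."""
    rows, remote_lookups = [], 0
    for order_id, customer_id in orders:
        order_node = node_for(partition_orders_by(order_id, customer_id))
        customer_node = node_for(customer_id)  # customers live on their own key's node
        if order_node != customer_node:
            remote_lookups += 1  # this lookup would need a network hop
        rows.append((order_id, customers[customer_id]))
    return rows, remote_lookups


# Orders partitioned by their own ID: many lookups cross node boundaries.
rows_by_order, remote_by_order_id = join_orders_to_customers(lambda oid, cid: oid)

# Orders co-located with their customer: every lookup stays on one node.
rows_colocated, remote_colocated = join_orders_to_customers(lambda oid, cid: cid)
```

Both runs produce the same joined rows; only the number of cross-node lookups differs. Co-locating related data removes the network hops, but it works only for joins along the chosen partitioning key, which is one reason arbitrary SQL joins are hard to distribute.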

Please note that while both IMDGs and IMDBs have comparable speeds in a local, non-distributed context, only IMDGs can natively scale to hundreds and thousands of nodes – providing extraordinary scalability and unrivaled throughput.

Replace Database vs. Change Application

In addition to scalability, another key differentiation between IMDGs and IMDBs involves switching out existing databases versus altering the application.

IMDGs always work with an existing database, providing a layer of massively distributed in-memory storage and processing between the database and the application, which can then be relied on for rapid data access and processing. Most IMDGs are highly integrated with existing databases, and can seamlessly read-through and write-through to and from databases when necessary.

In these instances, developers must modify the application in order to take advantage of these new capabilities. With the application no longer “talking” SQL only, it needs to learn how to use MPP, MapReduce or other techniques for data processing.
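The read-through and write-through integration mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: a plain dictionary stands in for the existing database, and `ReadWriteThroughCache` with its `store_reads` counter is a hypothetical name, not a real product API.

```python
class ReadWriteThroughCache:
    """Toy in-memory layer sitting between an application and a database."""

    def __init__(self, backing_store: dict):
        self._store = backing_store  # stand-in for the existing database (assumption)
        self._memory = {}            # in-memory copy of recently used entries
        self.store_reads = 0         # counts how often the database was consulted

    def get(self, key):
        # Read-through: on a miss, load from the database and keep a copy in memory.
        if key not in self._memory:
            self.store_reads += 1
            value = self._store.get(key)
            if value is None:
                return None
            self._memory[key] = value
        return self._memory[key]

    def put(self, key, value):
        # Write-through: update the database and the in-memory copy together.
        self._store[key] = value
        self._memory[key] = value


database = {"user:1": "Ada"}
cache = ReadWriteThroughCache(database)

first = cache.get("user:1")   # miss: loaded from the database
second = cache.get("user:1")  # hit: served from memory
cache.put("user:2", "Grace")  # written to both memory and the database
```

The application talks only to the cache object, which is why adopting this style of product involves changing the application rather than the database.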

In contrast, adopting an IMDB often means replacing the existing database (unless you use one of those in-memory “options” to temporarily boost your database performance), but it requires considerably fewer changes to the application itself because the application continues to rely on SQL (albeit a modified dialect of it).

In the end, there are advantages and disadvantages to both approaches, depending on organizational policies and politics, as well as their technical merits.

My advice to businesses developing a green-field, brand new system or application is to go with In-Memory Data Grids. With IMDGs, you get to work with existing databases in your organization where necessary, while benefitting from incredible performance and scalability – both of which are highly integrated.

However, if you’re renovating your existing enterprise system or application, my recommendation is this:

Use an IMDB if: replacing or upgrading the existing disk-based RDBMS is desirable; you’re unable to make changes to your applications; and speed is necessary, but scalability isn’t. With this option, businesses can boost application speed by replacing or upgrading the RDBMS without having to adjust the application itself.

On the other hand, use an IMDG if: replacing the existing disk-based RDBMS isn’t possible; you’re able to make changes to your application; and both speed and scalability are necessary. With an IMDG, you can improve an application’s speed and gain massive scale by changing the application, without making changes to your existing database.

These recommendations can be summarized in the following table:

                      In-Memory Data Grid   In-Memory Database
Existing Application  Changed               Unchanged
Existing RDBMS        Unchanged             Changed or Replaced
Speed                 Yes                   Yes
Max. Scalability      Yes                   No


Comments

  1. Are these definitions sufficiently meaningful? For example DB2 with database partitioning feature can scale up to 1000 nodes and with enough bufferpool memory would meet your definition of an in memory data grid. However most people would see the DB2 with BLU architecture as much more appropriate to most analytics requirements, with a lot more bang per hardware buck. The key features here are not just lots of memory or SSD but also columnar storage, high compression ratios, the ability to evaluate data while still compressed, SIMD and multi core exploitation plus a lot of attention to IO efficiency (data skipping). Many of the products you describe lack most of these features.
