Interview: ParStream Analyzes Billions of Records in Less than a Second

As the need for real-time data analytics increases, more and more enterprises are seeking new approaches. ParStream steps up to this challenge by using hybrid in-memory storage in its database technology. We caught up with ParStream’s Co-Founder and CTO, Joerg Bienert, to learn more.


insideBIGDATA: Joerg, please give us a little background on ParStream and how the company came about. 

Joerg Bienert: ParStream is the fastest database for Real-Time Big Data Analytics. We are 46 people and have a very experienced C++ developer team in Cologne, Germany.

The motivation for building ParStream came out of a Big Data problem that Mike Hummel and I, two of the founders, were facing in 2008. We wanted to build a travel search portal that analyzed 14 billion packaged travel offerings with a required response time of 100 ms. As no existing solution could handle this, together with Norbert Heusser, the third founder, we began to invent new algorithms that were able to analyze more than 1 billion records on a single server.

insideBIGDATA: What does ParStream do as far as database technology is concerned?

Joerg Bienert: ParStream is a columnar database with hybrid in-memory storage and a shared-nothing architecture. Based on patented algorithms for indexing and compressing data, ParStream uniquely combines three core features: analyzing billions of records in under a second, continuous fast import at up to 1 million records per second, and a flexible, interactive analytics engine with a SQL interface.

insideBIGDATA: What does this do for those involved in Big Data and analytics?

Joerg Bienert: ParStream enables new types of applications that require Real-Time Big Data Analytics, across all industries and in many use cases: eCommerce (web analytics, online advertising), telco (network monitoring, billing), finance (fraud prevention, risk management), science (climate research, genome analytics), industry (sensor networks, smart metering, M2M) and many, many more.

In these areas ParStream can provide performance that is at least one order of magnitude faster than any other technology, and it uniquely combines fast querying with immediate availability of new data. It opens new ways of getting insights and value out of Big Data.

insideBIGDATA: Who are your customers and prospects?

Joerg Bienert: We are focusing on e-marketing and commerce, because data is their business and there is a huge need for big data analytics. With ParStream, for example, a European retail chain is now able to interactively browse through 50 billion rows of point-of-sale data with sub-second response times.

We also see huge potential in the sensor network, M2M and Internet of Things space. John Chambers, CEO of Cisco, just stated that the IoT will outgrow the current internet by a factor of 6 to 8. You cannot run such an infrastructure without Big Data Analytics in real time, and that is exactly what ParStream was built for.

Customers in other areas include INRA, the most important research institution for metagenome analytics. They are using ParStream to analyse the genome information of bacteria in the intestines (100 times more information than human DNA) and to identify correlations to diseases.

insideBIGDATA: Speaking of customers, you just announced that you garnered your first US customer, the marketing company CAKE. How did this come about?

Joerg Bienert: We got in contact with CAKE after last year's ad:tech conference and talked about a challenge they faced with their current infrastructure. The MS SQL servers were not able to handle millions of rows in their largest fact table.

This fact table was transferred to a ParStream database that was integrated into the existing infrastructure, so that some queries were rerouted from the SQL server to ParStream.

CAKE is now able to analyse billions of records, and to do so faster than the SQL server alone could previously handle just a few million.

We are very proud of the fact that ParStream was chosen after a broad evaluation of DB products – and it turned out to be 40 times faster than the closest competitor.

insideBIGDATA: What does this mean to each company in a technological sense?

Joerg Bienert: ParStream opens up new possibilities to get insight and value from Big Data. People in business and technology can now think outside the box, given that they can look into billions of records at the same speed they are used to when browsing through an Excel sheet with 100 rows.

On the other hand, ParStream integrates perfectly into existing infrastructure. The software runs on standard Linux and standard hardware, on a single server, in a cluster or in the cloud, and provides standard import and SQL query interfaces. It has a small footprint and a very low TCO. And one thing that is very important for us: it is very reliable and stable, and has been run 24x7 in production by customers for years.

insideBIGDATA: This is all pretty new and exciting stuff. What’s down the road for ParStream?

Joerg Bienert: We are very confident that ParStream will remain the fastest engine for the use cases described because, looking at the algorithms and mathematics, there is no better way of analysing large datasets.

With the best developer team in Europe, we will continue enhancing ParStream’s capabilities with exciting new features.

There will be an increasing demand for Big Data analytics in real time, with short response times and immediate availability of data, so we are expecting huge growth in new business.
