Interview: Carpathia Leverages Hadoop and the Cloud to Host Big Data Solutions

In-house infrastructure can be extremely expensive and time-consuming for any organization to build, implement, and maintain. Processing huge amounts of data for mission-critical applications requires scalability and elasticity. We caught up with Neeraj Sabharwal, Senior Solutions Architect of Carpathia, to learn how his company approaches these needs.

insideBIGDATA: Which industries are driving the most demand for big data solutions? 

Neeraj Sabharwal: As a cloud operator and managed hosting provider, we’re seeing demand from a wide range of companies that want to leverage the scalability and efficiency benefits of the cloud to better manage, analyze, and extract value from big data. Mobile advertising has been a hot market for Carpathia due to its high transaction volumes and the need to scale quickly. On the desktop, marketers have traditionally relied on cookies for targeting, but mobile opens up a whole new world of social and geographic consumer information for analysis. We’re working with customers in this market to help them quickly make sense of these emerging data streams so they can deliver advertising tailored to a person’s interests and location.

We’re also seeing significant demand within industries that have unique security and privacy requirements. Whether it’s patient treatment records in healthcare, citizen data in government, or fraud detection in financial services, companies in these markets are looking for solutions that can yield intelligence from mountains of unstructured data, while adhering to strict compliance requirements.

insideBIGDATA: How do you address compliance in markets that demand complex regulatory requirements?

Neeraj Sabharwal: Big data platforms pose a unique challenge for security and compliance because their architecture tends to involve scale-out infrastructures that concentrate massive amounts of unprocessed data. While these unprocessed data elements may initially have little value, they can become extremely valuable once transformed and analyzed. We find that classic security principles apply equally well to these new processing platforms, focusing on the key elements of confidentiality, integrity, and availability.

When it comes to the data itself, we believe in scrubbing and tokenizing where possible. For the non-data geeks, these are processes that replace sensitive information with a token or generic placeholder within the environment. Tokenizing can be an extremely effective way for organizations that handle personally identifiable information (PII) or electronic protected health information (ePHI) to share and analyze datasets in a compliant manner. For example, healthcare providers often use limited data sets (LDS), which strip identifying data such as facial identifiers from a health record prior to processing and analysis. But each compliance plan must be customized to the specific standard that needs to be met, as well as to the goals of the organization.
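To make the scrubbing and tokenizing idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Carpathia’s method: the in-memory `TokenVault` class, the `tok_` prefix, and the field names (`name`, `ssn`, `zip`, `diagnosis`) are all hypothetical. A production vault would be a hardened, access-controlled store kept outside the analytics cluster. Tokenizing swaps each sensitive value for a random token while keeping a protected mapping back to the original. Scrubbing simply drops the field, which is closer in spirit to preparing a limited data set.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory token vault, for illustration only."""

    def __init__(self):
        self._value_to_token = {}
        self._token_to_value = {}

    def tokenize(self, value: str) -> str:
        # Reuse an existing token so the same person always maps to the same
        # token; joins and aggregations still work on the tokenized data.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        while token in self._token_to_value:  # guard against rare collisions
            token = "tok_" + secrets.token_hex(8)
        self._value_to_token[value] = token
        self._token_to_value[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Only authorized processes should be allowed to reverse a token.
        return self._token_to_value[token]


def tokenize_record(record: dict, sensitive_fields: set, vault: TokenVault) -> dict:
    """Return a copy of the record with sensitive fields replaced by tokens."""
    return {
        key: vault.tokenize(str(val)) if key in sensitive_fields else val
        for key, val in record.items()
    }


def scrub_record(record: dict, fields_to_drop: set) -> dict:
    """Return a copy of the record with the listed fields removed entirely."""
    return {key: val for key, val in record.items() if key not in fields_to_drop}


# Usage with made-up sample records
vault = TokenVault()
records = [
    {"name": "Jane Doe", "ssn": "123-45-6789", "zip": "20001", "diagnosis": "J45"},
    {"name": "Jane Doe", "ssn": "123-45-6789", "zip": "20001", "diagnosis": "E11"},
]
SENSITIVE = {"name", "ssn"}
safe = [tokenize_record(r, SENSITIVE, vault) for r in records]
scrubbed = [scrub_record(r, SENSITIVE) for r in records]
```

Because tokens are reused, both sample rows still share one patient token, so analysts can count diagnoses per patient without ever seeing the name or Social Security number. Whether either approach satisfies a given regulation (HIPAA, PCI DSS, and so on) depends on that standard’s rules, which matches the point above that each compliance plan must be customized.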

insideBIGDATA: How do you feel about deploying big data solutions in a public cloud, as opposed to hybrid or private options?

Neeraj Sabharwal: Cloud and big data make a great marriage because the partnership enables organizations to access massive data processing power when they need it, without paying for it when they don’t. But big data in the public cloud is a mixed bag. For starters, you’re sharing computing resources with other organizations, and we’ve seen significant performance issues and slow processing times when those resources get stretched. Jobs are directly impacted by the activity of your neighbors, and the data proximity also raises security concerns. Are you comfortable with your data being hosted in the same environment as your competitors? What if Coke gets access to Pepsi’s data? There are a lot of considerations.

The other issue is that public cloud infrastructure tends to be generic – built for the masses but not specifically designed to address any particular challenge. Organizations that have unique data requirements then have to customize and optimize, which is complicated and expensive. It also requires organizations to hire computer scientists to support their data scientists, which goes against the whole efficiency and simplicity value proposition that made the public cloud so attractive in the first place.

insideBIGDATA: Carpathia recently announced a Hadoop-as-a-Service solution partnership with Altiscale – what has been the market response?

Neeraj Sabharwal: The response from our customer base has been overwhelmingly positive, because the partnership delivers the best of both worlds. It’s access to the world’s premier big data solution in Hadoop, without the infrastructure and operational costs, powered by the only cloud operator platform built for the complex requirements of large enterprises. Customers can use the solution to make sense of massive unstructured datasets while saving money in the cloud, without sacrificing security or compliance.

insideBIGDATA: What big data trends are on the horizon in 2014 and beyond?

Neeraj Sabharwal: So many companies are focusing on delivering “3R” – the right information to the right people at the right time. That’s really the promise of big data, and there are so many smart people in our industry that will continue to make this vision a reality in 2014. Mobile holds incredible promise in terms of using data to deliver personalized, customized experiences for consumers based on preference and location. We’re also getting closer to mobile big data analysis, where smartphone apps are delivering intelligence based on enormous quantities of information. The intersection of big data and cloud will drive this evolution, with cloud delivering more processing power and more advanced analytics capabilities from any device.

I’m also really excited about our big data workforce. Harvard Business Review famously called “data scientist” the sexiest job of the 21st century, and engineers are jumping into the big data pool to meet this demand. Gartner predicts that 4.4 million IT jobs will be created globally to support big data by 2015, with 1.9 million of those in the U.S. alone. All of that innovation is going to reduce complexity and costs, making big data tools more accessible to small and medium-sized organizations across every industry. It’s an extremely exciting time to be in this market.
