Interview: Concurrent Leads the Way in Application Building on Hadoop

As enterprise adoption of Hadoop gains momentum, the need for fabric-agnostic applications continues to rise as well. To meet this demand, Concurrent has recently announced the latest version of its application-building platform, Cascading 3.0. We sat down with Gary Nakamura, CEO of Concurrent, to learn about this new product as well as other solutions from his company.

insideBIGDATA: What is the primary mission of Concurrent?

Gary Nakamura: Concurrent, Inc. is the leader in Big Data application infrastructure, delivering products that help enterprises create, deploy, run and manage data applications at scale. The company’s flagship enterprise solution, Driven, was designed to accelerate the development and management of enterprise data applications. Concurrent is also the team behind Cascading, the most widely deployed technology for data applications, with more than 150,000 user downloads a month.

insideBIGDATA: What does your company offer for enterprises seeking insight from Big Data? 

Gary Nakamura: Concurrent is the team behind Cascading, the proven application development framework that makes it possible for enterprises to leverage their existing skill sets for building data-oriented applications on Hadoop. Cascading has built-in attributes that make data application development a reliable and repeatable process. Companies that standardize on Cascading can build data applications at any scale, integrate them with existing systems, employ test-driven development practices and simplify their applications’ operational complexity.

Additionally, there is strong demand for a solution that helps enterprises to understand what their data applications are doing. Concurrent is also the team behind Driven, the industry’s first application management product for data applications. Driven significantly accelerates developer productivity and provides unprecedented visibility into developing Big Data applications on Hadoop.

insideBIGDATA: Who does this technology help?

Gary Nakamura: Cascading helps enterprises solve business problems by connecting their business strategy to their technology and data with their data applications. The technology lowers the barrier for data-oriented application development so that enterprises can leverage their existing bench skills and infrastructure to build this class of applications.

Cascading is used by thousands of businesses including eBay, Etsy, The Climate Corp and Twitter, and is considered the de facto standard in open source application infrastructure technology.

insideBIGDATA: What does the Cascading 3.0 platform offer customers?

Gary Nakamura: Enterprises today are rapidly adopting Hadoop and other computation engines to process, manage and make sense of growing volumes of both unstructured and semi-structured data. At the same time, the need to rapidly and reliably build enterprise-class applications without deep knowledge of these technologies is the greatest it has ever been.

Cascading fulfills this need by allowing businesses to leverage their existing skill sets, investments and systems to build enterprise-class applications on Hadoop. With the family of Cascading applications, enterprises can apply Java, legacy SQL and predictive modeling investments, and combine the respective outputs of multiple departments into a single data processing application.

What’s new in Cascading 3.0:

  • Allows enterprises to build their data applications once, with the flexibility to run applications on the fabric that best meets their business needs.
  • Support for: local in-memory, Apache MapReduce, and Apache Tez.
  • Future support for Apache Spark™, Apache Storm and others through its new pluggable and customizable query planner.
  • Third-party products, data applications, frameworks and dynamic programming languages built on Cascading will immediately benefit from this portability.
  • Compatibility with all major Hadoop vendors and service providers: Altiscale, Amazon EMR, Cloudera, Hortonworks, Intel, MapR and Qubole, among others.

insideBIGDATA: Big Data and Hadoop obviously go hand in hand. How did Concurrent’s relationship with the Hadoop community come about?

Gary Nakamura: Concurrent’s open source project, Cascading, was created in response to the difficulties of application development on Hadoop. After realizing how difficult it was to create applications on raw MapReduce, our founder, Chris Wensel, decided to create an application development framework that would make it possible for enterprises to simply and reliably build data applications with their existing skill sets and infrastructure.

Over the years, Cascading has become the proven enterprise application development framework for organizations to standardize on. The community around Cascading has evolved to the point where there are now several well-known dynamic programming languages built on top of Cascading (e.g., Scalding and Cascalog), as well as integrations produced by the community that extend Cascading’s capabilities. Also, based on customer demand, Concurrent has strong partner relationships in the ecosystem. All major Hadoop distribution partners make sure that their distribution is compatible with Cascading.

insideBIGDATA: Is Cascading 3.0 fabric-specific or is MapReduce the model of choice?

Gary Nakamura: Cascading 3.0 is fabric-agnostic. Enterprises can make their choice of execution fabric based on the needs of their business. Cascading 3.0 is designed to work with various fabrics, whether it’s MapReduce or new and emerging fabrics such as Tez. Support is also planned for Spark and Storm. Cascading gives its users the choice of which fabric is best to use in order to meet business requirements. This means you can develop once and port to various fabrics, without the need to rewrite.

insideBIGDATA: As more and more organizations adopt Hadoop, what is Concurrent doing to keep pace? In other words, what does the future hold?

Gary Nakamura: Our roadmaps are heavily influenced by customer demand and our aim is to be a few steps ahead of our users. With Cascading, we’re focused on solving the problem of enterprises operationalizing their data. At this point, we are seeing that organizations require emerging execution fabrics to meet a variety of business requirements (e.g., latency, scale, service level agreements). To meet customer demand, we are adding support for Apache Tez in Cascading 3.0. In future releases, Cascading will support Spark and Storm as well.

The next critical problem our customers are seeing is operational visibility for their data applications. Hadoop is becoming the operational center for enterprises and their data applications, the place where they’re looking to build hundreds of mission-critical data products. Driven is the first product in the industry that provides the operational visibility enterprises require. With Driven, enterprises will be able to immediately understand what their data applications are doing in real time. Driven accelerates the time to market for data products by providing capabilities for developers to visualize their data applications, immediately diagnose failures, and optimize application performance.
