Big Data is getting its own quarterly journal with a little help from Editor-in-Chief Edd Dumbill.
Big Data, a highly innovative, open-access, peer-reviewed journal, provides a unique forum for world-class research exploring the challenges and opportunities in collecting, analyzing, and disseminating vast amounts of data, including data science, big data infrastructure and analytics, and pervasive computing. The Journal addresses questions surrounding this powerful and growing field of data science and facilitates the efforts of researchers, business managers, analysts, developers, data scientists, physicists, statisticians, infrastructure developers, academics, and policymakers to improve operations, profitability, and communications within their businesses and institutions. Spanning a broad array of disciplines focused on novel big data technologies, policies, and innovations, the Journal brings the community together to address current challenges in organizing, storing, disseminating, protecting, and manipulating data and, most importantly, to find strategies that make this incredible amount of information work to benefit society, industry, academia, and government.
Aneel Lakhani writes that there have been some recent moves by Cisco around big data, particularly with regard to Hadoop running on Cisco’s Nexus switches and UCS servers. Of note is the publication of an excellent paper, Big Data in the Enterprise: Network Design Considerations.
In reviewing multiple data models, this document examines Apache Hadoop as a building block for big data and its effects on the network. Hadoop is an open source software platform for building reliable, scalable clusters in a scaled-out, “shared-nothing” design model for storing, processing, and analyzing enormous volumes of data at very high performance. The information presented in this document is based on the actual network traffic patterns of the Hadoop framework and can help in the design of a scalable network with the right balance of technologies that contribute to the application’s network performance. Understanding the application’s traffic patterns fosters collaboration between the application and network design teams, allowing advancements in technologies that enhance application performance.
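To make the traffic patterns concrete, here is a minimal word-count job in the Hadoop Streaming style, written in Python (a hypothetical sketch for illustration, not taken from the Cisco paper). Between the map and reduce phases, Hadoop repartitions all intermediate (key, value) pairs by key across the cluster; that shuffle is the east-west network traffic the paper’s design guidance is concerned with.

```python
#!/usr/bin/env python
# mapper.py -- emits a (word, 1) pair per token. All map output is
# partitioned by key and shipped across the network in the shuffle.
import sys

for line in sys.stdin:
    for word in line.split():
        print("%s\t%d" % (word, 1))
```

```python
#!/usr/bin/env python
# reducer.py -- Hadoop Streaming delivers keys in sorted order, so
# counts can be aggregated in a single pass over stdin.
import sys

current, count = None, 0
for line in sys.stdin:
    word, _, value = line.partition("\t")
    value = int(value)
    if word == current:
        count += value
    else:
        if current is not None:
            print("%s\t%d" % (current, count))
        current, count = word, value
if current is not None:
    print("%s\t%d" % (current, count))
```

Since every mapper sends a slice of its output to every reducer, shuffle volume grows with both data size and cluster width, which is the scaling behavior a Hadoop network design has to absorb.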
For private storage clouds to be effective, they must address several essential elements, including:
Aggregation: Resources are pooled to leverage economies of scale in capacity and performance.
Capacity on demand: Cloud solutions need to be able to scale dynamically to meet sudden increases in demand without negatively affecting performance, transparency, or ease of use for all other users.
Resource allocation: Cloud resources need to be elastic, so capacity can be reallocated among users and groups according to demand, preference, and priority (see the sketch after this list).
Accounting: Organizations need tracking and prioritization mechanisms to handle the sharing of resources.
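As a back-of-the-envelope illustration of the last three points (a hypothetical Python sketch, not Panasas code), the toy allocator below pools capacity, hands it out in proportion to tenant priority, and keeps each grant as an accounting record:

```python
# Toy elastic storage pool: aggregate capacity is granted to tenants
# in proportion to priority weight, capped by each tenant's demand.
# Illustrative only.

def allocate(pool_tb, demands):
    """demands maps tenant -> (requested_tb, priority_weight)."""
    total_weight = sum(weight for _, weight in demands.values())
    grants = {}
    for tenant, (requested, weight) in demands.items():
        fair_share = pool_tb * weight / total_weight
        grants[tenant] = min(requested, fair_share)  # elastic, demand-capped
    return grants

# 1 PB pool shared by two groups with 3:1 priority.
grants = allocate(1000.0, {"genomics": (600.0, 3.0), "weblogs": (200.0, 1.0)})
print(grants)  # genomics gets 600 TB (capped by demand), weblogs 200 TB
```

A real system would redistribute the unclaimed remainder and meter actual usage over time, but demand-weighted, accounted-for pooling is the essence of the list above.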
Panasas, an established provider of high-performance storage solutions for demanding HPC environments, has the technology and expertise to design solutions that address these essentials of private storage clouds.
In this video, Panasas Chief Marketing Officer Barbara Murphy explains how Panasas® ActiveStor™ parallel storage appliances support private cloud implementations.
How do you build a ‘data driven’ organization? How can existing firms be transformed into data-driven juggernauts? DJ Patil, former head of analytics and data teams at LinkedIn and currently Data Scientist in Residence at Greylock Partners, has written a white paper outlining how analytics operations are designed, what they do, which tools they use, and, finally, how to staff them. It focuses primarily on online businesses, but the tools and advice apply to almost any industry.
Convey Computer is out with a new whitepaper that explains how their hybrid-core architecture is well suited to Big Data and Graph Computing:
As we have reviewed, solving graph problems takes a different approach to computing. One such approach is the Convey HC (hybrid-core) family of computer systems. The Convey systems offer a balanced architecture: reconfigurable (via Field Programmable Gate Arrays, or FPGAs) compute elements and a supercomputing-inspired memory subsystem (Figure 3).

[Figure 3. Overview of the Convey hybrid-core computing architecture.]

The benefit of hybrid-core computing is that the compute-intensive kernel of the Graph500 breadth-first search is implemented in hardware on the FPGAs in the coprocessor. The FPGA implementation allows much more parallelism than a commodity system (the Convey memory subsystem allows up to 8,192 outstanding concurrent memory references). The increase in parallelism, combined with the hardware implementation of the logic portions of the algorithm, allows for increased overall performance with much less hardware.
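For context, the kernel being accelerated looks like the level-synchronous breadth-first search sketched below in plain Python (my illustration, not Convey’s FPGA implementation). Each frontier expansion issues many independent, irregular memory reads, which is exactly the access pattern that benefits from keeping thousands of memory references in flight:

```python
from collections import deque

def bfs(adj, source):
    """adj maps vertex -> neighbor list. Returns the BFS parent map,
    the output validated by the Graph500 benchmark."""
    parent = {source: source}
    frontier = deque([source])
    while frontier:
        v = frontier.popleft()
        for w in adj[v]:            # irregular reads: each neighbor list
            if w not in parent:     # lives at an unpredictable address
                parent[w] = v
                frontier.append(w)
    return parent

adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
print(bfs(adj, 0))  # {0: 0, 1: 0, 2: 0, 3: 1}
```

On a commodity core, each of those unpredictable reads stalls on memory latency; hardware that can keep thousands of such references outstanding hides that latency instead.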
Coming to SC11? This year, the Convey exhibit will include a ‘Graph Corner’ where you can learn about the Graph500 benchmark and the company’s GraphConstructor. In addition, Bob Masson and Kirby Collins will present “Heterogeneous Computing Architecture Supporting Applications in Data-intensive Sciences” on Thursday, November 17th, at 2:30 p.m. as part of the SC11 Exhibitor Forum.
In this video from Netezza, analytic workloads are explained in plain English.
While there are many analytic variants and subspecialties—predictive analytics, in-database analytics, advanced analytics, web analytics, and so on—this text focuses on the characteristic demands that nearly all analytic processing problems place on modern information systems. We refer to these demands as an analytic workload. Every data processing problem has its own unique workload, but analytic workloads tend to share a set of attributes, with strong design and deployment implications for the processing systems assigned to handle these workloads.
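As a toy contrast (my sketch, not from the Netezza text), a transactional request typically touches a single row through an index, while an analytic query scans and aggregates most of the table; that scan-and-aggregate shape is the “analytic workload” in miniature:

```python
# Hypothetical table of a hundred thousand sales records.
rows = [{"id": i, "region": i % 4, "sales": i * 1.5} for i in range(100000)]
index = {row["id"]: row for row in rows}

# Transactional pattern: indexed point lookup, touches one row.
one_order = index[12345]

# Analytic pattern: full scan with grouping, touches every row.
totals = {}
for row in rows:
    totals[row["region"]] = totals.get(row["region"], 0.0) + row["sales"]
print(totals)
```

Systems built for the second pattern are designed for sequential scan bandwidth and aggregation throughput rather than for fast individual row lookups.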
IDC claims that data-intensive workloads are going to become par for the HPC course in coming years, making up a more sizable portion of the overall high performance computing market. Conway notes that “in addition, while many big data problems will be run on standard clusters, limitations in the memory sizes and memory architectures of clusters make them ill-suited for the most challenging classes of data-intensive problems.” He points to a number of HPC sites looking to upgrade to systems with fatter memory profiles, a trend that IDC expects to play out over the next few years and beyond.