Discovering Gold with Big Data Analytics and Data-Intensive Computing

Entries filed under “Research”

Big Data – Quality or Quantity?


In this special guest feature, Anchita Magan from [x]cube DATA writes that the element of quality has to be considered in quantifiable data.

Significance of Big Data

The entire cosmos has been turned into an aggregated ocean of data, whether structured or unstructured, systematic or unsystematic, useful or useless. This mass of roughly organized data needs to be stored, arranged and analyzed so that businesses can use it to evaluate the dimensions of their success as well as their bottlenecks. Whether a CEO or a COO, a marketing manager or an operations head, an HR employee or an IT engineer, they all make use of big data analysis for decision making.
But the valid question that arises is: which attribute of big data is more important, quality or quantity?

Importance of Quantity and Relevance of Quality

The term ‘big’ itself is closely related to quantity. But extracting useful, high-quality data from the bulk is the important task that must be accomplished for sustainable growth, effective utilization of resources and answers to present and foreseeable challenges.

Experts say that we analyze only one percent of data and hence tap only one percent of its potential. Systematic analysis of the remaining 99 percent could bring about a revolution in every business sector, be it retail, healthcare, telecom, financial services or IT.

But it is also observed that without valid evaluation, collecting hoards of data won’t provide the necessary insights into the business.

Application of Big Data in Health Care Industry with reference to Quality and Quantity

With the boom in Internet and communication technology, big data analysis has gained a lot of significance on the global stage. It generates insights into business performance as a whole by evaluating both internal and external data collected worldwide.
According to a report by McKinsey, the five areas with the greatest big data potential are health care, retail, manufacturing, the public sector and personal location data.

Consider the healthcare industry, which currently faces major challenges in making its services affordable and accessible to all sections of society and to the remotest of locations. Health information and health care data are used extensively: they are processed and analyzed to plan, determine and administer the quality of health services, and to support scientific research toward major breakthroughs in diagnosis and medication. Government as well as private organizations provide multiple statistical reports that shed light on administrative data regarding the expenditure, consumption and utilization of health services, taking into account patient records, lab records, number of hospitals, bed utilization rates, out-patient visits, occupancy rates, human resources, etc.

This structured and unstructured data can be a guiding light only when it is properly categorized, processed and analyzed to extract the useful insights and discard the useless content, thus turning quantitative data into qualitative data. This is achieved through big data analytics techniques, which are key to an organization’s capabilities. These techniques include text data mining, machine learning and statistical programming, which are backed by widely used technologies like NoSQL databases and the Hadoop framework.

These big data analysis technologies further help to control fraud by enabling auditors to identify transactions that indicate deceptive or fraudulent activity, thus strengthening hospitals’ anti-fraud mechanisms.

Some applications of big data in healthcare are:

  • By combining the most advanced laboratory diagnostics, imaging systems and healthcare information technology, the healthcare industry enables clinicians to diagnose disease earlier and more accurately, making a decisive contribution to improving the quality of healthcare.
  • Healthcare big data technology management offers solutions for the entire supply chain under one roof, from prevention and early detection through diagnosis and on to treatment and aftercare.
  • Big data analytics examines large amounts of data emanating from a variety of sources to discover patterns that could be useful in problem solving and decision making.

The best example may be Bumrungrad International Hospital, which is effectively using clinical analytics and Electronic Medical Records (EMR) to deliver better care for its patients, analyze their needs and enhance patient satisfaction while making its service cost-effective. The hospital manages patient information utilizing an integrated hospital information system that uses digital radiology systems. A case study by Intel Corporation reported that Bumrungrad commissioned the development of a custom total hospital information system to serve both the front office and back office, to maximize both safety and efficiency, and to drastically reduce the potential for medication error.

Thus it is important to understand that the huge volume of big data has to be carefully examined, reviewed and verified to extract the useful content, thereby adding quality to the quantifiable data.

About the Author

This article was written by Anchita Magan from [x]cube DATA. [x]cube DATA provides big data solutions and services to companies across various industries that wish to harness the large data sets at their disposal and gain actionable insights from them.


Also posted in Analytics, Healthcare, Life Sciences

New MIT Software Targets Data-Intensive Cloud Computing

When data-intensive applications meet the cloud, there may be stormy weather ahead.

Cloud computing services undeniably generate a long list of benefits: for example, economies of scale, responsiveness to fluctuating job requirements, in-depth technical support, and the pay-as-you-go scenario come to mind. But researchers at MIT are also aware that applications built around large-scale database queries can cause havoc in the cloud.

Cloud services often partition their servers into virtual machines. Each of these machines is constrained in a number of ways: for example, they may be assigned a finite number of operations per second on the server’s CPU, or allocated a limited amount of space in memory. According to MIT, that makes for easier management of the cloud servers, but it also can result in an allocation of about 20 times more hardware than is necessary to do the job. Naturally the cost of this overprovisioning gets passed on to the customer.

This has prompted MIT researchers to begin work on a new system called DBSeer. According to a recent press release, the software uses machine-learning techniques to build accurate models of performance and resource demands of database-driven applications.

The new algorithm at the heart of DBSeer has been released under an open-source license. Teradata, one of the leaders in the Big Data revolution, is already in the process of importing the algorithm into its solutions.

“With virtual machines, server resources must be allocated according to an application’s peak demand,” explains Barzan Mozafari, one of the MIT researchers. “You’re not going to hit your peak load all the time. So that means that these resources are going to be underutilized most of the time. Provisioning for peak demand is largely guesswork. It’s very counterintuitive, but you might take on certain types of extra load that might help your overall performance. Increased demand means that a database server will store more of its frequently used data in its high-speed memory, which can help it process requests more quickly.”

However, a slight increase in demand could cause the system to slow down precipitously – if, for instance, too many requests require modification of the same pieces of data, which need to be updated on multiple servers. “It’s extremely nonlinear,” Mozafari says.
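To make the underutilization point concrete, here is a minimal Python sketch. It is not DBSeer's model, which the article does not describe in detail; the function name and the sample load figures are hypothetical. It only shows how low average utilization becomes when capacity is sized for the single busiest interval.

```python
def utilization_when_provisioned_for_peak(load_samples):
    """Average utilization if capacity equals the highest observed load.

    load_samples: non-empty sequence of positive load measurements
    (for example, queries per second sampled once an hour).
    """
    if not load_samples:
        raise ValueError("need at least one load sample")
    peak = max(load_samples)
    if peak <= 0:
        raise ValueError("peak load must be positive")
    # Capacity is fixed at the peak for every interval, so the used
    # fraction is total load divided by total provisioned capacity.
    return sum(load_samples) / (len(load_samples) * peak)


# Hypothetical trace: one short burst dominates the sizing decision.
hypothetical_trace = [10, 20, 100, 30]
print(utilization_when_provisioned_for_peak(hypothetical_trace))
```

With this made-up trace, the resources sized for the burst of 100 sit at 40 percent utilization on average. The nonlinear effects Mozafari describes would not be captured by a calculation this simple.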

The MIT team has built a DBSeer model of MySQL and they are currently working on a new model for PostgreSQL – both widely used database systems.

Read the Full Story.


Also posted in Analytics, Cloud

Life Sciences and Big Data: Made for Each Other

You can’t find better examples of the promises and pitfalls of Big Data than in the realm of the life sciences. According to an excellent article, “Unraveling the Complexities of Life Sciences Data,” published in the journal Big Data, with the combination of the completion of the human genome project and the availability of advanced analysis technologies, “the 21st-century life sciences have entered the fourth paradigm of data-enabled sciences and the realm of big data.”

The authors, Roger Higdon et al., note that the scale of biological data is increasing exponentially. Sequencing technologies are producing data faster than the growth of computing power predicted by Moore’s Law: a 10,000-fold increase in sequencing as compared to a 16-fold uptick in computational power. This brings the life sciences squarely up against the challenges of the “5 Vs” of big data: volume, veracity, velocity, variety, and value.
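One way to compare the two growth figures is to express each as a number of doublings. The Python sketch below is only an illustration: the helper name is hypothetical, and the two inputs are the figures quoted above.

```python
import math


def doublings(fold_increase):
    """Number of times a quantity must double to grow by fold_increase."""
    if fold_increase <= 0:
        raise ValueError("fold_increase must be positive")
    return math.log2(fold_increase)


print(doublings(16))      # computational power: 16-fold
print(doublings(10_000))  # sequencing output: 10,000-fold
```

A 16-fold gain is exactly four doublings, while a 10,000-fold gain is a little over thirteen, so sequencing output doubled more than three times as often over the same period.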

The paper details the specific problems facing life sciences researchers and presents a series of solutions for handling the field’s treasure trove of complex data. Included are an integrated data resource developed by the Kolker Lab; an efficient way to functionally annotate newly sequenced genomes and metagenomes; all-versus-all sequence alignments; a platform under development for visualizing complex data; and a plan for community outreach and education in the data-enabled sciences.

“It is clear that the life sciences have become big data and data-enabled science,” the authors conclude. “Data-enabled science may have at its core the generation of data in the lab, but transforming the data to knowledge and action to breakthroughs and benefits goes far beyond the lab. The transformation will require massive resources and transdisciplinary collaborative efforts put forth by the scientific community to solve the challenges of big data. The need is urgent and growing, given the issues of data generation outstripping computing power and the lack of reproducibility of research. Organizations like DELSA Global can inform the life sciences community, lead the way for groups like the Kolker Lab to put forth new solutions to big data challenges, and create a new paradigm in the life sciences of cooperation, collaboration, and sharing at every level.”

Read the Full Story.


Also posted in Life Sciences

Video: Algorithmic Illusions – Hidden Biases of Big Data

In this video, Kate Crawford from Microsoft Research presents: Algorithmic Illusions: Hidden Biases of Big Data.

Big data gives us a powerful new way to see patterns in information – but what can’t we see? When does big data not tell us the whole story? This talk opens up the question of the biases we bring to Big Data, and how we might work beyond them.

This talk was recorded at the 2013 Strata Conference in Santa Clara.


Also posted in Analytics, Events, Video

How Big Data Helps Scientists Ask Bigger Questions

You have to wonder what Albert Einstein would be up to in this era of big data. Writing in Quartz, Gartner analyst Chris Guan notes that in 1905, Einstein, working with just a handful of data points, discovered that light was made up of particles – a breakthrough that completely changed the course of physics.

A few decades later, Erwin Schrödinger derived an equation that explained many of the new ideas in the fledgling field of quantum mechanics; but the processing power to solve the equation wasn’t available at the time.

Today, with affordable supercomputers, cloud computing, and the ability to move massive amounts of data with Hadoop, scientists have the processing power they need to solve even the most intractable problems. Guan cites the example of a University of Wisconsin researcher who created a massive database of stem cells using over a million processing hours. He finished his study in a week for less than $20,000.

“In computer science there are two laws, Amdahl’s and Gustafson’s. Amdahl’s shows how much faster a given problem is answered when more processing power is thrown at it,” says Guan. “Gustafson’s turns Amdahl’s on its head by defining how big of a problem can be answered, given a fixed amount of time, when more resources are available. In other words, given an hour, what can be solved with more computers vs. with less. Science, like Gustafson suggested, often opts for bigger questions vs. saved time. The ability to translate larger amounts of data into cogent explanations can help remove old barriers from scientific pursuits. In genetic research, that could mean unlocking the causes of diseases, developing new cures, and finding the parts of the genetic blueprint that make us human.”
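The two laws Guan mentions have simple closed forms. For a workload in which a fraction p can run in parallel on n processors, Amdahl's law gives the speedup for a fixed problem size as 1 / ((1 - p) + p / n), while Gustafson's law gives the scaled speedup for a fixed time budget as (1 - p) + p * n. The Python sketch below illustrates the contrast. The function names and the 95 percent parallel fraction are assumptions chosen for the example and are not taken from the article.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup on a fixed-size problem (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def gustafson_speedup(parallel_fraction, processors):
    """Scaled speedup when the problem grows to fill a fixed time
    budget (Gustafson's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * processors


# Illustrative assumption: 95% of the work parallelizes, 100 processors.
p, n = 0.95, 100
print(f"Amdahl:    {amdahl_speedup(p, n):.1f}x faster on the same problem")
print(f"Gustafson: {gustafson_speedup(p, n):.1f}x more work in the same time")
```

Under these assumed numbers, the same 100 processors finish a fixed problem only about 16.8 times faster, yet handle roughly 95 times as much work in the original time. That gap is the sense in which science "opts for bigger questions vs. saved time."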

Read the Full Story.


Also posted in Analytics, Life Sciences

With an Eye on Big Data, HPAC Workshop Returns to Lugano March 13-15

The HPC Advisory Council Switzerland Conference returns to Lugano March 13-15, 2013.

The conference will focus on the following topics: Progress of Exascale in the European Union, high-performance interconnects, Accelerators and Parallel I/O, communication libraries (MPI, SHMEM, PGAS), GPU computing (CUDA, OpenCL), Big Data, advanced topics / technologies / development including server and storage systems, and hands-on clustering, network, troubleshooting, tuning, optimizations. The conference is open to the public and will bring together system managers, researchers, developers, computational scientists and industry affiliates.

Having been to this event several times, I can tell you that Lugano is one of the most beautiful towns in the world. It’s a solid three-day workshop, and this year they’ll be treating attendees to a boat trip on Lake Lugano with an on-board apero and dinner.

Check out the Preliminary Agenda and Register now.


Also posted in Events, File Systems, HPC, Lustre, Software, Storage

DARPA Boosts Python with $3 Million Award to Continuum Analytics

Over at IT World, Joab Jackson writes that Python just got a big data boost from DARPA with a $3 million award to software provider Continuum Analytics. The funding will help foster the development of Python’s data processing and visualization capabilities for big data jobs.

The money will go toward developing new techniques for data analysis and for visually portraying large, multi-dimensional data sets. The work aims to extend beyond the capabilities offered by the NumPy and SciPy Python libraries, which are widely used by programmers for mathematical and scientific calculations, respectively. More mathematically centered languages such as the R Statistical language might seem better suited for big-data number crunching, but Python offers an advantage of being easy to learn.

The work is part of DARPA’s XData research program, a four-year, $100 million effort to give the Defense Department and other U.S. government agencies tools to work with large amounts of sensor data and other forms of big data. Read the Full Story.

In this video from PyData NYC 2012, Stephen Diehl from Continuum Analytics presents on Blaze, a next-generation NumPy designed as a foundational set of abstractions on which to build out-of-core and distributed algorithms. Blaze generalizes many of the ideas found in popular PyData projects such as NumPy, Pandas, and Theano into one generalized data structure. Together with a powerful array-oriented virtual machine and run-time, Blaze will be capable of performing efficient linear algebra and indexing operations on top of a wide variety of data backends.
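Blaze's own API is not shown in this post, so the following is only a rough standard-library illustration of what "out-of-core" means: computing a result over data on disk while holding just one chunk in memory at a time. The function names, file layout (raw float64 values) and chunk size are all hypothetical choices for the sketch and do not reflect how Blaze works.

```python
import os
import tempfile
from array import array


def chunked_mean(path, chunk_items=1024):
    """Mean of raw float64 values in a binary file, read chunk by chunk."""
    item_size = array("d").itemsize
    total = 0.0
    count = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk_items * item_size)
            if not data:
                break
            chunk = array("d")
            chunk.frombytes(data)  # only this chunk is held in memory
            total += sum(chunk)
            count += len(chunk)
    if count == 0:
        raise ValueError("file contains no values")
    return total / count


def write_demo_file(values):
    """Write values as raw float64 to a temporary file; return its path."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        array("d", values).tofile(f)
    return path


demo_path = write_demo_file([1.0, 2.0, 3.0, 4.0])
print(chunked_mean(demo_path, chunk_items=3))
os.remove(demo_path)
```

The same running-total pattern extends to any reduction that can be combined across chunks, which is the kind of operation an out-of-core array abstraction aims to hide behind a familiar interface.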


Also posted in Software

Video: CyberGIS’12: Panel Discussion

In this video from the First International Conference on Space, Time and CyberGIS, panelists discuss lessons learned over the course of the event.

Spatial dynamics and cyberGIS ask fundamental questions about the complexity, dynamics, and synthesis of social, behavioral, and economic systems. Making connections across space and time enables knowledge building beyond disciplinary boundaries to understand how new findings in one discipline relate to another for a holistic understanding of human dimensions.

This is fascinating material to me, and one of the panelists poses a very interesting question: “What happens when a science has plenty of data, but lacks theory?”


Also posted in Analytics, Video

Video: Data Challenges for the Analysis of Human Space-Time Behavior

In this video from the First International Conference on Space, Time and CyberGIS, Mei-Po Kwan from the University of California – Berkeley presents: Data Challenges for the Analysis of Human Space-Time Behavior.


Also posted in Analytics, Events, Video

IBM Lights up Silicon Chips to Tackle Big Data

IBM has announced a major advance in the ability to use light instead of electrical signals to transmit information for future computing.

The breakthrough technology – silicon nanophotonics – allows the integration of different optical components side-by-side with electrical circuits on a single silicon chip – using, for the first time, sub-100nm semiconductor technology.

Silicon nanophotonics takes advantage of pulses of light for communication and provides a super highway for large volumes of data to move at rapid speeds between computer chips in servers, large data centers, and supercomputers, thus alleviating the limitations of congested data traffic and high-cost traditional interconnects.

“This technology breakthrough is a result of more than a decade of pioneering research at IBM,” said John Kelly, senior vice president and director of IBM Research. “This allows us to move silicon nanophotonics technology into a real-world manufacturing environment that will have impact across a range of applications.”

The amount of data being created and transmitted over enterprise networks continues to grow due to an explosion of new applications and services. Silicon nanophotonics, now primed for commercial development, can enable the industry to keep pace with increasing demands in chip performance and computing power.

Read the Full Story or Download the Whitepaper (PDF).

This story appears here as part of a cross-publishing agreement with Scientific Computing World.


Also posted in Hardware, HPC


inside-bigdata.com is a production of insideHPC, LLC. © 2011-2013