Discovering Gold with Big Data Analytics and Data-Intensive Computing

Entries filed under “Research”

Gary King on Why Big Data is Not About The Data!

In this video from the NEAI Meetup Series, Gary King from Harvard presents: Big Data is Not About The Data!

King’s work is widely read across scholarly fields and beyond academia. He was listed as the most cited political scientist of his cohort; among the group of “political scientists who have made the most important theoretical contributions” to the discipline “from its beginnings in the late-19th century to the present”; and on ISI’s list of the most highly cited researchers across the social sciences. His work on legislative redistricting has been used in most American states by legislators, judges, lawyers, political parties, minority groups, and private citizens, as well as by the U.S. Supreme Court. His work on inferring individual behavior from aggregate data has been used in as many states by these groups, and in many other practical contexts. His contributions to methods for achieving cross-cultural comparability in survey research have been used in surveys in over eighty countries by researchers, governments, and private concerns. King led an evaluation of the Mexican universal health insurance program, which includes the largest randomized health policy experiment to date. The statistical methods and software he developed are used extensively in academia, government, consulting, and private industry. He is a founder of, and an inventor of the original technology for, Crimson Hexagon, Learning Catalytics, and other firms.



Is All Science Becoming Data Science?

Over at Science Magazine, Vijaysree Venkatraman writes that data-driven discovery may soon become the norm in science and that learning to code and becoming comfortable with large datasets may soon be a necessity in many traditional scientific fields.

“All science is fast becoming what is called data science,” says Bill Howe of UW’s eScience Institute. Today, there are sensors in gene sequencers, telescopes, forest canopies, roads, bridges, buildings, and point-of-sale terminals. Every ant in a colony can be tagged. The challenge is to extract knowledge from this vast quantity of data and transform it into something of value. Lately, UW computer scientist Ed Lazowska says, he has been hearing this refrain from researchers in engineering, the sciences, the social sciences, law, medicine, and even the humanities: “I am drowning in data and need help analyzing and managing it.”

Read the Full Story.



Indiana University Helps NASA Manage Big Data

Indiana University has contributed Big Data expertise and infrastructure to NASA’s Operation IceBridge, a decade-long polar ice monitoring project.

For the past four years, IU Research Technologies, a cyberinfrastructure and service center affiliated with the Pervasive Technology Institute (PTI), has provided IT support for the Center for Remote Sensing of Ice Sheets (CReSIS), a National Science Foundation Science and Technology Center led by the University of Kansas. Kansas scientists provide NASA with the radar technology that measures the physical interactions of polar ice sheets in Greenland, Chile and Antarctica. IU experts bring innovative data management and storage solutions to the missions.

“Essentially, IU has built a supercomputer that can fly,” said Rich Knepper, manager of IU’s campus bridging and research infrastructure team within Research Technologies. “During this current mission, our system provided analysis of radar data as the data was collected, in real time, allowing mission scientists to see the ice bed information as the plane flies over the Arctic.”

Read the Full Story.



Big Data Upending H.R. Conventional Wisdom

For decades human resource management has been a field that relied on policies and procedures, leavened by healthy doses of intuition. But Big Data is standing H.R. on its head – many of the old assumptions are either being called into question or totally discarded.

Writing in The New York Times, Steve Lohr chronicles a number of these cherished beliefs that are being undermined by the results of recent research based on Big Data.

For example, a good supervisor – one who is an excellent communicator with a warm personality – may be more important to an organization’s success than the experience and attributes of the workers themselves.

And when it comes to hiring, data also shows that the tendency of H.R. departments to avoid candidates with a history of job-hopping or who have been unemployed for some time is the wrong tack to take. These factors are not good predictors of future results.

Even the idea that the ideal salesperson’s most important asset is an outgoing, optimistic personality fails to hold up. Research by IBM’s Kenexa unit – a recruiting, hiring and training company acquired last year – reveals that successful salesmen exhibit “a kind of emotional courage, a persistence to keep going even after initially being told no.”

You would expect Google to be at the forefront of using Big Data to manage its H.R. activities. And, indeed, this is the case.

“Google, not surprisingly, is committed to applying data-driven decision-making to human resource management,” writes Lohr. “For years, candidates were screened according to SAT scores and college grade-point averages, metrics favored by its founders. But numbers and grades alone did not prove to spell success at Google and are no longer used as important hiring criteria, says Prasad Setty, vice president for people analytics. Since 2007, the company has conducted extensive surveys of its work force. Google has found that the most innovative workers, also the ‘happiest’ by its definition, are those who have a strong sense of mission about their work and who also feel that they have much personal autonomy. ‘Our people decisions are no less important than our product decisions,’ Mr. Setty says. ‘And we’re trying to apply the same rigor to the people side as to the engineering side.’”

The impact of Big Data on H.R. is expected to be profound. Lohr quotes Peter Cappelli, director of the Center for Human Resources at the Wharton School of the University of Pennsylvania, who says, “This is absolutely the way forward. Most companies have been flying completely blind.”

Read the Full Story.



Video: How CERN Handles Big Data from the LHC

This video looks at how Big Data is handled from the Large Hadron Collider (LHC).

The LHC produces millions of collisions every second in each detector, generating approximately one petabyte of data per second. None of today’s computing systems are capable of recording such rates, so sophisticated selection systems perform a first fast electronic pre-selection that passes only one in every 10,000 events. Tens of thousands of processor cores then select 1% of the remaining events for analysis. Even after such drastic data reduction, the four big experiments, ALICE, ATLAS, CMS and LHCb, together need to store over 25 petabytes per year. The LHC data are aggregated in the CERN Data Centre, where initial data reconstruction is performed and a copy is archived to long-term tape storage. Another copy is sent to several large data centres around the world.
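The reduction factors above can be checked with a few lines of back-of-the-envelope Python. This is a rough sketch, not CERN's own accounting: it assumes events are roughly equal in size (so data volume falls in proportion to the fraction of events kept), treats the one-petabyte-per-second figure as the starting rate, and uses decimal units.

```python
# Back-of-the-envelope check of the selection chain described above.
# Assumption: events are roughly equal in size, so data volume shrinks in
# proportion to the fraction of events kept. Decimal units throughout.
PB = 10**15  # bytes per petabyte
GB = 10**9   # bytes per gigabyte

raw_rate = 1 * PB                       # ~1 PB/s generated by collisions
after_hardware = raw_rate // 10_000     # electronic pre-selection keeps 1 in 10,000
after_software = after_hardware // 100  # processor cores keep 1% of the remainder

seconds_per_year = 365 * 24 * 3600
yearly_if_continuous = after_software * seconds_per_year

print(f"after electronic pre-selection: {after_hardware // GB} GB/s")
print(f"after software selection:       {after_software // GB} GB/s")
print(f"one year of nonstop running:    {yearly_if_continuous / PB:.1f} PB")
```

Running nonstop, 1 GB/s would add up to roughly 31.5 PB a year, the same order of magnitude as the 25-plus petabytes quoted above; the collider does not run year-round, and all of these figures are approximate.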



Big Data Renaissance in North Carolina

In Chapel Hill, there are big doings in Big Data these days.

A new collaboration, known as the National Consortium for Data Science (NCDS), has been launched at RENCI (the Renaissance Computing Institute) at the University of North Carolina.

The consortium has ambitious goals according to a recent RENCI press release, which states that NCDS “…aims to make North Carolina a national hub for data-intensive business and data science research and education, a move that will help develop a national strategy to ensure U.S. leadership in the data-driven global economy.”

NCDS founding members include industry leaders, top research universities, and nonprofit and government organizations.

Representing the private sector are Cisco, GE, IBM, NetApp and SAS. UNC Chapel Hill, RENCI, North Carolina State University, UNC Charlotte, UNC General Administration, Duke University and Drexel University comprise the founding academic members. Nonprofit and government sector members are The Hamner Institutes for Health Sciences, MCNC, the National Institute of Environmental Health Sciences, RTI International and the U.S. Environmental Protection Agency. All founding members have major facilities in North Carolina except Drexel, located in Philadelphia.

“Those who harness the power of big data and use it to develop new data-intensive business sectors will be the winners in the 21st century economy,” said Stanley C. Ahalt, Ph.D., director of RENCI and a chief organizer of the NCDS, and professor of computer science at UNC-Chapel Hill. “Our members understand that, want to find solutions to big data problems and put North Carolina on the map as a center of data science innovation.”

The consortium is seeking to quickly consolidate its position as a major player in the evolving world of Big Data through a series of ongoing initiatives.

For example, NCDS will be competing for federal research funding, including the $200 million Big Data Research and Development Initiative announced by the White House last year, the National Science Foundation’s BigData program, and the National Institutes of Health’s Big Data to Knowledge initiative.

It will convene its first invitation-only NCDS Leadership Summit April 23-24 in Chapel Hill. This year the summit will focus on genomics. Other initiatives include public lectures, an internship program for students at member companies, and visiting scientist positions at member universities for industry employees.

“It is no longer enough for businesses to be big in order to be successful, now success is driven by the amount of knowledge a company possesses,” said David Turek, vice president, exascale computing, IBM. “Ninety percent of our planet’s data has been created within the past two years, and the demand will grow as businesses look to optimize big data analytics to improve decision making and expand their business operations into cloud, social and mobile environments.”

Read the Full Story.



Obama Initiative Leverages Big Data to Explore the Brain

On Tuesday, April 2, President Obama announced a research initiative that has the ambitious goal of “revolutionizing our understanding of the human brain,” according to a White House press release.

Known as BRAIN (Brain Research through Advancing Innovative Neurotechnologies), the initiative is being launched in FY 2014 with an initial budget of about $100 million, a modest amount given the project’s goals.

In short, BRAIN is designed to help researchers find “…new ways to treat, cure, and even prevent brain disorders, such as Alzheimer’s disease, epilepsy, and traumatic brain injury.” Included is support for new technologies that will allow researchers to produce dynamic pictures of the brain that show how individual brain cells and complex neural circuits interact in real time.

This is a foray into Big Data. The initiative will let researchers amass and analyze the data needed to “…explore how the brain records, processes, uses, stores, and retrieves vast quantities of information, and shed light on the complex links between brain function and behavior.”

Among the many public and private organizations involved in the effort are the National Institutes of Health (NIH), the Defense Advanced Research Projects Agency (DARPA), and the National Science Foundation (NSF). NSF in particular is leading the charge in applying the technologies and techniques of Big Data to the initiative.

“The National Science Foundation will play an important role in the BRAIN Initiative because of its ability to support research that spans biology, the physical sciences, engineering, computer science, and the social and behavioral sciences,” according to the White House release. “The National Science Foundation intends to support approximately $20 million in FY 2014 in research that will advance this initiative, such as the development of molecular-scale probes that can sense and record the activity of neural networks; advances in ‘Big Data’ that are necessary to analyze the huge amounts of information that will be generated, and increased understanding of how thoughts, emotions, actions, and memories are represented in the brain.”

In a story in InformationWeek posted the same day, senior editor J. Nicholas Hoover writes, “On a conference call with reporters after the President’s announcement, National Institutes of Health director Francis Collins said that the brain-mapping initiative might eventually require the handling of yottabytes of data. A yottabyte is equal to a billion petabytes.”

That’s Big Data at its mind-boggling best.
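The “billion petabytes” figure follows directly from the decimal SI prefixes, where yotta denotes 10^24 and peta denotes 10^15:

```latex
1\ \mathrm{YB} = 10^{24}\ \mathrm{B} = 10^{24-15} \times 10^{15}\ \mathrm{B} = 10^{9}\ \mathrm{PB}
```

With binary prefixes instead (one yobibyte is 2^80 bytes, one pebibyte is 2^50 bytes), the ratio is 2^30, or about 1.07 billion, so the comparison holds either way.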

Read the Full Story.



You and Your Cellphone: Doing Your Part for Big Data

That cellphone in your pocket or purse is generating data that, for better or worse, can be used for a variety of applications – everything from urban planning to tracking your whereabouts.

Larry Hardesty, writing in a release issued by the MIT News Office, comments that today’s sensor-studded cellphones can be used for a variety of socially useful applications such as epidemiology, operations research and emergency preparedness, just to name a few.

So far, so good. But here’s the catch – before releasing the data to researchers in these fields, information identifying the individual user needs to be removed. Asks Hardesty, “…how hard could it be to protect the identity of one unnamed cellphone user in a data set of hundreds of thousands or even millions.”

Turns out assuring that level of privacy is very hard indeed.

“According to a paper appearing this week in Scientific Reports, harder than you might think,” Hardesty writes. “Researchers at MIT and the Université Catholique de Louvain, in Belgium, analyzed data on 1.5 million cellphone users in a small European country over a span of 15 months and found that just four points of reference, with fairly low spatial and temporal resolution, was enough to uniquely identify 95 percent of them. In other words, to extract the complete location information for a single person from an ‘anonymized’ data set of more than a million people, all you would need to do is place him or her within a couple of hundred yards of a cellphone transmitter, sometime over the course of an hour, four times in one year. A few Twitter posts would probably provide all the information you needed, if they contained specific information about the person’s whereabouts.”

The Scientific Reports paper speculates that the concepts behind tracking people’s movements using cellphone data might apply to other kinds of data as well, for example web browsing. As César Hidalgo, one of the paper’s authors, comments, “The space of potential combinations is really large. When a person is, in some sense, being expressed in a space in which the total number of combinations is huge, the probability that two people would have the same exact trajectory, whether it’s walking or browsing, is almost nil.”
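The uniqueness test behind this result can be sketched in a few lines: draw p points from one user's trace and count how many traces in the whole data set contain all of them; if only one does, those p points single the user out. The sketch below assumes each point is an (antenna, hour) pair, echoing the coarse spatial and temporal resolution described above. The function names and toy data are hypothetical, and this is a simplified illustration rather than the researchers' own code.

```python
import random

# A trace is the set of (antenna_id, hour) points at which one user was seen.
Trace = frozenset[tuple[int, int]]


def is_unique(target: Trace, traces: list[Trace], p: int, rng: random.Random) -> bool:
    """Draw p points from target (a member of traces); True if it is the only
    trace in the data set that contains all of them."""
    points = rng.sample(sorted(target), p)
    matches = sum(1 for t in traces if all(pt in t for pt in points))
    return matches == 1


def fraction_unique(traces: list[Trace], p: int, seed: int = 0) -> float:
    """Share of users (with at least p points) singled out by p random points."""
    rng = random.Random(seed)
    eligible = [t for t in traces if len(t) >= p]
    if not eligible:
        return 0.0
    unique = sum(is_unique(t, traces, p, rng) for t in eligible)
    return unique / len(eligible)


def make_toy_traces(n_users: int, n_points: int, n_antennas: int, n_hours: int,
                    seed: int = 1) -> list[Trace]:
    """Hypothetical uniform random traces; real call records are far from uniform."""
    rng = random.Random(seed)
    return [
        frozenset((rng.randrange(n_antennas), rng.randrange(n_hours))
                  for _ in range(n_points))
        for _ in range(n_users)
    ]


if __name__ == "__main__":
    traces = make_toy_traces(n_users=500, n_points=20, n_antennas=50, n_hours=24 * 7)
    for p in (1, 2, 4):
        print(f"p={p}: {fraction_unique(traces, p):.2%} of toy users unique")
```

Because the number of possible (antenna, hour) combinations grows multiplicatively with each extra point, even a handful of points tends to leave only one matching trace, which is the effect Hidalgo describes.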

Read the Full Story.



Video: EUDAT and Big Data in Science

In this video from the 2013 National HPCC Conference, Wolfgang Gentzsch presents: EUDAT and Big Data in Science.

“Big data science emerges as a new paradigm for scientific discovery that reflects the increasing value of observational, experimental and computer-generated data in virtually all domains, from physics to the humanities and social sciences. Addressing this new paradigm, the EUDAT project is a European data initiative that brings together a unique consortium of 25 partners from 13 countries, including research communities, national data and high performance computing (HPC) centers, technology providers, and funding agencies. EUDAT aims to build a sustainable cross-disciplinary and cross-national data infrastructure that provides a set of shared services for accessing and preserving research data. The design and deployment of these services is being coordinated by multi-disciplinary task forces comprising representatives from research communities and data centers.”



Big Data Freeway Under Construction in San Diego

If you’ve ever driven the freeways of Southern California, you might wonder about the metaphor chosen to describe the new high speed, Big Data network announced this week by the University of California, San Diego.

Through a project known as Prism@UCSD, the university is building a high-performance cyberinfrastructure to support bursts of Big Data between campus facilities housing diverse disciplines, such as science, engineering, medicine and the arts, without killing the main campus network.

With $500,000 in funding from the National Science Foundation (NSF), the UCSD division of the California Institute for Telecommunications and Information Technology (Calit2) is developing Prism specifically to support researchers in such data-intensive scientific areas as genomic sequencing, climate science, electron microscopy, oceanography and physics.

“We’ve identified a variety of big data users on this campus who need ten gigabit/s and faster bandwidth to deal with the avalanche of data coming from scientific instruments such as sequencers, microscopes and computing clusters,” said Philip Papadopoulos, principal investigator on the Prism@UCSD project, who splits his time between Calit2 and the university’s San Diego Supercomputer Center (SDSC). “We’re starting at 1 Terabit/s of connected capacity through our next-generation modular switch, which is at the center of the Prism network. It can carry 20 times the traffic of our current research network, and it’s 100 times the bandwidth of the main campus network.”

Adds Papadopoulos, “You can think of Prism as the HOV lane, whereas our very capable campus network represents the slower lanes on the freeway.” Let’s hope he’s talking about the freeway at three in the morning.

“Prism@UCSD is a response to the growing challenge of Big Data,” said Calit2 Director Larry Smarr. “The key innovation in Prism@UCSD is to provide end-to-end dedicated large bandwidth to the end-users on campus.”

And he too invokes the freeway metaphor: “The Prism Big Data network also creates a high-capacity ‘data freeway’ to campus, national or international networks,” adds Smarr.

A roadway that has an aggregate bandwidth equivalent to over one terabit per second could go a long way to clearing up Southern California’s traffic problems.
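The ratios in Papadopoulos’s quote also imply the speeds of the existing networks. The sketch below assumes both the “20 times” and “100 times” comparisons are made against Prism’s 1 Tb/s capacity, ignores protocol overhead and contention, and uses a hypothetical 1 TB instrument data set and hypothetical 1 and 10 Gb/s per-user paths to show what the difference means for a single transfer.

```python
# Back-of-the-envelope sketch; derived speeds assume the "20 times" and
# "100 times" comparisons are both relative to Prism's 1 Tb/s capacity.
GBIT = 10**9  # bits per second in 1 Gb/s (decimal units)

prism_capacity = 1000 * GBIT              # "1 Terabit/s of connected capacity"
research_network = prism_capacity // 20   # implied current research network
campus_network = prism_capacity // 100    # implied main campus network


def transfer_seconds(num_bytes: int, link_bps: int) -> float:
    """Ideal time to move num_bytes over a link, ignoring overhead and contention."""
    return num_bytes * 8 / link_bps


dataset_bytes = 10**12  # hypothetical 1 TB data set from a sequencer or microscope

print(f"implied research network: {research_network // GBIT} Gb/s")
print(f"implied campus network:   {campus_network // GBIT} Gb/s")
for gbps in (1, 10):  # hypothetical per-user path speeds
    minutes = transfer_seconds(dataset_bytes, gbps * GBIT) / 60
    print(f"1 TB at {gbps:>2} Gb/s: {minutes:.1f} min")
```

Under these idealized assumptions, the 1 TB transfer takes about 13 minutes at 10 Gb/s versus more than two hours at 1 Gb/s, which illustrates why the users Papadopoulos describes ask for “ten gigabit/s and faster.”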

Read the Full Story.




inside-bigdata.com is a production of insideHPC, LLC. © 2011-2013