In this video from the NEAI Meetup Series, Gary King of Harvard presents “Big Data is Not About the Data!”
King’s work is widely read across scholarly fields and beyond academia. He was listed as the most cited political scientist of his cohort; among the group of “political scientists who have made the most important theoretical contributions” to the discipline “from its beginnings in the late-19th century to the present”; and on ISI’s list of the most highly cited researchers across the social sciences. His work on legislative redistricting has been used in most American states by legislators, judges, lawyers, political parties, minority groups, and private citizens, as well as by the U.S. Supreme Court. His work on inferring individual behavior from aggregate data has been used in as many states by these groups, and in many other practical contexts. His contributions to methods for achieving cross-cultural comparability in survey research have been used in surveys in over eighty countries by researchers, governments, and private concerns. King led an evaluation of the Mexican universal health insurance program, which includes the largest randomized health policy experiment to date. The statistical methods and software he developed are used extensively in academia, government, consulting, and private industry. He is a founder of, and an inventor of the original technology for, Crimson Hexagon, Learning Catalytics, and other firms.
Over at TechCrunch, Anthony Ha writes that Automated Insights’ new product called Site Ai pulls data from existing systems such as Google Analytics and then summarizes that data into normal sentences.
With a Site Ai summary, you shouldn’t have to do much thinking. As the company name implies, all of the summaries are generated automatically by Automated Insights’ technology, not by people. Robbie Allen, the company’s founder, told me that’s a big challenge: “Turning data into text is difficult because it requires marrying two skills that traditionally don’t play well with each other: programming and writing.” Allen said he can do it because he has a background in both technology (he worked at Cisco and has degrees from MIT in computer science) and writing (he’s the author of a number of books published by O’Reilly).
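Automated Insights doesn’t describe how its technology works, but the basic idea of turning analytics numbers into sentences can be illustrated with a minimal template-based sketch. Everything here is hypothetical: the `TrafficStats` fields, the `summarize_traffic` function, and the thresholds are assumptions for illustration, not Site Ai’s method or a real Google Analytics schema.

```python
from dataclasses import dataclass


@dataclass
class TrafficStats:
    """Hypothetical weekly analytics figures (not a real Google Analytics schema)."""
    site: str
    visits_this_week: int
    visits_last_week: int
    top_page: str


def percent_change(new: int, old: int) -> float:
    """Relative change from old to new, in percent."""
    if old == 0:
        raise ValueError("previous value must be non-zero")
    return (new - old) / old * 100


def summarize_traffic(stats: TrafficStats) -> str:
    """Render the numbers as plain-English sentences using fixed templates."""
    change = percent_change(stats.visits_this_week, stats.visits_last_week)
    # Pick a verb phrase based on the size and direction of the change.
    if abs(change) < 1:
        trend = "held steady"
    elif change > 0:
        trend = f"rose {change:.0f}%"
    else:
        trend = f"fell {abs(change):.0f}%"
    return (
        f"Visits to {stats.site} {trend} this week "
        f"({stats.visits_this_week:,} visits). "
        f"The most popular page was {stats.top_page}."
    )


if __name__ == "__main__":
    print(summarize_traffic(TrafficStats("example.com", 12_500, 10_000, "/pricing")))
```

Run as a script, this prints “Visits to example.com rose 25% this week (12,500 visits). The most popular page was /pricing.” Production data-to-text systems go far beyond fixed templates, which is part of why Allen calls it a marriage of programming and writing.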
In the book, Siegel says that the big secret about Big Data is that it doesn’t really exist: what is big today will be dwarfed by what is coming.
“Everything is connected to everything else, if only indirectly, and this is reflected in data. Data always speaks. It always has a story to tell, and there’s always something to learn from it. Data scientists see this over and over again across predictive analytics projects. Pull some data together and, although you can never be certain what you’ll find, you can be sure you’ll discover valuable connections by decoding the language it speaks and listening.”
In this video from The Next Web Conference Europe 2013, Ken Cukier, Data Editor at The Economist, argues that the hype around Big Data should not deter us from realizing its full potential to change the world.
Over at Science Magazine, Vijaysree Venkatraman writes that data-driven discovery may soon become the norm in science, and that learning to code and working comfortably with large datasets could become a necessity in many traditional scientific fields.
“All science is fast becoming what is called data science,” says Bill Howe of UW’s eScience Institute. Today, there are sensors in gene sequencers, telescopes, forest canopies, roads, bridges, buildings, and point-of-sale terminals. Every ant in a colony can be tagged. The challenge is to extract knowledge from this vast quantity of data and transform it into something of value. Lately, UW computer scientist Ed Lazowska says, he has been hearing this refrain from researchers in engineering, the sciences, the social sciences, law, medicine, and even the humanities: “I am drowning in data and need help analyzing and managing it.”
When we set out to build Hadoop 2.0, we wanted to fundamentally re-architect Hadoop so that it could run multiple applications against relevant data sets, and do so in a way that lets multiple types of applications operate efficiently and predictably within the same cluster. This is really the reason behind Apache YARN, which is foundational to Hadoop 2.0. By managing resource requests across a cluster, YARN turns Hadoop from a single-application system into a multi-application operating system.
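To make the idea of one manager arbitrating resource requests from several applications more concrete, here is a deliberately simplified Python model. It is not YARN’s API or scheduling algorithm: the `ToyResourceManager` class, the memory-only resource model, and the “first node that fits” placement rule are all illustrative assumptions. Real YARN schedulers (such as the Capacity and Fair schedulers) track more resource types and apply richer policies.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """A cluster machine with a fixed amount of memory (MB) to hand out."""
    name: str
    free_mb: int


@dataclass
class Request:
    """One application's ask for a container of a given size."""
    app: str
    mb: int


@dataclass
class ToyResourceManager:
    """Hypothetical, simplified stand-in for a cluster resource manager.

    Different applications submit requests to one shared queue, and the
    manager places each request on the first node with enough free memory,
    so several kinds of workloads share the same cluster.
    """
    nodes: list
    pending: deque = field(default_factory=deque)

    def submit(self, app: str, mb: int) -> None:
        self.pending.append(Request(app, mb))

    def schedule(self) -> list:
        """Grant every pending request that fits, scanning in submission order.

        Returns (app, node, mb) for each granted container; requests that
        do not fit anywhere stay queued for a later round.
        """
        granted = []
        still_waiting = deque()
        while self.pending:
            req = self.pending.popleft()
            node = next((n for n in self.nodes if n.free_mb >= req.mb), None)
            if node is None:
                still_waiting.append(req)
                continue
            node.free_mb -= req.mb
            granted.append((req.app, node.name, req.mb))
        self.pending = still_waiting
        return granted

    def release(self, node_name: str, mb: int) -> None:
        """Return memory to a node when a container finishes."""
        for n in self.nodes:
            if n.name == node_name:
                n.free_mb += mb
                return
        raise KeyError(node_name)


if __name__ == "__main__":
    rm = ToyResourceManager([Node("n1", 4096), Node("n2", 2048)])
    rm.submit("batch-job", 3072)
    rm.submit("stream-job", 2048)
    rm.submit("interactive-query", 2048)
    print(rm.schedule())  # the third request waits: no node has 2048 MB left
    rm.release("n1", 3072)  # the batch job finishes
    print(rm.schedule())  # now the interactive query fits on n1
```

The point of the sketch is the shape of the design: applications no longer own the cluster; they ask a central manager for resources and share whatever capacity is free.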