Machine Learning Identifies Fake Research Papers


Unsupervised machine learning techniques have proven useful in identifying fake research papers submitted to the arXiv preprint server. The automated repository arXiv receives approximately 500 preprints daily, but they are not pre-screened by humans. As a result, many nonsense papers generated by software such as SCIgen and Mathgen have been found in the most popular repository used by scientists to share research results.

Big Workflow – Beyond Intelligent Workflow Management


Big data applications represent a fast-growing category of high-value applications that are increasingly employed by business and technical computing users. However, they have exposed an inconvenient dichotomy in the way resources are utilized in data centers. A new white paper that focuses on these issues is available here on insideBIGDATA.

Data Science 101: Deep Learning Methods and Applications


Microsoft Research, the research arm of the software giant, is a hotbed of data science and machine learning research. Microsoft has the resources to hire the best and brightest researchers from around the globe. A recent publication is available for download (PDF): “Deep Learning: Methods and Applications” by Li Deng and Dong Yu, two prominent researchers in the field.

LA Times Data Desk


In an attempt to remain relevant in an increasingly data-driven world, many traditional news publications are embracing the sweeping changes in their industry by employing a broad swath of new technologies. A good case in point is the Los Angeles Times Data Desk, which offers content such as maps, databases, analysis, and visualizations.

Data Science 101: 250 Years of Bayes Theory


It’s been more than 250 years since the appearance of Bayes’ theorem (named after English statistician, philosopher and Presbyterian minister Thomas Bayes, 1701-1761), one of the two fundamental inferential principles of mathematical statistics.

Gartner Reveals Magic Quadrant for Advanced Analytics


The analytics software category is 30 years old and about $15 billion in size. So why is it that information technology research and advisory firm Gartner has not published a standalone Magic Quadrant on the sector until this year? The answer: Gartner has meshed business intelligence with analytics for the past decade, and viewed them […]

Interview: Data Analytics and the Ubiquitous Internet of Things


We sat down with Cristian Borcea, PhD, of the New Jersey Institute of Technology to discuss the IoT and Big Data applications. “New machine learning techniques could help us extract knowledge from these data – this happens especially for knowledge that we don’t expect and we don’t even know exists – we cannot search for something that we don’t know exists.”

Splice Machine for SQL-on-Hadoop


Splice Machine has written a compelling new whitepaper justifying the need for SQL-on-Hadoop technology solutions. The whitepaper, “Splice Machine: SQL-on-Hadoop Evaluation Guide,” covers a number of useful topics.

Jaspersoft Big Data Survey


Jaspersoft recently shared results from its Big Data Survey. Nearly 1,600 Jaspersoft community members responded to the survey on enterprise use of big data in corporate decision-making; 60 percent of respondents were application developers.

Hadoop Buyers Guide

Get your complimentary copy of the Hadoop Buyer’s Guide, from Robert D. Schneider, the author of Hadoop for Dummies.