etcML – Free Text-Analysis Tool

Have you ever wondered whether a certain TV network has a specific political bias? Is your favorite news source fair and balanced? A group of Stanford computer scientists has created a website that can help answer such questions for free using machine learning technology.

The newly launched website is called etcML, short for Easy Text Classification with Machine Learning.

Machine learning is a field of computer science that gives computers the ability to acquire new understanding of data content in a more human-like way. The etcML website is based on machine-learning techniques that were developed to analyze the meaning embodied in text, then perform sentiment analysis – to gauge the text’s overall positive or negative sentiment.
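To make the idea of sentiment classification concrete, here is a minimal sketch in Python. It is not etcML's code, and the article does not say which models etcML uses; this example swaps in a small Naive Bayes classifier as a stand-in technique. The class name, the tokenizer and the four training sentences are all invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocabulary = set()

    def train(self, labeled_texts):
        """Count words per label from (text, label) pairs."""
        for text, label in labeled_texts:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def predict(self, text):
        """Return the label with the highest log-probability for the text."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            score = math.log(self.doc_counts[label] / total_docs)
            label_total = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (label_total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training examples; a real data set would be far larger.
training_data = [
    ("I love this wonderful show", "positive"),
    ("what a great and fair report", "positive"),
    ("this coverage is terrible", "negative"),
    ("I hate this biased and awful segment", "negative"),
]

classifier = NaiveBayesTextClassifier()
classifier.train(training_data)
print(classifier.predict("a wonderful and great report"))  # prints "positive"
```

The workflow mirrors what the site is described as automating: supply labeled text, train a classifier, then let it label new text.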

“We wanted to make standard machine learning techniques available to people and researchers who may not be able to program,” said Richard Socher, a doctoral candidate in computer science at Stanford and lead developer of etcML. “All users have to do is copy and paste, or drop their text data sets into their browser and click.”

Socher said the new site gives researchers and citizen activists in fields ranging from political science to linguistics an easy way to analyze news articles, social media posts, closed-caption transcripts of television newscasts and other texts of possible interest. The visualization below shows a test of the etcML Twitter sentiment classifier using the keyword “Chris Christy”:

[Figure: etcML Twitter sentiment classifier visualization]

Here are some ways several beta users of etcML applied machine learning to their own projects:

  • Rebecca Weiss, a Stanford doctoral candidate whose studies include political polarization and media coverage, said the website gives her an easy way to classify words or phrases that embody viewpoints, then sift through millions of news articles and broadcast transcripts looking for patterns. “I can train a classifier and have it label all of my content, and I don’t have to write a single line of code to do it,” Weiss said. “I can then share my classifiers with journalists or other researchers for use in their work.”
  • Rob Voigt, a researcher in computational linguistics at Stanford, has used etcML to evaluate pitches on crowdfunding leader Kickstarter, a website that provides a platform for artists, musicians, filmmakers and others who are seeking financial backing for their projects. Voigt studies what makes a successful pitch. Using etcML, he has found that pitches using first-person plural pronouns – we, us, our – fare better than those written in the first-person singular. Likewise, short films seem to have done better than projects involving comic books, games or fashion.
  • Chinmay Kulkarni, a doctoral student in computer science at Stanford, used etcML to help grade short-answer tests for a free online course with roughly 2,000 students. Testing for the course presented a challenge: multiple-choice exams were easiest to grade automatically, but short answers offered a better measure of learning, and the instructor couldn’t possibly read and grade 2,000 tests. To solve this problem, students taking the course were required to grade one another, with four students grading each exam on average, which increased the workload on each student. Kulkarni used etcML to help: the software graded each test, and students still graded one another, but with the software in the grading loop, the average exam had to be read by only three or fewer students. “We were able to get the same accuracy with less effort,” said Kulkarni, who has published a paper about the project.
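As a rough illustration of the kind of text feature behind the Kickstarter finding above, the Python sketch below counts first-person plural and singular pronouns in a pitch. It is not Voigt's method or etcML's code; the pronoun lists, the function name and the sample pitch are hypothetical and chosen only for this example.

```python
import re

# Hypothetical word lists chosen for illustration; not taken from Voigt's study.
PLURAL_PRONOUNS = {"we", "us", "our", "ours", "ourselves"}
SINGULAR_PRONOUNS = {"i", "me", "my", "mine", "myself"}


def pronoun_profile(pitch):
    """Count first-person plural and singular pronouns in a pitch."""
    words = re.findall(r"[a-z]+", pitch.lower())
    plural = sum(word in PLURAL_PRONOUNS for word in words)
    singular = sum(word in SINGULAR_PRONOUNS for word in words)
    return {"plural": plural, "singular": singular}


# Made-up sample pitch.
sample_pitch = (
    "We are making a short film, and our team needs your help. "
    "I wrote the script."
)
print(pronoun_profile(sample_pitch))  # prints {'plural': 2, 'singular': 1}
```

Counts like these could serve as inputs to a classifier that is trained on pitches labeled as funded or not funded.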

The etcML development team was advised by Andrew Ng, a professor of computer science, director of the Stanford Artificial Intelligence Laboratory, and co-founder of the Coursera MOOC platform. Other team members include Stanford students Bryan McCann, Kai Sheng Tai and JiaJi Hu, and French visiting student Romain Paulus.

Sign up for the free insideBIGDATA newsletter.