Dartmouth Events

Simplifying Supervised Text Analysis with Active Learning

QSS Postdoctoral Fellow Blake Miller

12:45 pm – 1:45 pm
Silsby 215
Intended Audience(s): Public

While supervised machine learning methods are increasingly employed for text analysis in the social sciences, researchers often opt for unsupervised text models, due in part to the cost of labeling documents. Unfortunately, unsupervised models are at times less appropriate for the research task at hand than their supervised counterparts. In this talk, I introduce active learning, a method of labeling documents that can dramatically reduce the often prohibitive cost of supervised methods. I discuss the promises and pitfalls of active learning approaches to text analysis in the social sciences using a series of simulation studies. I then introduce a software platform that enables researchers to manage text classification projects while making use of active learning for document sampling. Finally, I discuss some applications of active learning in my own research. Simulation studies demonstrate that active learning can reduce the cost of labeling text data in nearly every scenario, and that it is particularly useful in classification problems with class imbalance. Simulations also demonstrate that active learning approaches are more efficient than random sampling regardless of the level of intercoder reliability.

For more information, contact:
Laura Mitchell

Events are free and open to the public unless otherwise noted.