Modeling language games

Recent advances in natural language processing have led to technologies such as earbuds capable of real-time translation, voice interfaces to home automation, and automatic summarization of news articles. Yet the superficial theoretical assumptions these technologies make about linguistic meaning have limited their application to social science research.

This project builds a computational model of talk on Hacker News, an online discourse community focused on computer science and startups. The model shows how the meanings of words change over time within the community, and how participants' language use changes as they become more central to it. Using Mikolov et al.'s (2013) word2vec algorithm, I create monthly snapshots of word meanings over a decade of discourse from 300,000 participants (12 million comments). I identify distinct ideological axes (dimensions along which word meanings shift over time) which align with pervasive sexist and racist attitudes in computing culture. The extent to which new members' language use aligns with the community's language use along ideological axes is associated with their future participation in the community. This research offers a new method for studying learning and inclusion in discourse communities, while also pointing to continuing inadequacies of the linguistic models adopted by natural language processing technologies.
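As a rough illustration of what tracking a word along an axis across monthly snapshots could look like, the sketch below projects one word's vector onto a direction in each snapshot. Everything in it is an assumption made for illustration: the two-dimensional toy vectors, the month labels, the helper names, and the choice to define an axis as the difference between two pole words. The abstract does not say how the project's ideological axes are actually identified, so this should not be read as the author's method.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def axis_from_poles(snapshot, pole_a, pole_b):
    """Unit vector pointing from pole_b toward pole_a within one snapshot.

    Defining an axis by two pole words is an illustrative assumption.
    """
    diff = [a - b for a, b in zip(snapshot[pole_a], snapshot[pole_b])]
    return unit(diff)

def projection(snapshot, word, axis):
    """Cosine between a word's vector and the axis, in [-1, 1]."""
    return dot(unit(snapshot[word]), axis)

# Hypothetical toy data: two monthly snapshots of 2-d word vectors.
# Real word2vec vectors would have many more dimensions.
snapshots = {
    "2010-01": {"she": [0.0, 1.0], "he": [0.0, -1.0], "engineer": [1.0, 0.0]},
    "2010-02": {"she": [0.0, 1.0], "he": [0.0, -1.0], "engineer": [1.0, -1.0]},
}

# Position of "engineer" along the she/he axis in each month.
drift = {
    month: projection(vectors, "engineer", axis_from_poles(vectors, "she", "he"))
    for month, vectors in snapshots.items()
}
```

In this toy data, "engineer" sits at the midpoint of the axis in the first month and moves toward the "he" pole in the second. With real embeddings, snapshots trained independently are generally not directly comparable, so some alignment or shared initialization across months would likely be needed before comparing positions; how the project handles this is not stated here.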

Timeline

Proctor, C. (2018). Modeling language games. Manuscript in preparation.

Proctor, C. (2017). Modeling language games. Unpublished manuscript. Joint final paper for CS224w and CS229.