Linguistic justice as a framework for designing, developing, and managing natural language processing tools
Abstract
As natural language processing tools powered by big data become increasingly ubiquitous, questions of how to design, develop, and manage these tools and their impacts on diverse populations are pressing. We propose using the concept of linguistic justice, the realization of equitable access to social and political life regardless of language, as a framework for examining natural language processing tools that learn from and use human language data. To support linguistic justice, we argue that natural language processing tools (along with the datasets used to train and evaluate them) must be examined not only from the perspective of a privileged, majority-language user, but also from the perspectives of minoritized language users. Considering such perspectives can help to surface areas in which the data used within natural language processing tools may be working against linguistic justice, often inadvertently: by failing to provide access to information, services, or opportunities in users’ language of choice, by underperforming for certain linguistic groups, or by advancing harmful stereotypes that can lead to negative life outcomes for members of marginalized groups. At the same time, this framework can help to illuminate ways to address these shortcomings and to use inclusive language data and approaches to build natural language processing technologies that advance linguistic justice.