The documentary “The Cleaners” is about content moderators in the Philippines whose task it is to “clean” social networks like Facebook, YouTube and Twitter of objectionable content. The movie was directed by H. Block and M. Riesewieck and won several awards (IMDb 2019). ‘Content moderation’, as defined by Sarah T. Roberts (2017: 1), is “the organized practice of screening user-generated content (UGC) posted to Internet sites, social media and other online outlets, in order to determine the appropriateness of the content for a given site, locality, or jurisdiction. The process can result in UGC being removed by a moderator, acting as an agent of the platform or site in question”. Moderators face the daily task of checking offensive, brutal, dramatic, pornographic and violent content for violations of the platform guidelines – and of deleting it if necessary. This is not new: Roberts (2016: 2), for example, has already pointed out that content moderators are exposed to content that is “personally deleterious or damaging”, including content that may impugn the workers’ own identities. The documentary accompanies several (former) employees of the Cognizant company in the Philippines and impressively shows the tragic and psychologically extremely traumatizing work of content moderators – who pay the high price for the fact that the vast majority of social media users are not exposed to content such as terrorist executions, child abuse, suicide and murder (The Guardian 25 Sep 2019; Dwoskin, Whalen & Cabato 25 July 2019).
In view of the enormous burden on content moderators, it seems obvious to use automated computer programs to assess and delete content that violates the community guidelines of social networks. Moreover, due to the enormous amount of data as well as the changeable nature of language, standards and rules (both community guidelines and legal requirements), the use of machine-learning algorithms is indispensable (Binns, Veale, Van Kleek & Shadbolt 2017). But while policymakers routinely call for social media companies like Facebook to identify and take down hate speech, they wrongly assume that automated technology can accomplish on a large scale the kind of nuanced analysis that humans can do on a small scale. Today’s tools for parsing (analyzing) social media text have limited ability to detect the intent of the speaker, and the complexity of the content moderation process is often overlooked (Duarte, Llanso & Loup 2018; Roberts 2018). Roberts (2016) states, for example, that the (already subjective) assessment of UGC requires not only cultural knowledge (about the platform itself as well as about the assumed audience) but also linguistic competence in the language of the UGC (which in most cases is not the moderator’s native language), knowledge of the relevant laws of the place of origin of the platform, and last but not least expertise regarding user guidelines as well as other platform-level specifics concerning what is and is not allowed. The following section gives an overview of recent approaches, while the last section presents existing limitations of automated content moderation.
Recent Approaches
In the early text-based internet, mechanisms to enact moderation were often direct and visible to the user – such as demanding that a user remove offensive or insulting material from a contribution, deleting or removing posts, banning users outright, or applying text filters to disallow posting of specific types of words or content. But as the internet as well as social media platforms have grown enormously, the desire of major platforms to control the UGC that they host and disseminate has also grown exponentially (Roberts 2017). While the law has so far granted platform owners immunity from some of the liability usually associated with a publisher of UGC, in recent years various legal cases both in the USA and around the world have demonstrated that these platform owners may not be immune to all liabilities – and consequently they are taking a more governed approach towards content moderation (Li, Xiong & Tapia 2018). To date, many tools to identify and filter content have been developed, including keyword filters, spam detection tools and hash-matching algorithms. These tools are based on the existence of certain pre-established keywords, metadata or patterns. While they can be effective at identifying content that contains known keywords, or matches a known hash or metadata pattern, they are not capable of parsing the meaning or context of text. Therefore, research and industry have begun to turn to machine-learning natural language processing (NLP) tools in order to make predictions about the meaning of content (e.g. whether a text expresses a positive or negative opinion).
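The limitation of these pattern-based tools can be sketched in a few lines of Python; the blocklist, the stored hash and the example posts below are invented placeholders for illustration, not real platform data:

```python
import hashlib

# Hypothetical blocklist and database of hashes of known abusive material.
BANNED_KEYWORDS = {"badword"}
KNOWN_BAD_HASHES = {hashlib.sha256(b"previously flagged content").hexdigest()}

def keyword_filter(text: str) -> bool:
    """Flag a post if any token matches a pre-established keyword."""
    return any(tok in BANNED_KEYWORDS for tok in text.lower().split())

def hash_match(content: bytes) -> bool:
    """Flag content whose hash matches previously identified material."""
    return hashlib.sha256(content).hexdigest() in KNOWN_BAD_HASHES

# Both checks operate on surface patterns only: a rephrased post with the
# same meaning, or a minimally altered file, produces no match.
print(keyword_filter("this post contains badword"))  # True
print(keyword_filter("the same idea, reworded"))     # False
print(hash_match(b"previously flagged content"))     # True
print(hash_match(b"previously flagged content!"))    # False
```

The last two calls illustrate why such tools cannot parse meaning or context: any content not matching a pre-established pattern simply passes through.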
Although NLP tools can process and classify text on a much larger scale than humans, which makes them attractive as a replacement for current content moderation, their development comes (at least initially) with high costs, as most of today’s classifiers are trained on examples of text labelled by humans as either belonging or not belonging to a targeted category of content – e.g. hate speech vs. not hate speech (Duarte et al. 2018). Five limitations of NLP in the context of social media analysis will now be presented briefly, before a short outlook is given.
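As a minimal sketch of this supervised set-up, the following pure-Python Naive Bayes classifier is trained on a handful of invented labelled examples (real systems use far larger human-labelled corpora and richer models; the texts, labels and category here are assumptions for illustration only):

```python
import math
from collections import Counter

def train(examples):
    """examples: list of (text, label) pairs; label 1 = targeted category
    (e.g. hate speech), 0 = not. Returns word counts and document totals."""
    counts = {0: Counter(), 1: Counter()}
    doc_totals = Counter()
    for text, label in examples:
        counts[label].update(text.lower().split())
        doc_totals[label] += 1
    return counts, doc_totals

def predict(counts, doc_totals, text):
    """Assign the class with the highest smoothed log-probability."""
    vocab = set(counts[0]) | set(counts[1])
    best, best_score = None, float("-inf")
    for label in (0, 1):
        n_words = sum(counts[label].values())
        score = math.log(doc_totals[label] / sum(doc_totals.values()))
        for word in text.lower().split():
            # Laplace smoothing so unseen words do not zero out a class.
            score += math.log((counts[label][word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

examples = [
    ("you are awful people", 1),
    ("awful hateful garbage", 1),
    ("what a lovely day", 0),
    ("thanks for the help", 0),
]
counts, doc_totals = train(examples)
print(predict(counts, doc_totals, "awful people"))  # 1
print(predict(counts, doc_totals, "lovely day"))    # 0
```

The sketch already hints at the cost structure mentioned above: every training example presupposes a human labelling decision, and the classifier can only generalize from the vocabulary it has seen.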
Limitations of Automated Content Analysis
1. The first limitation refers to the fact that NLP tools work best when trained on specific domains and contexts. For example, Abbasi, Hassan & Dhar (2014) found that domain-specific tools can capture sentiment on Twitter more accurately than general tools. Since objectionable content is relatively rare compared to all content, very large random samples must be taken for each possible domain. Taking such large random samples is difficult and expensive (Duarte et al. 2018).
2. Since social bias is also reflected in language, basing decisions on NLP tools carries the risk of further marginalizing and disproportionately censoring groups that are already discriminated against. For example, Zhao et al. (2017) found in a study using machine learning to label images that, while the activity of cooking was about 33% more likely to be associated with females than males in the training corpus, the resulting model associated cooking with females 68% of the time. Using NLP tools for content moderation could therefore lead to moderation decisions that disproportionately censor marginalized groups or those with minority views.
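The amplification effect can be made concrete with a small sketch. The counts below are invented round numbers chosen to mirror the pattern Zhao et al. report (the training data already skews towards one group, and the model’s predictions skew further); they are not the study’s actual dataset:

```python
def association_rate(pairs, activity, group):
    """Fraction of instances of `activity` associated with `group`."""
    relevant = [g for a, g in pairs if a == activity]
    return sum(1 for g in relevant if g == group) / len(relevant)

# Invented counts: in the training data, "cooking" is about 33% more likely
# to co-occur with "female" than "male" (57 vs. 43 of 100 instances) ...
training = [("cooking", "female")] * 57 + [("cooking", "male")] * 43
# ... while the trained model's predictions amplify that skew to 68%.
predictions = [("cooking", "female")] * 68 + [("cooking", "male")] * 32

print(association_rate(training, "cooking", "female"))     # 0.57
print(association_rate(predictions, "cooking", "female"))  # 0.68
```

Comparing the two rates is the essence of a bias-amplification check: the model does not merely reproduce the skew in its training data, it exaggerates it.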
3. Often, social media platforms provide “clear, if somewhat general, operational definitions” of objectionable content (Myers West 2018: 4370). According to Duarte et al. (2018), tools that rely on narrow definitions will miss some of the targeted speech and may be easier to evade. Thus, there is a tension between definitions that are as clear as possible and computational flexibility.
4. Duarte et al. (2018) have evaluated multiple NLP studies and concluded that under ideal conditions an accuracy of around 70–80% is achieved. While such levels of accuracy (e.g. using neural networks, as done by Clieback et al. 2017) represent an impressive advance within this area, an accuracy rate of 80% also means that one out of every five decisions is wrong. As the authors further note, “even an accuracy rate of 99% will lead to a high volume of erroneous decisions when applied at scale” (Duarte et al. 2018: n.pag.).
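The scale argument is simple arithmetic. The daily volume below is an assumed round number for illustration, not a figure for any specific platform:

```python
# Even a highly accurate classifier produces many absolute errors at scale.
posts_per_day = 100_000_000  # assumed volume for illustration

for accuracy in (0.80, 0.99):
    errors = round(posts_per_day * (1 - accuracy))
    print(f"accuracy {accuracy:.0%}: ~{errors:,} erroneous decisions per day")
```

Under these assumptions, 80% accuracy means roughly 20 million erroneous decisions per day, and even 99% accuracy still means about one million.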
5. Last but not least, today’s NLP tools still fall far short of many policymakers’ expectations when it comes to their ability to parse language, as the meaning of language is highly dependent on contextual elements such as tone, speaker, audience and forum. And because NLP filtering tools rely on previously seen features in text (e.g. words, word relations), they are also easy to evade (Duarte et al. 2018). It has therefore been suggested that information beyond the text (such as demographic information about the speaker) be considered. But in addition to the risk of amplifying an existing bias (see the second limitation), using information about the speaker to adjudicate speech could raise additional human-rights and censorship concerns (Duarte et al. 2018).
In conclusion, despite clear progress in the application of natural language processing to content moderation, there are still major challenges to overcome. However, these seem to have already been recognized and addressed by research, and they need to be examined further. Even if in the future it only becomes possible to reliably moderate part of the offensive content using NLP tools, this would considerably reduce the enormous psychological burden on content moderators as well as the time needed to remove objectionable content.
(pk)
Bibliography
Abbasi A., Hassan A. & Dhar M. (2014): Benchmarking Twitter Sentiment Analysis Tools. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14). (https://www.aclweb.org/anthology/L14-1406/ [State 22.10.2019]).
Binns R., Veale M., Van Kleek M. & Shadbolt N. (2017): Like Trainer, Like Bot? Inheritance of Bias in Algorithmic Content Moderation. In G.L. Ciampaglia, A. Mashadi & T. Yasseri (eds.): Social Informatics (SocInfo 2017), Lecture Notes in Computer Science, 10540, 405-415. doi: 10.1007/978-3-319-67256-4_32
Duarte N., Llanso E. & Loup A. (2018): Mixed Messages? The Limits of Automated Social Media Content Analysis. Proceedings of the 1st Conference on Fairness, Accountability and Transparency (PMLR), 81, 106-106.
Dwoskin E., Whalen J. & Cabato R. (25 July 2019): Content moderators at YouTube, Facebook and Twitter see the worst of the web – and suffer silently. The Washington Post. Retrieved from https://www.washingtonpost.com/technology/2019/07/25/social-media-companies-are-outsourcing-their-dirty-work-philippines-generation-workers-is-paying-price/ (State 20.11.2019).
IMDb International Movie Database (2019): The Cleaners – im Schatten der Netzwelt (2018). Awards. (https://www.imdb.com/title/tt7689936/awards [State 20.11.2019]).
Li C.-S., Xiong G. & Tapia E.M. (2018): New frontiers in cognitive content curation and moderation. APSIPA Transactions on Signal and Information Processing, 7(7), 1-11. doi: 10.1017/ATSIP.2018.9
Myers West S. (2018): Censored, suspended, shadowbanned: User interpretations of content moderation on social media platforms. New Media & Society, 20(11), 4366-4383. doi: 10.1177/1461444818773059
Roberts S.T. (2016): Commercial Content Moderation: Digital Laborers’ Dirty Work. Media Studies Publications, 12, n.pag. (https://ir.lib.uwo.ca/commpub/12 [State 22.10.2019]).
Roberts S.T. (2017): Content Moderation. In L.A. Schintler & C.L. McNeely (eds.): Encyclopedia of Big Data. doi: 10.1007/978-3-319-32001-4
Roberts S.T. (2018): Digital detritus: ‘Error’ and the logic of opacity in social media content moderation. First Monday, 23(3). doi: 10.5210/fm.v23i3.8283
The Guardian (25 Sep 2019): Facebook failing to protect moderators from mental trauma, lawsuit claims. (https://www.theguardian.com/technology/2018/sep/24/facebook-moderators-mental-trauma-lawsuit [State 20.11.2019]).
Zhao J., Wang T., Yatskar M., Ordonez V. & Chang K.-W. (2017): Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2017). (https://arxiv.org/pdf/1707.09457 [State 22.10.2019]).