Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Social media and microtargeting: Political data processing and the consequences for Germany:

Social media and microtargeting: Political data processing and the consequences for Germany: Amongst other methods, political campaigns employ microtargeting, a specific technique used to address the individual voter. In the US, microtargeting relies on a broad set of collected data about the individual. However, due to the unavailability of comparable data in Germany, the practice of microtargeting is far more challenging. Citizens in Germany widely treat social media platforms as a means for political debate. The digital traces they leave through their interactions provide a rich information pool, which can create the necessary conditions for political microtargeting following appropriate algorithmic processing. More specifically, data mining techniques enable information gathering about a people’s general opinion, party preferences and other non-political characteristics. Through the application of data-intensive algorithms, it is possible to cluster users in respect of common attributes, and through profiling identify whom and how to influence. Applying machine learning algorithms, this paper explores the possibility to identify micro groups of users, which can potentially be targeted with special campaign messages, and how this approach can be expanded to large parts of the electorate. Lastly, based on these technical capabilities, we discuss the ethical and political implications for the German political system. Keywords Microtargeting, social media, Germany, influence, datafication, electorate political actors have started using newly developed Introduction tools in order to analyse citizens’ behaviour and to The contemporary digital revolution is constantly influence the electoral body. One of these methods is transforming the political world. Datafication (Mayer- microtargeting, which allows the formulation of perso- Scho¨ nberger and Cukier, 2013), i.e. the categorization, nalized messages and their direct delivery to groups and quantification and aggregation of phenomena into individuals (Agan, 2007), hence creating a promising databases, and their further algorithmic processing, tool for electoral campaigning and opinion formation. have opened new opportunities in understanding and In this paper, we demonstrate a proof of concept evaluating complex social phenomena. More specific- regarding the ways political actors could establish the ally the use of social media and the internet has resulted conditions for political microtargeting in Germany, in the creation of enormous databases that contain through the utilization of social media platforms. information about citizens’ personal and political pref- erences. Based on these Big Political Data a new type of Bavarian School of Public Policy, Technical University Munich, Germany data-driven interaction between politics and citizens emerges through social media. In its core lies the appli- Corresponding author: cation of advanced statistical and machine learning Orestis Papakyriakopoulos, Bavarian School of Public Policy, Technical algorithms, the possibilities of which enable the devel- University Munich, Richard-Wagner-Strasse 1, 80333 Munich, Germany. opment of new political strategies. Consequently, Email: orestis.papakyriakopulos@tum.de Creative Commons CC-BY: This article is distributed under the terms of the Creative Commons Attribution 4.0 License (http:// www.creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage). 2 Big Data & Society The scope of our analysis is to identify the possibilities has the potential to partly track the predispositions or and dangers of microtargeting in electoral campaign- general interests of a voter (Ellul, 1966), and based on ing, taking into consideration ‘state of the art’ technol- them, to modify the candidates’ public images in a way ogy. Therefore, we apply our method to Facebook data that complies with the voters’ opinions (Bond and that could actually be used in political campaigning. Messing, 2015; Capara et al., 1999). Furthermore, by Initially, we explain the theory behind microtargeting directly communicating individual- or group-specific and discuss existing obstacles that prevent its applica- messages, candidates are able to reduce the risk of alie- tion. Second, we illustrate our methodology and pre- nating other voters that might disagree on a topic sent our results. Lastly, we evaluate data-driven (Woo, 2015). Another advantage is that microtargeting microtargeting ethically and comment on its political allows political actors to target voters from the entire consequences. political spectrum, rather than exclusively developing their campaign on the characteristics of the median voter (Downs, 1957), as was the case in the past. Microtargeting in theory Finally, given that opinion polls in the 2016 US and Microtargeting is a strategic process intended to influ- 2017 German elections failed to make plausible fore- ence voters through the direct transmission of stimuli, casts of election results, microtargeting provides a which are formed based on the preferences and charac- methodology to overcome political decisions based teristics of an individual. First of all, microtargeting solely on survey polls. Despite the above advantages, presupposes the collection of large amounts of data it is important to note that there is no comprehensive able to depict the political preferences and other non- study that proves the effectiveness of microtargeting political characteristics of voters. This data can be (Jungherr, 2017; Karpf, 2016); to date it remains a either manually collected or acquired through data- promise emerging from the technological state of the mining and can include information ranging from a art. person’s name, address, and voting history to more One of the main reasons behind the success of micro- abstract properties such as a person’s opinion about targeting in the US is the loose legal framework, which allows political actors to almost freely create, acquire political and non-political topics, their social activity and cultural background. The gathered data are then and use databases that contain personal information. It processed with the aid of appropriate machine learning is characteristic that there is no dedicated data protec- algorithms, while the acquired results depend on the tion law or a concept of ’sensitive’ personal data in the type of algorithm used. It is then possible to make pre- US legislation. Hence, there is no general legislative dictions about specific variables, for example, the out- framework exclusively dealing with the protection of come of a political decision (supervised learning) or a person’s privacy rights (Sotto and Simpson, 2015). identification of patterns in the data through clustering Although legal frameworks, as the FTC, ECPA, (unsupervised learning). Implementing the latter, polit- HIPAA, etc., indeed aim to regulate the monitoring ical actors are in a position to detect sub-groups of of personal data and their protection in their respective voters that share common demographic and attitudinal fields, the administration of data policies takes place traits (Barbu, 2014). Based on the algorithmic results, usually only indirectly, by laws that might impose pur- they can then generate messages or plan actions aimed pose limitations or time limits on the data retention at influencing each specific sub-group or person (often (Boehm, 2015). Furthermore, the US law presents sig- called nanotargeting (Edsall, 2012)), leading to their nificant gaps concerning the protection of individual potential mobilization or de-mobilization. privacy (Ohm, 2014): e.g. the datafication or reuse of Microtargeting was first applied to a limited extent information acquired as a by-product of providing ser- in the US 2000 Federal Elections by the Republican vices is largely unregulated (Strandburg, 2014: 22). Party (Panagopoulos, 2015). Since then, the increasing Consequently, such legal inconsistencies facilitate the datafication of societies has provided fertile ground for development of huge political databases, which can its expansion as a political strategy. A milestone for its then be used for political campaigning (Bennett, 2016). application was the 2008 Federal Elections (Franz and Contrary to the US, the legal framework applicable Ridout, 2010), when the Democratic Party campaign in Germany significantly limits the potential of micro- applied the strategy at full scale. Today, microtargeting targeting. Germany’s privacy law complies with the is a standard online and offline (Panagopoulos, 2015) EU-directive on the processing of personal data. The campaigning method in the US as it overcomes prob- General Data Protection Regulation (EU-Directive, lems of classical political campaigning. First of all, it 2016) provides an extensive regulatory framework for Papakyriakopoulos et al. 3 the protection of privacy and personal data, their social media platforms cannot be assumed to be the acquisition, use and exchange. The GDPR thoroughly same for the actual electorate. The politically active describes the limits and responsibilities of data control- user population on Facebook is in no way representa- lers and processors, supports the subjects’ rights to tive of the whole population of a country (Ruths and privacy and consent, and stipulates the exact regulating Pfeffer, 2014), while the expression of an opinion online role of public authorities. Furthermore, the German does not fully correspond to a coherent political state- data protection law explicitly defines the conditions ment (a like is not a vote; Hegelich and Shahrezaye, and cases in which someone is able to access and use 2015). Furthermore, the evaluation of social media personal data (Da¨ ubler et al., 2016) and lays down the data is bound with multiple methodological issues rights of persons affected (Broy, 2017), strongly limiting (Hegelich, 2017). Still, the case of the United States data exploitation. has shown that political campaigning is more than ever based on data, from which an electorate’s image is derived, also known as perceived voter model (Hersh, Barrier 1: Privacy and data protection 2015). This model may be misleading but nevertheless policy used, as it reduces the complexity in campaign decision- Some authors have argued therefore that microtarget- making. Due to the fact that it is almost impossible to ing cannot be applied in German politics. However, causally link a campaigning tool to election results, despite the legal restrictions, there is ample leeway for microtargeting is used as long as it is assumed to have it on social media platforms (Papakyriakopoulos et al., a successful influence – even if in reality it might not. 2017). The reason is that the German privacy law per- The difficulties in causal inference arise – amongst mits the collection and processing of public personal others – from potential self-fulfilling prophecies: data stemming from social media, as long as the indi- should a campaigning tool identify a target group, the viduals’ interests are not challenged (Dorschel, 2015). campaign will increase interaction with this group. This The GDPR clearly states that given the appropriate special attention might yield positive results; but these safeguards, personal data on political opinion can be results could have also been the same for a totally dif- used for electoral activities (EU-Directive, 2016: 11). In ferent group, as well. Despite the above, microtargeting addition, users on social media services consent to com- is applied, even if it might be epistemologically impos- panies using their personal data for commercial and sible to evaluate its exact impact. other activities, by opting in. Hence, the legal require- ments for using social media data as basis for political Data and method microtargeting are met. Given the fact that users agree to publish on social media a huge amount of data about In this paper, we demonstrate how politicians in their political and non-political preferences and behav- Germany can create the conditions for microtargeting iour, these platforms are an ideal source for political based on data from the social media platform Facebook knowledge extraction. Social media have become a key and we evaluate its ethical and political consequences. environment for political campaigns, as the majority of Facebook was chosen as a data source for three rea- politicians can use them to communicate directly with sons: (1) the German Facebook population is larger the electoral body (Barbera´ and Zeitzoff, 2017; and less selective than that of Twitter. (2) It is part of Hegelich and Shahrezaye, 2015; Medina Serrano the company’s business model to offer targeted adver- et al., 2019; Nulty et al., 2016; Stier et al., 2017). That tisement services for political campaigning, the possibi- aside, political actors often perform organized influen- lities of which we are exploring. (3) Contrary to the US, cing strategies on social media, frequently trespassing where there are extensive political databases with per- the legal limits set (Weedon et al., 2017). sonal identifiers (Bennett, 2016), in Germany this is not the case. Hence, social media provide a straightforward way to acquire knowledge for microtargeting. Barrier 2: Data bias For our proof of concept, we analysed the public The legal framework is not the only obstacle for suc- Facebook pages of the German political parties and cessful microtargeting. The type of data subjected to their supporters: Our sample includes the following algorithmic process and their entailed results can some- parties: Christlich Demokratische Union (CDU), times lead to spurious political action. In our case, the Christlich Soziale Union (CSU), Sozialdemokratische world of social media is not identical to the offline Partei Deutschlands (SPD), Bundnis 89/ Die Grunen, ¨ ¨ world. Hence, political preferences appearing on Die Linke, and Alternative fur Deutschland (AfD). ¨ 4 Big Data & Society CDU is the main conservative party of Germany, while case it provides a plausible classification method for the CSU is the conservative party active in Bavaria. SPD users. Furthermore, it does not distort the microtarget- represents the main German social-democratic party, ing process, as microtargeting targets the identification and Die Linke the radical left. AfD has a nationalist, of voter’s predispositions and not to definitely certify anti-immigrant and neo-liberal agenda, while FDP is a someone’s exclusive support to a party. As shown in conservative, neo-liberal party. Finally, Bundnis 90/Die Figure 1, around 50% of the active users per party have Grunen is the German green party. made only one Like. This is typical of Big Data appli- For each political page, we evaluated user ‘‘Likes’’ cations on social media phenomena, where the infor- on political posts and assigned a partisanship to each mation for the majority of users is low. user according to their preferences (Figure 1). Along with the identification of potential cross-pres- Following a standard microtargeting technique, we sured partisans, we wanted to identify the specific con- focused our study on users who have liked content tent that they find interesting. Therefore, we applied the on pages of more than one political party. The LDA topic modelling algorithm (Blei et al., 2003) to reason behind this decision is that the specific group classify 251.947 posts. LDA has many advantages of voters, also named as cross-pressured partisans, has over other standard text-mining algorithms (Grimmer the highest likelihood to be influenced, as they are and Stewart, 2013), as it can recognize complex rela- both undecided and engaged in politics (Ellul, 1966; tions in text-datasets. The algorithm has the ability to Hersh, 2015). After identifying the relevant groups, we cluster posts in a certain number of topics, where each applied machine learning algorithms to cluster the topic is a set of words that characterize different con- various pages’ posts and created a mapping of 55 dif- tents. Hence, someone can evaluate all the posts with- ferent topics, to which each of the posts might be out having to investigate them one by one. LDA assigns assigned. To achieve this, we performed topic model- a probability for each post belonging to a specific topic. ling analysis by applying a Latent Dirichlet Allocation Then, by ascribing to each post the topic with the high- algorithm (Blei et al., 2003). In this way, we demon- est probability and by detecting the users who liked it, strate how someone can detect individual political we can explicitly track the topics that each user is inter- topics of interest and how these can be later used to ested in. shape targeted messages for each micro group of The LDA algorithm is a three-level hierarchical users. Bayesian model that predicts the probabilities of Prerequisite for the application of microtargeting is words and documents belonging to a number of the existence of a rich database containing voters’ char- topics K given the empirical distribution of words (or acteristics and preferences. Therefore, we mined data n-grams) in a corpus (Blei et al., 2003, 2002). In our from 570 public pages related to the major political case, the corpus consists of the total number of posts M parties in Germany through the Facebook Graph under investigation, while each post corresponds to a API, and analysed posts and Likes. We selected the document d, which is a sequence of N words. LDA is a pages by searching the respective party names in the generative model, i.e. it assumes the probability distri- name field of the Facebook pages. We then classified butions of topics over words  , of documents over manually our results, and removed irrelevant pages. topics  and predicts the probability that a specific We mined every post generated by the administrators word in a specific document will belong to a specific of the pages since their creation, the Likes each post topic. This Bayesian admixture can be described by got, and the unique IDs and profile names of the users the following probability distributions liking them. Usually, the profile name of a Facebook Dir ðaÞ account tends to be the same with the real name of the d K account holder, as Facebook maintains a real name Dir ðÞ policy. In total, we collected 251,947 posts with k V 6,347,448 Likes related to them and identified the activ- z  Multinom ð Þ w K d ity of 1,208,740 unique users. This is only data related to the pages mined, hence the actual size of trackable w j z  Multinom ð Þ w V k users is even larger. We define a user who has liked at least one post of a party as partisan, and a user who has liked posts on pages of two or more parties as a cross- where V is the number of unique words existing in the pressured partisan. Of course, the act of liking per se corpus, and  and  are Dirichlet parameters. does not make someone a party partisan, but in this Multinomial distribution z gives the probability that w Papakyriakopoulos et al. 5 Figure 1. Likes distribution for the users on parties’ pages. a topic will be assigned to a word, given the distribution The specific algorithm comes with the advantage of of topics over documents. Finally, multinomial distri- integrating out the probability distributions  , k d bution w j z gives the probability that the model will (Darling, 2011). Thus as part of the iterative Markov generate a specific word in a specific document given a chain, one can calculate the targeted probabilities topic (Figure 2). through the process In our case, we want to create topics about the con- tent of our corpus based on the empirical distribution for each document : d ¼ð1 ... MÞ of words over documents. Given the complexity of the for each term in a document i ¼ð1 .. . N Þ model and the fact that the initial distributions are w¼i i v þ  n þ i,j i,j assumed and not empirically provided, we randomly Pðz ¼ j jz , VÞ¼ P P i i V K d assign topics to words and documents and we follow v þ V n þ K i,j w¼1 k¼1 i a Markov chain Monte Carlo procedure to update their values (Griffiths, 2002). By iteratively applying a where i is the concrete appearance of a word, i w¼i Markov chain, we can converge to the assumed distri- denotes its exclusion and j is a topic. v corresponds i,j butions and hence sample from them (Gilks et al., 1995; to the number of times word i is assigned to topic j, Roberts and Smith, 1994) the probability Pðz jwÞ that a without its current appearance and index v w i,j w¼1 word in a document belongs to a specific topic. More gives the total number of words in the corpus assigned specifically, we used a collapsed Gibbs sampling to topic j excluding i. Furthermore, n contains the i,j Marcov chain Monte Carlo (MCMC) (Geman and total number of words in document d that are assigned Geman, 1984) method to identify the relevant topics. to topic j without i. Finally, n corresponds to the i 6 Big Data & Society Figure 2. Plate notation for the Latent Dirichlet Allocation algorithm. Figure 3. Topic optimization process. The model with the highest Jensen–Shannon convergence contained 55 topics. total number of words in the document, again not including i. Necessary for the creation of a useful LDA model is divergences for the unique 1485 topic combinations and the election of an appropriate number of topics, in created a distance matrix. On it, we applied a principle order to split the content into interpretable sub- component analysis algorithm (Hotelling, 1933) and we groups. Electing a small number of topics results in a plotted the first three components. clustering of posts, from which one cannot identify con- crete political topics of interest. On the contrary, if the Results number of topics is too large, the algorithm selects many words as topic-important that actually have no The first result of our analysis was the specification of political value. To overcome this issue, we applied a the political content of the investigated posts. The LDA topic optimization algorithm proposed by Deveaud algorithm clustered the posts in 55 topics that can be et al. (2014). More specifically, we calculated the split into three main categories. These categories were Jensen–Shannon divergence between topics for multiple chosen manually, and do not denote that they are the LDA models through the equation optimal ones; still their election makes the results much more interpretable. The first category includes topics V   V X X related to general political issues, such as social involve- 1  1 i,w j,w Dðk , k Þ¼  log þ  log i j i,w j,w ment (topic 1), education (topics 2, 15), national econ- 2  2 j,w i,w w¼1 w¼1 omy (topic 4) and homeland security (topic 32). Some topics do not only illustrate the relevance of posts to a where i, j are two different topics in a model and political issue, but also the exact opinion underlying ,  the probability density values of the distribu- them. For example, topics 10 and 12 are both migration i,w j,w tion  for a word w in the corpus V and each topic, related, but topic 10 includes posts that are refugee- respectively, then selected the model that maximizes the friendly, while topic 12 contains posts that demand a sum of the Jensen–Shannon divergence for all topic stricter migration policy. In addition, there are topics combinations given the expression that analyse political parties (topic 39) or persons (topic 38). In the same category, also exists a set of topics (9, 27, 14) that contain posts that do not make concrete K ¼ argmax Dðk , k Þ opt i j political statements, but declare uncertainty and reflec- KðK  1Þ k , k ¼1 i j tion. The second category includes topics that are related to political actors and candidates, but not as Based on the optimization process (Figure 3), we part of a political discussion. They summarize posts concluded on an LDA model with 55 topics. In order about political events, media appearances and electoral to sort and visualize topics according to their similarity, campaigning. Finally, the third category contains topics we used the method proposed by Sievert and Shirley that are location related and discuss political problems (2014). We used the already calculated Jensen–Shannon about regions. For example, topic 54 includes Papakyriakopoulos et al. 7 Figure 5. Average Likes frequency for the mean and the cross- pressured user. Figure 4. Topic distance visualization with the help of PCA. Circled are topics 21, 43, 38. potential influence will contribute to the motivation of other users as well. posts about Berlin, topic 31 about Hamburg and topic Figure 6 shows the ratio of cross-pressured partisans 43 about Bavaria. between parties. In the given dataset, more or less 10% In order to evaluate and verify our topic classifica- of the page users for each party are cross-pressured. tion, we visualized the relationship between the devel- This does not mean though, that this number corres- oped topics in a three-dimensional space with the help ponds to the actual electorate, as the descriptive results of PCA (Figure 4). Each sphere corresponds to a dif- are biased through our statistical sample and the struc- ferent topic, while their size is proportional to the ture of the social media platform. Nevertheless, it is number of posts they contain. Their distance in 3D- possible to recognize certain predispositions of the space functions as a measure of their content similarity. electorate, as for example an increased interaction of It is visible that three categories classify topics into Union and FDP users and the almost non-existent unique clusters. As expected though, there is some over- overlap of users that are interested in both Die lapping between categories, as a topic might contain Grunen and AfD. keywords belonging to more than one categories. For After the concretization of the topics of interest, example topics 21, 43, 38 appear very close, even microtargeting can be performed in two ways: one though we classified them differently (Table 1). This can either initially focus on single users and then occurs because they all include a combination of track afterwards the topics they are interested in, or posts of all classes. Topic 21 is about AfD, including select specific topics and then identify users interested both posts about its political background and the elec- in them. To demonstrate how further steps of the tions. Topic 38 is about Angela Merkel and her polit- microtarging process could be realised, we choose ran- ical activity, as well as her party structure. Finally, domly topic 4 as an example. Topic 4 includes, amongst topic 43 is about Bavaria, including a number of others, the words: Euro, Steuergeld, Milliarde, posts about the regional CSU party and its candidates. Zuschuss, Kosten, i.e. it is linked to German economic In our analysis, we identified a total of 58,532 cross- policy. It is possible to analyse the relevance of this pressured users. Figure 5 shows that cross-pressured topic for each party, as well as to identify users who users tend to like more frequently than the average like the topic. In this case, we find Union coalition Facebook partisan. This however does not mean that posts that talk about the German economy and identify cross-pressured partisans tend to be more active; on the the relevant cross-pressured partisans. Then, we ran- contrary, it denotes that we can only trace cross-pres- domly pick one of the users to investigate all the sured partisans, when the users are more active online. other topics that are of interest to her. Our random This has an important implication for the perceived cross-pressured user has also liked FDP posts, and as voter’s model: The selection of cross-pressured parti- Table 2 shows, she has also expressed interest in polit- sans as targeted population comes with the advantage ical issues of Schleswig Holstein and homeland security. that they behave as multiplicators, and thus their Hence, we can identify significant political topics of 8 Big Data & Society Figure 6. Percentage of cross-pressured partisans per party. Papakyriakopoulos et al. 9 Table 1. Extended keywords for topics 21, 38, 43. Topics Extended keywords 21 AfD, Partei, rechtpopulist, Position, Altparteien, Wahlen, Argument, Stimmen, vertreten, Gegner 38 Merkel, Angela, Kanzlerin, CDU, Union, CSU, Seehofer,Volk, Fluchtlingspolitik, Terroranschlag 43 Bayern, Mu¨nchen, Freistaat, Wahlprogramm, CSU, muss, Regierung, Generalsekretarin, Schalzwedel, Gru¨ne Table 2. Topics of interest for an example-user and for Union-SPD cross-pressured users. Target Topic keywords Example user 4 Euro Steuergerl Milliar Zuschuss kosten 32 Innenminister Polizei ermittelt Justitz Kriminalita¨t 51 Schleswig Holstein Kiel Rostock Schwerin Union-SPD cross-pressured users 8 Islam Muslim Christlich Religion Kirche interest for the user, as well as political parties to which, data, data existing on social media platforms provide the user is positively inclined. a fruitful source for microtargeting. By mining and The topic modelling algorithm, however, does not structuring the content of 570 German political pages, illustrate if the user thinks positively or negatively we managed to detect over 58,000 cross-pressured users of a political topic, i.e. it does not trace their exact through their Likes. The selection of this sub-popula- political attitudes to the issues. To do this it would tion was based on the idea that they are people both be necessary to apply a sentiment analysis algorithm active in politics and potentially undecided on their to the parties’ posts, or a qualitative analysis thereof. exact party preference. Hence, communicating a mes- In the current research, we did not perform a senti- sage to them is of greater value than to people who are ment analysis. Given the results of the sentiment ana- strict supporters of one party or are not interested in lysis, the person’s political evaluations of political politics at all. In order to track topics of interest of topics and party sympathy, a campaign-maker has cross-pressured users, we applied simple machine learn- adequate information to create personalized messages ing algorithms on the pages’ content and found the and communicate them through micro-targeted most common issues discussed. Finally, we connected advertisement. the topics with the users through their posts’ Likes, Similarly, it is possible to identify topics that are finding out valuable political information about them. important for groups of strategic importance. For Accompanied with a sentiment analysis algorithm, the example, partisans that are cross-pressured by the necessary knowledge can be gathered for the creation of Union and SPD are highly interested in topic 8, personalized messages. Last step is to contact the users, which is related to Islam and Christianism. Thus, a process that should be adapted to and compliant with after the combination with a sentiment analysis, the the legal frameworks. creation of an advertisement specifically related to The communication of the message could theoretically this topic can provide additional advantage to a polit- be performed in two ways: One could cluster users shar- ical party, as it might mobilize an important part of the ing common characteristics and directly target them electorate towards its ends. Of course, the content of a through the platform’s advertisement service, which personal message can be further specialised, as it is allows campaigners to define custom target audiences. always possible to access recursively the full post that This comes with the advantage that there is no need for a user liked, and locate exactly its content in relation to manual matching of users to their real world identities, as the topic it belongs to. it suffices to communicate the message to them through Given the mined Facebook data, we proved that the platform. The second way is to manually look at a there is an extensive dataset for potential microtarget- person’s further public activity on Facebook, and given ing in German politics available in social media ser- additional sociodemographic data available, try to find vices. Although national privacy regulations usually another communication path (e.g. email, mail, phone forbid the direct acquirement and use of personal number, etc.). Although the second way is time- 10 Big Data & Society consuming, complicated, and sometimes inadequate, which in this case were not taken into consideration, gathering socio-demographic data about individuals and but are still publicly available online (Kosinski et al., then targeting them offline is actually what is intensively 2013). By collecting data from other social media inter- done in US campaigns (Hersh, 2015: 77). Still, in EU the actions, e.g. likes on news media or other non-political feasibility of the strategy is much lower, due to the exist- pages, one can train models and assign probabilities of ing privacy laws. For the second way to be applicable, someone being interested in a political issue or party. In political actors should develop platforms, applications, or this way, political knowledge can be extracted about services, through which they would get the person’s con- users that actually did not actually interact with any sent to target them with the related messages. party-related content on the platform and hence be The processing of the social media political dataset included as audience of political microtargeting. also comes with specific limitations. The inferences drawn reveal only part of a person’s political charac- Discussion teristics, and only if indeed someone’s online behaviour matches their actual political preferences. Furthermore, The penetration of datafication into people’s privacy is the users detected online might not have a voting right once more proven through our investigation, as we in Germany, making the sampling process biased and were able to gather and process a large amount of distorting the advertisement process. user data from the social media platform Facebook. The presented results serve as a proof of concept. We Hence, from our perspective, it is important to evaluate have thoroughly described how microtargeting based on the impact of the latest technological advances on the social media data could be performed. The analysis was ethical and political life of our society. The discussion focused on Germany, where the acquisition of relevant that has already started regarding the application of data is usually problematic. The described method can be data-intensive algorithms to social networks (e.g. extended through further actions in both online and off- social bots (Thieltges et al., 2016), using algorithms line campaigning. For example, parties have already for social engineering (Strohmaier and Wagner, started promoting apps to connect the digital and ana- 2014)), must now be also extended to the effect of logue campaigning. These apps help to analyse the reac- microtargeting as a technology driven campaigning tions of people, giving feedback to the campaign- method. As the new technological capabilities raise managers about their campaigning tactics. Furthermore, questions regarding the limits of ethical political influ- the combination of the app data with data coming from ence and the potential transformation of political social media can provide even more insights on the rele- behaviour in contemporary society, our task is to iden- vant issues. The processed social media data can also be tify and reflect on the newly emerged issues. used to complement standard opinion prediction tech- The study showed, that through machine learning, it niques. Existing census data about demographic charac- is possible to track someone’s interests and subse- teristics and public record data about past voting quently develop personalized political advertisement behaviour can be combined with results from the topic that can be used to influence social media users. modelling and sentiment analysis algorithms and hence Hence, the first question emerging is whether microtar- explain the features of political behaviour. geting might lead to the manipulation of voters. The In our study, we focused only on the detection of transmission of a personalized message does not per se voters’ political topics of interest, however part of the signify the manipulation of a person, as each individual microtargeting process is also the evaluation of the per- possesses the freedom to decide whom to vote for. As sonalized advertisement’s success. This can be done the public is offering more and more voluntarily their after the first application of microtargeting, through information in exchange for online or offline services analysis of click-statistics, performance of surveys and (Barbu, 2014) though, algorithms tend to become the actual election results. Furthermore, after the cali- more precise in evaluating personal preferences and bration of the process, the generation of microtargeting attitudes. As microtargeting could potentially contact data can be highly automated. This of course raises the the person directly with a very well adapted message, it question of whether politicians’ positions would still be might achieve what is called instant influence: trigger the person’s mind to develop a conditioned response a result of their actual opinions or just an algorithmic creation for attracting voters. Finally, machine learning the way the political actors desire (Cialdini, 2007). algorithms can predict the users’ interest in further This happens, because in cases of fast incoming infor- topics or parties, even if they have not liked them on mation stimuli, the individual does not process them the platform. Further data would be required for this, rationally (Simon, 1996). On the contrary, the Papakyriakopoulos et al. 11 information is assimilated intuitively, creating a phe- 2015), and realizing how datafication has pragmatically nomenalist connection between the message and the altered the contemporary social structure. political party (Piaget, 1947). Of course, framing a Important for the ethical evaluation of microtarget- party successfully also presupposes other psychological, ing, as well as for data privacy, is also who acquired the social and political preconditions to be present (Domke related data, not only how. For us being able to gain et al.. 1998; Schmitt-Beck, 2003), which cannot be access to the aforementioned dataset poses a dilemma: formed by simply sending well-adapted personal mes- Should public data, for which users have provided their sages. But given these conditions, a systematic applica- consent to be used and further processed, become openly tion of microtargeting might lead to a ‘progression from available, or should they remain only under the control thought to action artificially’ (Ellul, 1966). A reaction to of the initial gatherer? The question is relevant more this issue is the conscious understanding of the person than ever to the present discussion, given the contempor- that they are being microtargeted. In this way, they ary Facebook data scandal (Facebook, 2018a, 2018b), as would be in position to evaluate a message totally dif- well as the platform’s decision to significantly limit the ferently, knowing that the incoming stimuli are already data available through its application programming adapted to their own attitudes. The rule of the conscious interface (API). On the one hand, making data broadly over the unconscious is a precondition for the society to open might result to an uncontrolled data mining phe- remain autonomous (Castoriadis, 1997). nomenon (Pasquale, 2015), with private data becoming a This type of consciousness is not only needed at the part of the public sphere. On the other hand, the posses- moment of evaluating a political message, but must sion of these public data only by the original gatherer also exist at the level of privacy. It is common that might result in the problem of a knowledge monopoly, through the use of apps and online platforms, people making the data holder much more powerful in eco- voluntarily provide their personal data and allow their nomic and political terms than other social actors. further usage as a by-product of the service. It is The specific case study would have a different form, important for users to become aware of what they are if the data were collected under the new API rules of the agreeing on, and what consequences their actions have. platform. Important public data for microtargeting, as user likes, cannot be downloaded in an automated way. In this direction, certain normative and legal impera- tives have already been formulated: Transparency of If public online data are accessible only to the extent data collection, processing and application (Barocas platforms decide, and political actors can target users et al., 2017), autonomy of the subject on having control exclusively through the targeting services provided, of their own personal data (McDermott, 2017), and then the political system itself becomes contingent to (in)visibility: the right of the subject to choose if and technological companies. Electing microtargeting as a to know how personal data might be collected and used political campaigning strategy thus presupposes the (Taylor, 2017), are stated as necessary for supporting constant compliance of political actors with the existing someone’s privacy. The EU General Data Protection political and legal conditions (Kruschinski and Haller, Regulation makes also steps towards this direction, 2017), as well as with the market structures and the by explicitly incorporating transparency and consent dominant online platform decisions. in its regulatory claims. Another issue regarding microtargeting is related to Despite the regulatory efforts, the act of a user opting the perceived voter model. Given that the majority of in, given a very long document of terms and conditions, users in social networks are relatively inactive, the where how personal data might be used is outlined in a danger exists that politicians will concentrate on the short and general manner does not signify transparency, analysis of data provided by the more active users, or actual consent (Strandburg, 2014). Especially regard- even if that sample is not representative of the popula- ing personal data for microtargeting, the information tion (Barbera´ and Rivero, 2015). The less data one can that should be presented to the subject in order to give gather about a person, the more inexact can their atti- their consent should clarify exactly what information is tude-prediction be. Thus, a campaign might be devel- going to be collected, how, by whom and for what pur- oped based on falsely assessed voters’ attitudes. If pose. This is a prerequisite for the subjects’ expectations political campaigns are highly or exclusively data- about the collected data to coincide with the actual data driven, it leads to the perceived voter phenomenon usage (Barocas and Nissenbaum, 2014). At the same (Hersh, 2015): All campaigning decisions are based to time, the individuals should be emancipated, by both an algorithmically calculated electorate and thus, any getting to know through access to the history of their forecasts are dependent on the nature of the collected personal data used by services (Kennedy and Moss, data. Given that social media data always possess a 12 Big Data & Society 3. https://www.facebook.com/help/112146705538576 certain rate of bias (Ruths and Pfeffer, 2014), it is pos- (accessed 21 March 2018). sible that political actors might perform a campaigning 4. Appendix 1 contains the full description of the topics cre- on a ‘constructed’ reality and not on an actual one. Of ated, as well as their important keywords. course, gathering of even more data is not a solution. 5. The topics contain keywords as e.g. Vielleicht, aber, If someone observes campaigning in the US, they might glaube, nachdenken. question the independency of the electorate: US parties’ 6. E.g. CDU’s app ‘connect17’. campaigns aim for the mobilization or de-mobilization of specific social groups, demographic layers and geo- graphic populations in order to strategically achieve References their goals (Hersh, 2015; Kreiss, 2016; Persily, 2017). Agan T (2007) Silent marketing: Micro-targeting. Available Furthermore huge public databases contain extensive at: http://gaia.adage.com/images/random/microtar- data about the majority of the electorate and their get031207.pdf (accessed 27 March 2018). voting history. The discussion about microtargeting Barbera´ P and Rivero G (2015) Understanding the political and data privacy is already under way in Europe and representativeness of twitter users. Social Science the newly emerged issues should be assessed. Computer Review 33(6): 712–729. This study demonstrates through its ‘proof of con- Barbera´ P and Zeitzoff T (2017) The new public address cept’ certain possibilities and dangers of microtarget- system: Why do world leaders adopt social media? ing, in order to initiate an important debate for the International Studies Quarterly 62(1): 121–130. political system. To expand this discussion, further Barbu O (2014) Advertising, microtargeting and social media. qualitative and quantitative research is needed, in Procedia – Social and Behavioral Sciences 163: 44–49. order to uncover: (1) How political communication Barocas S, Bradley E, Honavar V, et al. (2017) Big data, data science, and civil rights. ArXiv. Available at: https://arxiv. on social media influences the formation of political org/abs/1706.03102 (accessed 5 November 2018). attitudes in terms of polarization, political mobilization Barocas S and Nissenbaum H (2014) Big data’s end run and opinion formation? (2) What is the effect of polit- around anonymity and consent. Privacy, Big Data, and ical campaigning services offered by social media and the Public Good: Frameworks for Engagement 1: 44–75. other internet platforms? (3) At which level current Bennett CJ (2016) Voter databases, micro-targeting, and data privacy policies protect individuals and what else protection law: Can political parties campaign in Europe could be done? The answers to the aforementioned as they do in North America? International Data Privacy questions, if given, can redefine how the political dis- Law 6(4): 261–275. course should be performed in the digital age. Blei DM, Ng AY and Jordan MI (2002) Latent Dirichlet allocation. In: Dietterich TG, Becker S and Ghahramani Declaration of conflicting interests Z (eds) Advances in Neural Information Processing Systems 14. Cambridge, MA: MIT Press, pp. 601–608. The authors declared no potential conflicts of interest with Blei DM, Ng AY and Jordan MI (2003) Latent Dirichlet allo- respect to the research, authorship, and/or publication of this cation. Journal of Machine Learning Research 3: 993–1022. article. Boehm F (2015) A comparison between us and EU data pro- tection legislation for law enforcement purposes. Available Funding at: www.europarl.europa.eu/RegData/etudes/2015/536459/ The authors disclosed receipt of the following financial sup- IPOL_ STU%282015%29536459_EN.pdf (accessed 28 port for the research, authorship, and/or publication of this March 2018). article: the German Research Foundation (DFG) and the Bond R and Messing S (2015) Quantifying social media’s Technical University of Munich within the funding pro- political space: Estimating ideology from publicly revealed gramme Open Access Publishing. preferences on Facebook. American Political Science Review 109(1): 62–78. Broy D (2017) Germany: Starting implementation of the ORCID iD GDPR-brief overview of the government bill for a new Orestis Papakyriakopoulos http://orcid.org/0000-0003- Federal Data Protection Act. European Data Protection 4680-0022 Law Review 3: 93. Capara GV, Barbaranelli C and Zimbardo PG (1999) Notes Personality profiles and political parties. Political Psychology 20(1): 175–197. 1. See e.g. Christl (2016) and Thiele (2017). Castoriadis C (1997) The Imaginary Institution of Society. 2. The pages of CDU and CSU were classified together under the term Union. Cambridge, MA: MIT Press. Papakyriakopoulos et al. 13 Christl W (2016) Big data im wahlkampf: An ihren daten sollt Grimmer J and Stewart BM (2013) Text as data: The promise ihr sie erkennen. Available at: http://www.faz.net/aktuell/ and pitfalls of automatic content analysis methods for pol- feuilleton/medien/big-data-im-wahlkampf-ist-microtarget- itical texts. Political Analysis 21(3): 267–297. Hegelich S (2017) R for social media analysis. In: Luke Sloan ing-entscheidend-14582735.html (accessed 28 March 2018). AQH (ed.) The SAGE Handbook of Social Media Research Cialdini RB (2007) Influence: The Psychology of Persuasion. Methods. London: SAGE Publications, Chapter 28. New York, NY: Harper Collins. Hegelich S and Shahrezaye M (2015) The communication Darling WM (2011) A theoretical and practical implementa- behavior of German MPS on twitter: Preaching to the tion tutorial on topic modeling and Gibbs sampling. In: converted and attacking opponents. European Policy Proceedings of the 49th annual meeting of the association Analysis 1(2): 155–174. for computational linguistics: Human language technologies, Hersh ED (2015) Hacking the Electorate: How Campaigns Portland, Oregon, 19 – 24 June 2011. pp.642–647. Perceive Voters. New York, NY: Cambridge University Madison, WI: Omnipress Inc. Press. Da¨ ubler W, Klebe T, Wedde P, et al. (2016) Hotelling H (1933) Analysis of a complex of statistical vari- Bundesdatenschutzgesetz. Frankfurt: Bund-Verlag. ables into principal components. Journal of Educational Deveaud R, SanJuan E and Bellot P (2014) Accurate and Psychology 24(6): 417. effective latent concept modeling for ad hoc information Jungherr A (2017) Einsatz Digitaler Technologie im retrieval. Document nume´rique 17(1): 61–84. Wahlkampf. Schriftreihe Medienkompetenz 10111: 92–101. Domke D, Shah DV and Wackman DB (1998) Media priming Karpf D (2016) The partisan technology gap. In: Gordon E effects: Accessibility, association, and activation. International and Mihailidis P (eds) Civic Media: Technology, Design, Journal of Public Opinion Research 10(1): 51–74. Practice. Cambridge, MA: MIT Press, pp. 199–216. Dorschel J (2015) Praxishandbuch Big Data: Wirtschaft– Kennedy H and Moss G (2015) Known or knowing publics? Recht–Technik. Wiesbaden: Springer-Verlag. Social media data mining and the question of public Downs A (1957) An economic theory of political action in a agency. Big Data & Society 2(2): 2053951715611145. democracy. Journal of Political Economy 65(2): 135–150. Kosinski M, Stillwell D and Graepel T (2013) Private traits Edsall TB (2012) Let the nanotargeting begin. Available at: and attributes are predictable from digital records of https://campaignstops.blogs.nytimes.com/2012/04/15/let- human behavior. Proceedings of the National Academy of the-nanotargeting-begin/ (accessed 28 March 2018). Sciences 110(15): 5802–5805. Ellul J (1966) Propaganda. New York, NY: Knopf. Kreiss D (2016) Prototype Politics: Technology-Intensive EU-Directive (2016) Regulation (EU) 2016/679 of the Campaigning and the Data of Democracy. New York, European Parliament and of the Council of 27 April NY: Oxford University Press. 2016 on the protection of natural persons with regard to Kruschinski S and Haller A (2017) Restrictions on data- the processing of personal data and on the free movement driven political micro-targeting in Germany. Internet of such data, and repealing Directive 95/46/EC (General Policy Review 6(4): 1–23. Data Protection Regulation). Official Journal of the McDermott Y (2017) Conceptualising the right to data pro- European Union L119: 1–88. tection in an era of big data. Big Data & Society 4(1): Facebook (2018a) Hard questions: Update on Cambridge Analytica. Available at: https://newsroom.fb.com/news/ Mayer-Scho¨ nberger V and Cukier K (2013) Big Data – A 2018/03/hard-questions-cambridge-analytica/ (accessed 20 Revolution that Will Transform how We Live, Work, and September 2018). Think. Orlando, FL: Houghton Mifflin Harcourt. Facebook (2018b) Suspending Cambridge analytica and SCL Medina Serrano JC, Hegelich S, Shahrezaye M, et al. (2019) group from Facebook. Available at: hhttps://newsroom. Social Media Report: The 2017 German Federal Elections. fb.com/news/2018/03/suspending-cambridge-analytica/ Munich: TUM University Press. (accessed 20 September 2018). Nulty P, Theocharis Y, Popa SA, et al. (2016) Social media Franz MM and Ridout TN (2010) Political advertising and and political communication in the 2014 elections to the persuasion in the 2004 and 2008 presidential elections. European parliament. Electoral Studies 44: 429–444. American Politics Research 38(2): 303–329. Ohm P (2014) Changing the rules: General principles for data Geman S and Geman D (1984) Stochastic relaxation, Gibbs use and analysis. Privacy, Big Data, and the Public Good: distributions, and the Bayesian restoration of images. In: Frameworks for Engagement 1: 96–111. IEEE Transactions on Pattern Analysis and Machine Panagopoulos C (2015) All about that base. Party Politics Intelligence. Vol. 6. IEEE, pp.721–741. 22(2): 22–190. Gilks W, Richardson S and Spiegelhalter D (1995) Markov Papakyriakopoulos O, Shahrezaye M, Thieltges A, et al. ChainMonte Carloin Practice. London: Chapman and Hall. (2017) Social media und microtargeting in Deutschland. Griffiths T (2002) Gibbs sampling in the generative model of Informatik-Spektrum 40(4): 327–335. latent Dirichlet allocation. Available at: https://people.cs. Pasquale F (2015) The Black Box Society: The Secret umass.edu/~wallach/courses/s11/cmpsci791s s/readings/ Algorithms that Control Money and Information. griffiths02gibbs.pdf (accessed 28 March 2018). Cambridge, MA: Harvard University Press. 14 Big Data & Society Persily N (2017) Can democracy survive the internet? Journal Strandburg KJ (2014) Monitoring, datafication and consent: of Democracy 28(2): 63–76. Legal approaches to privacy in the big data context. Piaget J (1947) The Psychology of Intelligence. London, New In: Lane J, Stodden V, Bender S, et al. (eds) Privacy, Big Data and the Public Good. New York, NY: Cambridge York: Routledge. University Press, pp. 5–43. Roberts G and Smith A (1994) Simple conditions for the Strohmaier M and Wagner C (2014) Computational social convergence of the Gibbs sampler and Metropolis- science for the world wide web. IEEE Intelligent Systems Hastings algorithms. Stochastic Processes and their 29(5): 84–88. Applications 49(2): 207–216. Taylor L (2017) What is data justice? The case for connecting Ruths D and Pfeffer J (2014) Social media for large studies of digital rights and freedoms globally. Big Data & Society behavior. Science 346(6213): 1063–1064. 4(2): 2053951717736335. Schmitt-Beck R (2003) Mass communication, personal com- Thiele M (2017) Die wahlschlacht der datenbanken. Available munication and vote choice: The filter hypothesis of media at: http://www.tagesspiegel.de/themen/freie-universitaet- influence in comparative perspective. British Journal of berlin/wahlkampf-mit-big-data-die-wahlschlacht-%der- Political Science 33(2): 233–259. datenb anken/19938576.html (accessed 28 March 2018). Sievert C and Shirley KE (2014) Ldavis: A method for visua- Thieltges A, Schmidt F and Hegelich S (2016) The devils tri- lizing and interpreting topics. In: Proceedings of the work- angle: Ethical considerations on developing bot detection shop on interactive language learning, visualization, and methods. In: Proceedings of the 2016 AAAI spring sympo- interfaces, Baltimore, Maryland, 27 June 2014, pp. 63–70. sium, Stanford University, Palo Alto, California, 21 – 23 Simon HA (1996) The Sciences of the Artificial. Cambridge, March 2016, Vol. 2123, pp.253–257. MA: MIT Press. Weedon J, Nuland W and Stamos A (2017) Information Sotto LJ and Simpson AP (2015) Data protection & privacy: operations and Facebook. Technical report, Facebook. United States. Available at: https://www.huntonprivacy- Available at: https://fbnewsroomus.files.wordpress.com/ blog.com/wp-content/uploads/sites/18/2011/04/ 2017/04/facebook-and-information-operations-v1.pdf DDP2015_United_States.pdf (accessed 28 March 2018). (accessed 28 March 2018). Stier S, Posch L, Bleier A, et al. (2017) When populists Woo HY (2015) Strategic communication with verifiable mes- become popular: Comparing Facebook use by the right- sages. PhD Thesis, University of California, USA. wing movement Pegida and German political parties. Information, Communication & Society 20(9): 1–24. Appendix 1 Table 3. Topics overview for category 1: general political issues. Topic content keywords. 1 State / Citizens / Bu¨rgerinnen Gestalten Zusammenhalten Engagement Landkreis Social involvement 2 Education / Thoughts Gymnasium Lo¨sung Lernen Klasse Anforderung 3 Law Bundesverfassungsgericht Verfassung Urteil Grundgesetz Bundesrepublik 4 Economy Euro Steuergerld Milliard Zuschuss kosten 5 Transportation policy Flughafen Nahverkehr Bahn Mitarbeiter Verkehrspolitik 6 Democracy / People / Germany Demokratie Volk Elite Freiheit Bu¨rger 7 Against left-wing radicalism Linksextremisten Antifa Gewalttat Straftat Polizei 8 Religion / Islam / Christianism Islam Muslim Christlich Religion Kirche 9 Thoughts Vielleicht Aber Glaube Eigentlich Ich 10 Refugee policy / for Unterkunft Fluchtling Asybewerber Aufnahme Geflu¨chtet 11 Energy policy Energie Umwelt Klimaschutz Landwirtschaft Energiepolitik 12 Refugee policy / Against Fluchtling Asyl Abschiebung illegal Asylverfahren 14 Austerity / Unemployment / Jobcenter eigentlich soll Sparen Rettung Thoughts 15 Education Schule Kinder Eltern Bildung Lehre 16 Foreign policy EU Russland Ukrain USA Turkei 18 Social policy / Hartz 4 / Poverty Hartz IV Armut Sozial Gerecht Rente 19 Greek crisis Griechenland Bank Finanz Steuerzahl Schuld 20 Housing policy Wohnung Wohnraum Miete Verwaltung Wohnungsbau (continued) Papakyriakopoulos et al. 15 Table 3. Continued 21 AfD (political discussion) AfD Partei rechtspopulist Position Altparteien 23 Against Pegida Demonstration Pegida Nazis Rassismus gegen 24 Against right-wing Diskriminierung Homophobie Rechtsextremismus Freiheit Rassisten radicals, racism 25 Income / Workers unions Mindestlohn Arbeitsgeber Arbeitsnehmer Gewerkschaft Arbeitsbedingung 26 German left-wing history DDR Rosa Luxemburg NATO Geschichte Revolution 27 Thoughts Nachdenken Denkst Wahrheit Du Einfach 28 Family Frau Mann Mutter Familie Kinder 32 Homeland security Innenminister Polizei Ermittelt Justitz Kriminalita¨t 37 Against TTIP/CETA TTIP CETA Stopp unterschreiben Aktion Table 4. Topics overview for category 2: political actors’ activity. Topic content keywords 13 Political events Eingeladen Veranstaltung La¨dt Vortrag Diskussion 22 Greetings Gruss Liebe Freunde Melden Spenden 29 Die Gru¨nen Gru¨nen Bu¨ndnis Landtag Gru¨n-linke Sachsen 30 After political events Danke Besuch toll Fotos Impression 33 Congratulations Glu¨ckwunsch Herzlich Gratulieren Wahlgang Wiedergewa¨hlt 34 Schwesig (Politician) Schwesig Manuela Andrea Nahles Frau 35 Political Coalitions rot gru¨n Schwarz Gelb Koalition 38 Merkel Merkel Angela Kanzlerin CDU Union 39 Petry Petry Lucke Alternative Deutschland AfD 40 Election campaign Daum Druck Wahlkampf Stimmen Sonntag 41 Candidates Wahlkreis Kandidate Landesliste Nominiert Listenplatz 42 Wagenknecht Mannheim Wagenknecht sahra Linksjugend Freiburg 45 Twitter Twitter Schaut Teilen Mitmachen Abstimmen 46 Various politicians Gabriel Schulz Gauck Bundespresident Steinmeier 48 Debates/ TV live Aktuell TV gleich Fernsehen 49 FDP/ Rheinland Pfalz Rheinland Liberal FDP Liberte 50 Greetings/ Thank you Wu¨nsche Spass Gut frohe Feiertag 52 Lindner/ FDP Lindner Christian NRW Bundesvorsitzender Kubicki 53 Die LINKE die Linke linksfraktion Riexinger Kipping themen 55 German news media Focus Welt Spiegel Interview Zeitung Table 5. Topics overview for category 3: regional topics. Topic content keywords 117 NRW politicians Mu¨nster Bochum Bezirksvertreter Essen Ruhr 31 Hamburg Altona Hamburg Landesparteitag Bezirkversammlung Bu¨rgerschaft 36 Leipzig AfD Leipzig Kreisvorsitzende Kreisverband Vorstand Mitglied 43 Bayern Bayern Mu¨nchen Freistaat Wahlprogramm CSU 44 Baden-Wu¨rttemberg Baden Wu¨rttemberg Ministerpresident Stuttgart bw 47 Bielefeld/ Koblenz Bielefeld Koblenz Mainz Ru¨lke Theurer 51 Hamburg/ Schleswig-Holstein Schleswig Holstein Kiel Rostock Schwerin 54 Berlin Berlin Tempelhof Lichtenberg Scho¨neberg Bezirk http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Big Data & Society SAGE

Social media and microtargeting: Political data processing and the consequences for Germany:

Loading next page...
 
/lp/sage/social-media-and-microtargeting-political-data-processing-and-the-BLyl65qx6S
Publisher
SAGE
Copyright
Copyright © 2022 by SAGE Publications Ltd, unless otherwise noted. Manuscript content on this site is licensed under Creative Commons Licenses.
ISSN
2053-9517
eISSN
2053-9517
DOI
10.1177/2053951718811844
Publisher site
See Article on Publisher Site

Abstract

Amongst other methods, political campaigns employ microtargeting, a specific technique used to address the individual voter. In the US, microtargeting relies on a broad set of collected data about the individual. However, due to the unavailability of comparable data in Germany, the practice of microtargeting is far more challenging. Citizens in Germany widely treat social media platforms as a means for political debate. The digital traces they leave through their interactions provide a rich information pool, which can create the necessary conditions for political microtargeting following appropriate algorithmic processing. More specifically, data mining techniques enable information gathering about a people’s general opinion, party preferences and other non-political characteristics. Through the application of data-intensive algorithms, it is possible to cluster users in respect of common attributes, and through profiling identify whom and how to influence. Applying machine learning algorithms, this paper explores the possibility to identify micro groups of users, which can potentially be targeted with special campaign messages, and how this approach can be expanded to large parts of the electorate. Lastly, based on these technical capabilities, we discuss the ethical and political implications for the German political system. Keywords Microtargeting, social media, Germany, influence, datafication, electorate political actors have started using newly developed Introduction tools in order to analyse citizens’ behaviour and to The contemporary digital revolution is constantly influence the electoral body. One of these methods is transforming the political world. Datafication (Mayer- microtargeting, which allows the formulation of perso- Scho¨ nberger and Cukier, 2013), i.e. the categorization, nalized messages and their direct delivery to groups and quantification and aggregation of phenomena into individuals (Agan, 2007), hence creating a promising databases, and their further algorithmic processing, tool for electoral campaigning and opinion formation. have opened new opportunities in understanding and In this paper, we demonstrate a proof of concept evaluating complex social phenomena. More specific- regarding the ways political actors could establish the ally the use of social media and the internet has resulted conditions for political microtargeting in Germany, in the creation of enormous databases that contain through the utilization of social media platforms. information about citizens’ personal and political pref- erences. Based on these Big Political Data a new type of Bavarian School of Public Policy, Technical University Munich, Germany data-driven interaction between politics and citizens emerges through social media. In its core lies the appli- Corresponding author: cation of advanced statistical and machine learning Orestis Papakyriakopoulos, Bavarian School of Public Policy, Technical algorithms, the possibilities of which enable the devel- University Munich, Richard-Wagner-Strasse 1, 80333 Munich, Germany. opment of new political strategies. Consequently, Email: orestis.papakyriakopulos@tum.de Creative Commons CC-BY: This article is distributed under the terms of the Creative Commons Attribution 4.0 License (http:// www.creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage). 2 Big Data & Society The scope of our analysis is to identify the possibilities has the potential to partly track the predispositions or and dangers of microtargeting in electoral campaign- general interests of a voter (Ellul, 1966), and based on ing, taking into consideration ‘state of the art’ technol- them, to modify the candidates’ public images in a way ogy. Therefore, we apply our method to Facebook data that complies with the voters’ opinions (Bond and that could actually be used in political campaigning. Messing, 2015; Capara et al., 1999). Furthermore, by Initially, we explain the theory behind microtargeting directly communicating individual- or group-specific and discuss existing obstacles that prevent its applica- messages, candidates are able to reduce the risk of alie- tion. Second, we illustrate our methodology and pre- nating other voters that might disagree on a topic sent our results. Lastly, we evaluate data-driven (Woo, 2015). Another advantage is that microtargeting microtargeting ethically and comment on its political allows political actors to target voters from the entire consequences. political spectrum, rather than exclusively developing their campaign on the characteristics of the median voter (Downs, 1957), as was the case in the past. Microtargeting in theory Finally, given that opinion polls in the 2016 US and Microtargeting is a strategic process intended to influ- 2017 German elections failed to make plausible fore- ence voters through the direct transmission of stimuli, casts of election results, microtargeting provides a which are formed based on the preferences and charac- methodology to overcome political decisions based teristics of an individual. First of all, microtargeting solely on survey polls. Despite the above advantages, presupposes the collection of large amounts of data it is important to note that there is no comprehensive able to depict the political preferences and other non- study that proves the effectiveness of microtargeting political characteristics of voters. This data can be (Jungherr, 2017; Karpf, 2016); to date it remains a either manually collected or acquired through data- promise emerging from the technological state of the mining and can include information ranging from a art. person’s name, address, and voting history to more One of the main reasons behind the success of micro- abstract properties such as a person’s opinion about targeting in the US is the loose legal framework, which allows political actors to almost freely create, acquire political and non-political topics, their social activity and cultural background. The gathered data are then and use databases that contain personal information. It processed with the aid of appropriate machine learning is characteristic that there is no dedicated data protec- algorithms, while the acquired results depend on the tion law or a concept of ’sensitive’ personal data in the type of algorithm used. It is then possible to make pre- US legislation. Hence, there is no general legislative dictions about specific variables, for example, the out- framework exclusively dealing with the protection of come of a political decision (supervised learning) or a person’s privacy rights (Sotto and Simpson, 2015). identification of patterns in the data through clustering Although legal frameworks, as the FTC, ECPA, (unsupervised learning). Implementing the latter, polit- HIPAA, etc., indeed aim to regulate the monitoring ical actors are in a position to detect sub-groups of of personal data and their protection in their respective voters that share common demographic and attitudinal fields, the administration of data policies takes place traits (Barbu, 2014). Based on the algorithmic results, usually only indirectly, by laws that might impose pur- they can then generate messages or plan actions aimed pose limitations or time limits on the data retention at influencing each specific sub-group or person (often (Boehm, 2015). Furthermore, the US law presents sig- called nanotargeting (Edsall, 2012)), leading to their nificant gaps concerning the protection of individual potential mobilization or de-mobilization. privacy (Ohm, 2014): e.g. the datafication or reuse of Microtargeting was first applied to a limited extent information acquired as a by-product of providing ser- in the US 2000 Federal Elections by the Republican vices is largely unregulated (Strandburg, 2014: 22). Party (Panagopoulos, 2015). Since then, the increasing Consequently, such legal inconsistencies facilitate the datafication of societies has provided fertile ground for development of huge political databases, which can its expansion as a political strategy. A milestone for its then be used for political campaigning (Bennett, 2016). application was the 2008 Federal Elections (Franz and Contrary to the US, the legal framework applicable Ridout, 2010), when the Democratic Party campaign in Germany significantly limits the potential of micro- applied the strategy at full scale. Today, microtargeting targeting. Germany’s privacy law complies with the is a standard online and offline (Panagopoulos, 2015) EU-directive on the processing of personal data. The campaigning method in the US as it overcomes prob- General Data Protection Regulation (EU-Directive, lems of classical political campaigning. First of all, it 2016) provides an extensive regulatory framework for Papakyriakopoulos et al. 3 the protection of privacy and personal data, their social media platforms cannot be assumed to be the acquisition, use and exchange. The GDPR thoroughly same for the actual electorate. The politically active describes the limits and responsibilities of data control- user population on Facebook is in no way representa- lers and processors, supports the subjects’ rights to tive of the whole population of a country (Ruths and privacy and consent, and stipulates the exact regulating Pfeffer, 2014), while the expression of an opinion online role of public authorities. Furthermore, the German does not fully correspond to a coherent political state- data protection law explicitly defines the conditions ment (a like is not a vote; Hegelich and Shahrezaye, and cases in which someone is able to access and use 2015). Furthermore, the evaluation of social media personal data (Da¨ ubler et al., 2016) and lays down the data is bound with multiple methodological issues rights of persons affected (Broy, 2017), strongly limiting (Hegelich, 2017). Still, the case of the United States data exploitation. has shown that political campaigning is more than ever based on data, from which an electorate’s image is derived, also known as perceived voter model (Hersh, Barrier 1: Privacy and data protection 2015). This model may be misleading but nevertheless policy used, as it reduces the complexity in campaign decision- Some authors have argued therefore that microtarget- making. Due to the fact that it is almost impossible to ing cannot be applied in German politics. However, causally link a campaigning tool to election results, despite the legal restrictions, there is ample leeway for microtargeting is used as long as it is assumed to have it on social media platforms (Papakyriakopoulos et al., a successful influence – even if in reality it might not. 2017). The reason is that the German privacy law per- The difficulties in causal inference arise – amongst mits the collection and processing of public personal others – from potential self-fulfilling prophecies: data stemming from social media, as long as the indi- should a campaigning tool identify a target group, the viduals’ interests are not challenged (Dorschel, 2015). campaign will increase interaction with this group. This The GDPR clearly states that given the appropriate special attention might yield positive results; but these safeguards, personal data on political opinion can be results could have also been the same for a totally dif- used for electoral activities (EU-Directive, 2016: 11). In ferent group, as well. Despite the above, microtargeting addition, users on social media services consent to com- is applied, even if it might be epistemologically impos- panies using their personal data for commercial and sible to evaluate its exact impact. other activities, by opting in. Hence, the legal require- ments for using social media data as basis for political Data and method microtargeting are met. Given the fact that users agree to publish on social media a huge amount of data about In this paper, we demonstrate how politicians in their political and non-political preferences and behav- Germany can create the conditions for microtargeting iour, these platforms are an ideal source for political based on data from the social media platform Facebook knowledge extraction. Social media have become a key and we evaluate its ethical and political consequences. environment for political campaigns, as the majority of Facebook was chosen as a data source for three rea- politicians can use them to communicate directly with sons: (1) the German Facebook population is larger the electoral body (Barbera´ and Zeitzoff, 2017; and less selective than that of Twitter. (2) It is part of Hegelich and Shahrezaye, 2015; Medina Serrano the company’s business model to offer targeted adver- et al., 2019; Nulty et al., 2016; Stier et al., 2017). That tisement services for political campaigning, the possibi- aside, political actors often perform organized influen- lities of which we are exploring. (3) Contrary to the US, cing strategies on social media, frequently trespassing where there are extensive political databases with per- the legal limits set (Weedon et al., 2017). sonal identifiers (Bennett, 2016), in Germany this is not the case. Hence, social media provide a straightforward way to acquire knowledge for microtargeting. Barrier 2: Data bias For our proof of concept, we analysed the public The legal framework is not the only obstacle for suc- Facebook pages of the German political parties and cessful microtargeting. The type of data subjected to their supporters: Our sample includes the following algorithmic process and their entailed results can some- parties: Christlich Demokratische Union (CDU), times lead to spurious political action. In our case, the Christlich Soziale Union (CSU), Sozialdemokratische world of social media is not identical to the offline Partei Deutschlands (SPD), Bundnis 89/ Die Grunen, ¨ ¨ world. Hence, political preferences appearing on Die Linke, and Alternative fur Deutschland (AfD). ¨ 4 Big Data & Society CDU is the main conservative party of Germany, while case it provides a plausible classification method for the CSU is the conservative party active in Bavaria. SPD users. Furthermore, it does not distort the microtarget- represents the main German social-democratic party, ing process, as microtargeting targets the identification and Die Linke the radical left. AfD has a nationalist, of voter’s predispositions and not to definitely certify anti-immigrant and neo-liberal agenda, while FDP is a someone’s exclusive support to a party. As shown in conservative, neo-liberal party. Finally, Bundnis 90/Die Figure 1, around 50% of the active users per party have Grunen is the German green party. made only one Like. This is typical of Big Data appli- For each political page, we evaluated user ‘‘Likes’’ cations on social media phenomena, where the infor- on political posts and assigned a partisanship to each mation for the majority of users is low. user according to their preferences (Figure 1). Along with the identification of potential cross-pres- Following a standard microtargeting technique, we sured partisans, we wanted to identify the specific con- focused our study on users who have liked content tent that they find interesting. Therefore, we applied the on pages of more than one political party. The LDA topic modelling algorithm (Blei et al., 2003) to reason behind this decision is that the specific group classify 251.947 posts. LDA has many advantages of voters, also named as cross-pressured partisans, has over other standard text-mining algorithms (Grimmer the highest likelihood to be influenced, as they are and Stewart, 2013), as it can recognize complex rela- both undecided and engaged in politics (Ellul, 1966; tions in text-datasets. The algorithm has the ability to Hersh, 2015). After identifying the relevant groups, we cluster posts in a certain number of topics, where each applied machine learning algorithms to cluster the topic is a set of words that characterize different con- various pages’ posts and created a mapping of 55 dif- tents. Hence, someone can evaluate all the posts with- ferent topics, to which each of the posts might be out having to investigate them one by one. LDA assigns assigned. To achieve this, we performed topic model- a probability for each post belonging to a specific topic. ling analysis by applying a Latent Dirichlet Allocation Then, by ascribing to each post the topic with the high- algorithm (Blei et al., 2003). In this way, we demon- est probability and by detecting the users who liked it, strate how someone can detect individual political we can explicitly track the topics that each user is inter- topics of interest and how these can be later used to ested in. shape targeted messages for each micro group of The LDA algorithm is a three-level hierarchical users. Bayesian model that predicts the probabilities of Prerequisite for the application of microtargeting is words and documents belonging to a number of the existence of a rich database containing voters’ char- topics K given the empirical distribution of words (or acteristics and preferences. Therefore, we mined data n-grams) in a corpus (Blei et al., 2003, 2002). In our from 570 public pages related to the major political case, the corpus consists of the total number of posts M parties in Germany through the Facebook Graph under investigation, while each post corresponds to a API, and analysed posts and Likes. We selected the document d, which is a sequence of N words. LDA is a pages by searching the respective party names in the generative model, i.e. it assumes the probability distri- name field of the Facebook pages. We then classified butions of topics over words  , of documents over manually our results, and removed irrelevant pages. topics  and predicts the probability that a specific We mined every post generated by the administrators word in a specific document will belong to a specific of the pages since their creation, the Likes each post topic. This Bayesian admixture can be described by got, and the unique IDs and profile names of the users the following probability distributions liking them. Usually, the profile name of a Facebook Dir ðaÞ account tends to be the same with the real name of the d K account holder, as Facebook maintains a real name Dir ðÞ policy. In total, we collected 251,947 posts with k V 6,347,448 Likes related to them and identified the activ- z  Multinom ð Þ w K d ity of 1,208,740 unique users. This is only data related to the pages mined, hence the actual size of trackable w j z  Multinom ð Þ w V k users is even larger. We define a user who has liked at least one post of a party as partisan, and a user who has liked posts on pages of two or more parties as a cross- where V is the number of unique words existing in the pressured partisan. Of course, the act of liking per se corpus, and  and  are Dirichlet parameters. does not make someone a party partisan, but in this Multinomial distribution z gives the probability that w Papakyriakopoulos et al. 5 Figure 1. Likes distribution for the users on parties’ pages. a topic will be assigned to a word, given the distribution The specific algorithm comes with the advantage of of topics over documents. Finally, multinomial distri- integrating out the probability distributions  , k d bution w j z gives the probability that the model will (Darling, 2011). Thus as part of the iterative Markov generate a specific word in a specific document given a chain, one can calculate the targeted probabilities topic (Figure 2). through the process In our case, we want to create topics about the con- tent of our corpus based on the empirical distribution for each document : d ¼ð1 ... MÞ of words over documents. Given the complexity of the for each term in a document i ¼ð1 .. . N Þ model and the fact that the initial distributions are w¼i i v þ  n þ i,j i,j assumed and not empirically provided, we randomly Pðz ¼ j jz , VÞ¼ P P i i V K d assign topics to words and documents and we follow v þ V n þ K i,j w¼1 k¼1 i a Markov chain Monte Carlo procedure to update their values (Griffiths, 2002). By iteratively applying a where i is the concrete appearance of a word, i w¼i Markov chain, we can converge to the assumed distri- denotes its exclusion and j is a topic. v corresponds i,j butions and hence sample from them (Gilks et al., 1995; to the number of times word i is assigned to topic j, Roberts and Smith, 1994) the probability Pðz jwÞ that a without its current appearance and index v w i,j w¼1 word in a document belongs to a specific topic. More gives the total number of words in the corpus assigned specifically, we used a collapsed Gibbs sampling to topic j excluding i. Furthermore, n contains the i,j Marcov chain Monte Carlo (MCMC) (Geman and total number of words in document d that are assigned Geman, 1984) method to identify the relevant topics. to topic j without i. Finally, n corresponds to the i 6 Big Data & Society Figure 2. Plate notation for the Latent Dirichlet Allocation algorithm. Figure 3. Topic optimization process. The model with the highest Jensen–Shannon convergence contained 55 topics. total number of words in the document, again not including i. Necessary for the creation of a useful LDA model is divergences for the unique 1485 topic combinations and the election of an appropriate number of topics, in created a distance matrix. On it, we applied a principle order to split the content into interpretable sub- component analysis algorithm (Hotelling, 1933) and we groups. Electing a small number of topics results in a plotted the first three components. clustering of posts, from which one cannot identify con- crete political topics of interest. On the contrary, if the Results number of topics is too large, the algorithm selects many words as topic-important that actually have no The first result of our analysis was the specification of political value. To overcome this issue, we applied a the political content of the investigated posts. The LDA topic optimization algorithm proposed by Deveaud algorithm clustered the posts in 55 topics that can be et al. (2014). More specifically, we calculated the split into three main categories. These categories were Jensen–Shannon divergence between topics for multiple chosen manually, and do not denote that they are the LDA models through the equation optimal ones; still their election makes the results much more interpretable. The first category includes topics V   V X X related to general political issues, such as social involve- 1  1 i,w j,w Dðk , k Þ¼  log þ  log i j i,w j,w ment (topic 1), education (topics 2, 15), national econ- 2  2 j,w i,w w¼1 w¼1 omy (topic 4) and homeland security (topic 32). Some topics do not only illustrate the relevance of posts to a where i, j are two different topics in a model and political issue, but also the exact opinion underlying ,  the probability density values of the distribu- them. For example, topics 10 and 12 are both migration i,w j,w tion  for a word w in the corpus V and each topic, related, but topic 10 includes posts that are refugee- respectively, then selected the model that maximizes the friendly, while topic 12 contains posts that demand a sum of the Jensen–Shannon divergence for all topic stricter migration policy. In addition, there are topics combinations given the expression that analyse political parties (topic 39) or persons (topic 38). In the same category, also exists a set of topics (9, 27, 14) that contain posts that do not make concrete K ¼ argmax Dðk , k Þ opt i j political statements, but declare uncertainty and reflec- KðK  1Þ k , k ¼1 i j tion. The second category includes topics that are related to political actors and candidates, but not as Based on the optimization process (Figure 3), we part of a political discussion. They summarize posts concluded on an LDA model with 55 topics. In order about political events, media appearances and electoral to sort and visualize topics according to their similarity, campaigning. Finally, the third category contains topics we used the method proposed by Sievert and Shirley that are location related and discuss political problems (2014). We used the already calculated Jensen–Shannon about regions. For example, topic 54 includes Papakyriakopoulos et al. 7 Figure 5. Average Likes frequency for the mean and the cross- pressured user. Figure 4. Topic distance visualization with the help of PCA. Circled are topics 21, 43, 38. potential influence will contribute to the motivation of other users as well. posts about Berlin, topic 31 about Hamburg and topic Figure 6 shows the ratio of cross-pressured partisans 43 about Bavaria. between parties. In the given dataset, more or less 10% In order to evaluate and verify our topic classifica- of the page users for each party are cross-pressured. tion, we visualized the relationship between the devel- This does not mean though, that this number corres- oped topics in a three-dimensional space with the help ponds to the actual electorate, as the descriptive results of PCA (Figure 4). Each sphere corresponds to a dif- are biased through our statistical sample and the struc- ferent topic, while their size is proportional to the ture of the social media platform. Nevertheless, it is number of posts they contain. Their distance in 3D- possible to recognize certain predispositions of the space functions as a measure of their content similarity. electorate, as for example an increased interaction of It is visible that three categories classify topics into Union and FDP users and the almost non-existent unique clusters. As expected though, there is some over- overlap of users that are interested in both Die lapping between categories, as a topic might contain Grunen and AfD. keywords belonging to more than one categories. For After the concretization of the topics of interest, example topics 21, 43, 38 appear very close, even microtargeting can be performed in two ways: one though we classified them differently (Table 1). This can either initially focus on single users and then occurs because they all include a combination of track afterwards the topics they are interested in, or posts of all classes. Topic 21 is about AfD, including select specific topics and then identify users interested both posts about its political background and the elec- in them. To demonstrate how further steps of the tions. Topic 38 is about Angela Merkel and her polit- microtarging process could be realised, we choose ran- ical activity, as well as her party structure. Finally, domly topic 4 as an example. Topic 4 includes, amongst topic 43 is about Bavaria, including a number of others, the words: Euro, Steuergeld, Milliarde, posts about the regional CSU party and its candidates. Zuschuss, Kosten, i.e. it is linked to German economic In our analysis, we identified a total of 58,532 cross- policy. It is possible to analyse the relevance of this pressured users. Figure 5 shows that cross-pressured topic for each party, as well as to identify users who users tend to like more frequently than the average like the topic. In this case, we find Union coalition Facebook partisan. This however does not mean that posts that talk about the German economy and identify cross-pressured partisans tend to be more active; on the the relevant cross-pressured partisans. Then, we ran- contrary, it denotes that we can only trace cross-pres- domly pick one of the users to investigate all the sured partisans, when the users are more active online. other topics that are of interest to her. Our random This has an important implication for the perceived cross-pressured user has also liked FDP posts, and as voter’s model: The selection of cross-pressured parti- Table 2 shows, she has also expressed interest in polit- sans as targeted population comes with the advantage ical issues of Schleswig Holstein and homeland security. that they behave as multiplicators, and thus their Hence, we can identify significant political topics of 8 Big Data & Society Figure 6. Percentage of cross-pressured partisans per party. Papakyriakopoulos et al. 9 Table 1. Extended keywords for topics 21, 38, 43. Topics Extended keywords 21 AfD, Partei, rechtpopulist, Position, Altparteien, Wahlen, Argument, Stimmen, vertreten, Gegner 38 Merkel, Angela, Kanzlerin, CDU, Union, CSU, Seehofer,Volk, Fluchtlingspolitik, Terroranschlag 43 Bayern, Mu¨nchen, Freistaat, Wahlprogramm, CSU, muss, Regierung, Generalsekretarin, Schalzwedel, Gru¨ne Table 2. Topics of interest for an example-user and for Union-SPD cross-pressured users. Target Topic keywords Example user 4 Euro Steuergerl Milliar Zuschuss kosten 32 Innenminister Polizei ermittelt Justitz Kriminalita¨t 51 Schleswig Holstein Kiel Rostock Schwerin Union-SPD cross-pressured users 8 Islam Muslim Christlich Religion Kirche interest for the user, as well as political parties to which, data, data existing on social media platforms provide the user is positively inclined. a fruitful source for microtargeting. By mining and The topic modelling algorithm, however, does not structuring the content of 570 German political pages, illustrate if the user thinks positively or negatively we managed to detect over 58,000 cross-pressured users of a political topic, i.e. it does not trace their exact through their Likes. The selection of this sub-popula- political attitudes to the issues. To do this it would tion was based on the idea that they are people both be necessary to apply a sentiment analysis algorithm active in politics and potentially undecided on their to the parties’ posts, or a qualitative analysis thereof. exact party preference. Hence, communicating a mes- In the current research, we did not perform a senti- sage to them is of greater value than to people who are ment analysis. Given the results of the sentiment ana- strict supporters of one party or are not interested in lysis, the person’s political evaluations of political politics at all. In order to track topics of interest of topics and party sympathy, a campaign-maker has cross-pressured users, we applied simple machine learn- adequate information to create personalized messages ing algorithms on the pages’ content and found the and communicate them through micro-targeted most common issues discussed. Finally, we connected advertisement. the topics with the users through their posts’ Likes, Similarly, it is possible to identify topics that are finding out valuable political information about them. important for groups of strategic importance. For Accompanied with a sentiment analysis algorithm, the example, partisans that are cross-pressured by the necessary knowledge can be gathered for the creation of Union and SPD are highly interested in topic 8, personalized messages. Last step is to contact the users, which is related to Islam and Christianism. Thus, a process that should be adapted to and compliant with after the combination with a sentiment analysis, the the legal frameworks. creation of an advertisement specifically related to The communication of the message could theoretically this topic can provide additional advantage to a polit- be performed in two ways: One could cluster users shar- ical party, as it might mobilize an important part of the ing common characteristics and directly target them electorate towards its ends. Of course, the content of a through the platform’s advertisement service, which personal message can be further specialised, as it is allows campaigners to define custom target audiences. always possible to access recursively the full post that This comes with the advantage that there is no need for a user liked, and locate exactly its content in relation to manual matching of users to their real world identities, as the topic it belongs to. it suffices to communicate the message to them through Given the mined Facebook data, we proved that the platform. The second way is to manually look at a there is an extensive dataset for potential microtarget- person’s further public activity on Facebook, and given ing in German politics available in social media ser- additional sociodemographic data available, try to find vices. Although national privacy regulations usually another communication path (e.g. email, mail, phone forbid the direct acquirement and use of personal number, etc.). Although the second way is time- 10 Big Data & Society consuming, complicated, and sometimes inadequate, which in this case were not taken into consideration, gathering socio-demographic data about individuals and but are still publicly available online (Kosinski et al., then targeting them offline is actually what is intensively 2013). By collecting data from other social media inter- done in US campaigns (Hersh, 2015: 77). Still, in EU the actions, e.g. likes on news media or other non-political feasibility of the strategy is much lower, due to the exist- pages, one can train models and assign probabilities of ing privacy laws. For the second way to be applicable, someone being interested in a political issue or party. In political actors should develop platforms, applications, or this way, political knowledge can be extracted about services, through which they would get the person’s con- users that actually did not actually interact with any sent to target them with the related messages. party-related content on the platform and hence be The processing of the social media political dataset included as audience of political microtargeting. also comes with specific limitations. The inferences drawn reveal only part of a person’s political charac- Discussion teristics, and only if indeed someone’s online behaviour matches their actual political preferences. Furthermore, The penetration of datafication into people’s privacy is the users detected online might not have a voting right once more proven through our investigation, as we in Germany, making the sampling process biased and were able to gather and process a large amount of distorting the advertisement process. user data from the social media platform Facebook. The presented results serve as a proof of concept. We Hence, from our perspective, it is important to evaluate have thoroughly described how microtargeting based on the impact of the latest technological advances on the social media data could be performed. The analysis was ethical and political life of our society. The discussion focused on Germany, where the acquisition of relevant that has already started regarding the application of data is usually problematic. The described method can be data-intensive algorithms to social networks (e.g. extended through further actions in both online and off- social bots (Thieltges et al., 2016), using algorithms line campaigning. For example, parties have already for social engineering (Strohmaier and Wagner, started promoting apps to connect the digital and ana- 2014)), must now be also extended to the effect of logue campaigning. These apps help to analyse the reac- microtargeting as a technology driven campaigning tions of people, giving feedback to the campaign- method. As the new technological capabilities raise managers about their campaigning tactics. Furthermore, questions regarding the limits of ethical political influ- the combination of the app data with data coming from ence and the potential transformation of political social media can provide even more insights on the rele- behaviour in contemporary society, our task is to iden- vant issues. The processed social media data can also be tify and reflect on the newly emerged issues. used to complement standard opinion prediction tech- The study showed, that through machine learning, it niques. Existing census data about demographic charac- is possible to track someone’s interests and subse- teristics and public record data about past voting quently develop personalized political advertisement behaviour can be combined with results from the topic that can be used to influence social media users. modelling and sentiment analysis algorithms and hence Hence, the first question emerging is whether microtar- explain the features of political behaviour. geting might lead to the manipulation of voters. The In our study, we focused only on the detection of transmission of a personalized message does not per se voters’ political topics of interest, however part of the signify the manipulation of a person, as each individual microtargeting process is also the evaluation of the per- possesses the freedom to decide whom to vote for. As sonalized advertisement’s success. This can be done the public is offering more and more voluntarily their after the first application of microtargeting, through information in exchange for online or offline services analysis of click-statistics, performance of surveys and (Barbu, 2014) though, algorithms tend to become the actual election results. Furthermore, after the cali- more precise in evaluating personal preferences and bration of the process, the generation of microtargeting attitudes. As microtargeting could potentially contact data can be highly automated. This of course raises the the person directly with a very well adapted message, it question of whether politicians’ positions would still be might achieve what is called instant influence: trigger the person’s mind to develop a conditioned response a result of their actual opinions or just an algorithmic creation for attracting voters. Finally, machine learning the way the political actors desire (Cialdini, 2007). algorithms can predict the users’ interest in further This happens, because in cases of fast incoming infor- topics or parties, even if they have not liked them on mation stimuli, the individual does not process them the platform. Further data would be required for this, rationally (Simon, 1996). On the contrary, the Papakyriakopoulos et al. 11 information is assimilated intuitively, creating a phe- 2015), and realizing how datafication has pragmatically nomenalist connection between the message and the altered the contemporary social structure. political party (Piaget, 1947). Of course, framing a Important for the ethical evaluation of microtarget- party successfully also presupposes other psychological, ing, as well as for data privacy, is also who acquired the social and political preconditions to be present (Domke related data, not only how. For us being able to gain et al.. 1998; Schmitt-Beck, 2003), which cannot be access to the aforementioned dataset poses a dilemma: formed by simply sending well-adapted personal mes- Should public data, for which users have provided their sages. But given these conditions, a systematic applica- consent to be used and further processed, become openly tion of microtargeting might lead to a ‘progression from available, or should they remain only under the control thought to action artificially’ (Ellul, 1966). A reaction to of the initial gatherer? The question is relevant more this issue is the conscious understanding of the person than ever to the present discussion, given the contempor- that they are being microtargeted. In this way, they ary Facebook data scandal (Facebook, 2018a, 2018b), as would be in position to evaluate a message totally dif- well as the platform’s decision to significantly limit the ferently, knowing that the incoming stimuli are already data available through its application programming adapted to their own attitudes. The rule of the conscious interface (API). On the one hand, making data broadly over the unconscious is a precondition for the society to open might result to an uncontrolled data mining phe- remain autonomous (Castoriadis, 1997). nomenon (Pasquale, 2015), with private data becoming a This type of consciousness is not only needed at the part of the public sphere. On the other hand, the posses- moment of evaluating a political message, but must sion of these public data only by the original gatherer also exist at the level of privacy. It is common that might result in the problem of a knowledge monopoly, through the use of apps and online platforms, people making the data holder much more powerful in eco- voluntarily provide their personal data and allow their nomic and political terms than other social actors. further usage as a by-product of the service. It is The specific case study would have a different form, important for users to become aware of what they are if the data were collected under the new API rules of the agreeing on, and what consequences their actions have. platform. Important public data for microtargeting, as user likes, cannot be downloaded in an automated way. In this direction, certain normative and legal impera- tives have already been formulated: Transparency of If public online data are accessible only to the extent data collection, processing and application (Barocas platforms decide, and political actors can target users et al., 2017), autonomy of the subject on having control exclusively through the targeting services provided, of their own personal data (McDermott, 2017), and then the political system itself becomes contingent to (in)visibility: the right of the subject to choose if and technological companies. Electing microtargeting as a to know how personal data might be collected and used political campaigning strategy thus presupposes the (Taylor, 2017), are stated as necessary for supporting constant compliance of political actors with the existing someone’s privacy. The EU General Data Protection political and legal conditions (Kruschinski and Haller, Regulation makes also steps towards this direction, 2017), as well as with the market structures and the by explicitly incorporating transparency and consent dominant online platform decisions. in its regulatory claims. Another issue regarding microtargeting is related to Despite the regulatory efforts, the act of a user opting the perceived voter model. Given that the majority of in, given a very long document of terms and conditions, users in social networks are relatively inactive, the where how personal data might be used is outlined in a danger exists that politicians will concentrate on the short and general manner does not signify transparency, analysis of data provided by the more active users, or actual consent (Strandburg, 2014). Especially regard- even if that sample is not representative of the popula- ing personal data for microtargeting, the information tion (Barbera´ and Rivero, 2015). The less data one can that should be presented to the subject in order to give gather about a person, the more inexact can their atti- their consent should clarify exactly what information is tude-prediction be. Thus, a campaign might be devel- going to be collected, how, by whom and for what pur- oped based on falsely assessed voters’ attitudes. If pose. This is a prerequisite for the subjects’ expectations political campaigns are highly or exclusively data- about the collected data to coincide with the actual data driven, it leads to the perceived voter phenomenon usage (Barocas and Nissenbaum, 2014). At the same (Hersh, 2015): All campaigning decisions are based to time, the individuals should be emancipated, by both an algorithmically calculated electorate and thus, any getting to know through access to the history of their forecasts are dependent on the nature of the collected personal data used by services (Kennedy and Moss, data. Given that social media data always possess a 12 Big Data & Society 3. https://www.facebook.com/help/112146705538576 certain rate of bias (Ruths and Pfeffer, 2014), it is pos- (accessed 21 March 2018). sible that political actors might perform a campaigning 4. Appendix 1 contains the full description of the topics cre- on a ‘constructed’ reality and not on an actual one. Of ated, as well as their important keywords. course, gathering of even more data is not a solution. 5. The topics contain keywords as e.g. Vielleicht, aber, If someone observes campaigning in the US, they might glaube, nachdenken. question the independency of the electorate: US parties’ 6. E.g. CDU’s app ‘connect17’. campaigns aim for the mobilization or de-mobilization of specific social groups, demographic layers and geo- graphic populations in order to strategically achieve References their goals (Hersh, 2015; Kreiss, 2016; Persily, 2017). Agan T (2007) Silent marketing: Micro-targeting. Available Furthermore huge public databases contain extensive at: http://gaia.adage.com/images/random/microtar- data about the majority of the electorate and their get031207.pdf (accessed 27 March 2018). voting history. The discussion about microtargeting Barbera´ P and Rivero G (2015) Understanding the political and data privacy is already under way in Europe and representativeness of twitter users. Social Science the newly emerged issues should be assessed. Computer Review 33(6): 712–729. This study demonstrates through its ‘proof of con- Barbera´ P and Zeitzoff T (2017) The new public address cept’ certain possibilities and dangers of microtarget- system: Why do world leaders adopt social media? ing, in order to initiate an important debate for the International Studies Quarterly 62(1): 121–130. political system. To expand this discussion, further Barbu O (2014) Advertising, microtargeting and social media. qualitative and quantitative research is needed, in Procedia – Social and Behavioral Sciences 163: 44–49. order to uncover: (1) How political communication Barocas S, Bradley E, Honavar V, et al. (2017) Big data, data science, and civil rights. ArXiv. Available at: https://arxiv. on social media influences the formation of political org/abs/1706.03102 (accessed 5 November 2018). attitudes in terms of polarization, political mobilization Barocas S and Nissenbaum H (2014) Big data’s end run and opinion formation? (2) What is the effect of polit- around anonymity and consent. Privacy, Big Data, and ical campaigning services offered by social media and the Public Good: Frameworks for Engagement 1: 44–75. other internet platforms? (3) At which level current Bennett CJ (2016) Voter databases, micro-targeting, and data privacy policies protect individuals and what else protection law: Can political parties campaign in Europe could be done? The answers to the aforementioned as they do in North America? International Data Privacy questions, if given, can redefine how the political dis- Law 6(4): 261–275. course should be performed in the digital age. Blei DM, Ng AY and Jordan MI (2002) Latent Dirichlet allocation. In: Dietterich TG, Becker S and Ghahramani Declaration of conflicting interests Z (eds) Advances in Neural Information Processing Systems 14. Cambridge, MA: MIT Press, pp. 601–608. The authors declared no potential conflicts of interest with Blei DM, Ng AY and Jordan MI (2003) Latent Dirichlet allo- respect to the research, authorship, and/or publication of this cation. Journal of Machine Learning Research 3: 993–1022. article. Boehm F (2015) A comparison between us and EU data pro- tection legislation for law enforcement purposes. Available Funding at: www.europarl.europa.eu/RegData/etudes/2015/536459/ The authors disclosed receipt of the following financial sup- IPOL_ STU%282015%29536459_EN.pdf (accessed 28 port for the research, authorship, and/or publication of this March 2018). article: the German Research Foundation (DFG) and the Bond R and Messing S (2015) Quantifying social media’s Technical University of Munich within the funding pro- political space: Estimating ideology from publicly revealed gramme Open Access Publishing. preferences on Facebook. American Political Science Review 109(1): 62–78. Broy D (2017) Germany: Starting implementation of the ORCID iD GDPR-brief overview of the government bill for a new Orestis Papakyriakopoulos http://orcid.org/0000-0003- Federal Data Protection Act. European Data Protection 4680-0022 Law Review 3: 93. Capara GV, Barbaranelli C and Zimbardo PG (1999) Notes Personality profiles and political parties. Political Psychology 20(1): 175–197. 1. See e.g. Christl (2016) and Thiele (2017). Castoriadis C (1997) The Imaginary Institution of Society. 2. The pages of CDU and CSU were classified together under the term Union. Cambridge, MA: MIT Press. Papakyriakopoulos et al. 13 Christl W (2016) Big data im wahlkampf: An ihren daten sollt Grimmer J and Stewart BM (2013) Text as data: The promise ihr sie erkennen. Available at: http://www.faz.net/aktuell/ and pitfalls of automatic content analysis methods for pol- feuilleton/medien/big-data-im-wahlkampf-ist-microtarget- itical texts. Political Analysis 21(3): 267–297. Hegelich S (2017) R for social media analysis. In: Luke Sloan ing-entscheidend-14582735.html (accessed 28 March 2018). AQH (ed.) The SAGE Handbook of Social Media Research Cialdini RB (2007) Influence: The Psychology of Persuasion. Methods. London: SAGE Publications, Chapter 28. New York, NY: Harper Collins. Hegelich S and Shahrezaye M (2015) The communication Darling WM (2011) A theoretical and practical implementa- behavior of German MPS on twitter: Preaching to the tion tutorial on topic modeling and Gibbs sampling. In: converted and attacking opponents. European Policy Proceedings of the 49th annual meeting of the association Analysis 1(2): 155–174. for computational linguistics: Human language technologies, Hersh ED (2015) Hacking the Electorate: How Campaigns Portland, Oregon, 19 – 24 June 2011. pp.642–647. Perceive Voters. New York, NY: Cambridge University Madison, WI: Omnipress Inc. Press. Da¨ ubler W, Klebe T, Wedde P, et al. (2016) Hotelling H (1933) Analysis of a complex of statistical vari- Bundesdatenschutzgesetz. Frankfurt: Bund-Verlag. ables into principal components. Journal of Educational Deveaud R, SanJuan E and Bellot P (2014) Accurate and Psychology 24(6): 417. effective latent concept modeling for ad hoc information Jungherr A (2017) Einsatz Digitaler Technologie im retrieval. Document nume´rique 17(1): 61–84. Wahlkampf. Schriftreihe Medienkompetenz 10111: 92–101. Domke D, Shah DV and Wackman DB (1998) Media priming Karpf D (2016) The partisan technology gap. In: Gordon E effects: Accessibility, association, and activation. International and Mihailidis P (eds) Civic Media: Technology, Design, Journal of Public Opinion Research 10(1): 51–74. Practice. Cambridge, MA: MIT Press, pp. 199–216. Dorschel J (2015) Praxishandbuch Big Data: Wirtschaft– Kennedy H and Moss G (2015) Known or knowing publics? Recht–Technik. Wiesbaden: Springer-Verlag. Social media data mining and the question of public Downs A (1957) An economic theory of political action in a agency. Big Data & Society 2(2): 2053951715611145. democracy. Journal of Political Economy 65(2): 135–150. Kosinski M, Stillwell D and Graepel T (2013) Private traits Edsall TB (2012) Let the nanotargeting begin. Available at: and attributes are predictable from digital records of https://campaignstops.blogs.nytimes.com/2012/04/15/let- human behavior. Proceedings of the National Academy of the-nanotargeting-begin/ (accessed 28 March 2018). Sciences 110(15): 5802–5805. Ellul J (1966) Propaganda. New York, NY: Knopf. Kreiss D (2016) Prototype Politics: Technology-Intensive EU-Directive (2016) Regulation (EU) 2016/679 of the Campaigning and the Data of Democracy. New York, European Parliament and of the Council of 27 April NY: Oxford University Press. 2016 on the protection of natural persons with regard to Kruschinski S and Haller A (2017) Restrictions on data- the processing of personal data and on the free movement driven political micro-targeting in Germany. Internet of such data, and repealing Directive 95/46/EC (General Policy Review 6(4): 1–23. Data Protection Regulation). Official Journal of the McDermott Y (2017) Conceptualising the right to data pro- European Union L119: 1–88. tection in an era of big data. Big Data & Society 4(1): Facebook (2018a) Hard questions: Update on Cambridge Analytica. Available at: https://newsroom.fb.com/news/ Mayer-Scho¨ nberger V and Cukier K (2013) Big Data – A 2018/03/hard-questions-cambridge-analytica/ (accessed 20 Revolution that Will Transform how We Live, Work, and September 2018). Think. Orlando, FL: Houghton Mifflin Harcourt. Facebook (2018b) Suspending Cambridge analytica and SCL Medina Serrano JC, Hegelich S, Shahrezaye M, et al. (2019) group from Facebook. Available at: hhttps://newsroom. Social Media Report: The 2017 German Federal Elections. fb.com/news/2018/03/suspending-cambridge-analytica/ Munich: TUM University Press. (accessed 20 September 2018). Nulty P, Theocharis Y, Popa SA, et al. (2016) Social media Franz MM and Ridout TN (2010) Political advertising and and political communication in the 2014 elections to the persuasion in the 2004 and 2008 presidential elections. European parliament. Electoral Studies 44: 429–444. American Politics Research 38(2): 303–329. Ohm P (2014) Changing the rules: General principles for data Geman S and Geman D (1984) Stochastic relaxation, Gibbs use and analysis. Privacy, Big Data, and the Public Good: distributions, and the Bayesian restoration of images. In: Frameworks for Engagement 1: 96–111. IEEE Transactions on Pattern Analysis and Machine Panagopoulos C (2015) All about that base. Party Politics Intelligence. Vol. 6. IEEE, pp.721–741. 22(2): 22–190. Gilks W, Richardson S and Spiegelhalter D (1995) Markov Papakyriakopoulos O, Shahrezaye M, Thieltges A, et al. ChainMonte Carloin Practice. London: Chapman and Hall. (2017) Social media und microtargeting in Deutschland. Griffiths T (2002) Gibbs sampling in the generative model of Informatik-Spektrum 40(4): 327–335. latent Dirichlet allocation. Available at: https://people.cs. Pasquale F (2015) The Black Box Society: The Secret umass.edu/~wallach/courses/s11/cmpsci791s s/readings/ Algorithms that Control Money and Information. griffiths02gibbs.pdf (accessed 28 March 2018). Cambridge, MA: Harvard University Press. 14 Big Data & Society Persily N (2017) Can democracy survive the internet? Journal Strandburg KJ (2014) Monitoring, datafication and consent: of Democracy 28(2): 63–76. Legal approaches to privacy in the big data context. Piaget J (1947) The Psychology of Intelligence. London, New In: Lane J, Stodden V, Bender S, et al. (eds) Privacy, Big Data and the Public Good. New York, NY: Cambridge York: Routledge. University Press, pp. 5–43. Roberts G and Smith A (1994) Simple conditions for the Strohmaier M and Wagner C (2014) Computational social convergence of the Gibbs sampler and Metropolis- science for the world wide web. IEEE Intelligent Systems Hastings algorithms. Stochastic Processes and their 29(5): 84–88. Applications 49(2): 207–216. Taylor L (2017) What is data justice? The case for connecting Ruths D and Pfeffer J (2014) Social media for large studies of digital rights and freedoms globally. Big Data & Society behavior. Science 346(6213): 1063–1064. 4(2): 2053951717736335. Schmitt-Beck R (2003) Mass communication, personal com- Thiele M (2017) Die wahlschlacht der datenbanken. Available munication and vote choice: The filter hypothesis of media at: http://www.tagesspiegel.de/themen/freie-universitaet- influence in comparative perspective. British Journal of berlin/wahlkampf-mit-big-data-die-wahlschlacht-%der- Political Science 33(2): 233–259. datenb anken/19938576.html (accessed 28 March 2018). Sievert C and Shirley KE (2014) Ldavis: A method for visua- Thieltges A, Schmidt F and Hegelich S (2016) The devils tri- lizing and interpreting topics. In: Proceedings of the work- angle: Ethical considerations on developing bot detection shop on interactive language learning, visualization, and methods. In: Proceedings of the 2016 AAAI spring sympo- interfaces, Baltimore, Maryland, 27 June 2014, pp. 63–70. sium, Stanford University, Palo Alto, California, 21 – 23 Simon HA (1996) The Sciences of the Artificial. Cambridge, March 2016, Vol. 2123, pp.253–257. MA: MIT Press. Weedon J, Nuland W and Stamos A (2017) Information Sotto LJ and Simpson AP (2015) Data protection & privacy: operations and Facebook. Technical report, Facebook. United States. Available at: https://www.huntonprivacy- Available at: https://fbnewsroomus.files.wordpress.com/ blog.com/wp-content/uploads/sites/18/2011/04/ 2017/04/facebook-and-information-operations-v1.pdf DDP2015_United_States.pdf (accessed 28 March 2018). (accessed 28 March 2018). Stier S, Posch L, Bleier A, et al. (2017) When populists Woo HY (2015) Strategic communication with verifiable mes- become popular: Comparing Facebook use by the right- sages. PhD Thesis, University of California, USA. wing movement Pegida and German political parties. Information, Communication & Society 20(9): 1–24. Appendix 1 Table 3. Topics overview for category 1: general political issues. Topic content keywords. 1 State / Citizens / Bu¨rgerinnen Gestalten Zusammenhalten Engagement Landkreis Social involvement 2 Education / Thoughts Gymnasium Lo¨sung Lernen Klasse Anforderung 3 Law Bundesverfassungsgericht Verfassung Urteil Grundgesetz Bundesrepublik 4 Economy Euro Steuergerld Milliard Zuschuss kosten 5 Transportation policy Flughafen Nahverkehr Bahn Mitarbeiter Verkehrspolitik 6 Democracy / People / Germany Demokratie Volk Elite Freiheit Bu¨rger 7 Against left-wing radicalism Linksextremisten Antifa Gewalttat Straftat Polizei 8 Religion / Islam / Christianism Islam Muslim Christlich Religion Kirche 9 Thoughts Vielleicht Aber Glaube Eigentlich Ich 10 Refugee policy / for Unterkunft Fluchtling Asybewerber Aufnahme Geflu¨chtet 11 Energy policy Energie Umwelt Klimaschutz Landwirtschaft Energiepolitik 12 Refugee policy / Against Fluchtling Asyl Abschiebung illegal Asylverfahren 14 Austerity / Unemployment / Jobcenter eigentlich soll Sparen Rettung Thoughts 15 Education Schule Kinder Eltern Bildung Lehre 16 Foreign policy EU Russland Ukrain USA Turkei 18 Social policy / Hartz 4 / Poverty Hartz IV Armut Sozial Gerecht Rente 19 Greek crisis Griechenland Bank Finanz Steuerzahl Schuld 20 Housing policy Wohnung Wohnraum Miete Verwaltung Wohnungsbau (continued) Papakyriakopoulos et al. 15 Table 3. Continued 21 AfD (political discussion) AfD Partei rechtspopulist Position Altparteien 23 Against Pegida Demonstration Pegida Nazis Rassismus gegen 24 Against right-wing Diskriminierung Homophobie Rechtsextremismus Freiheit Rassisten radicals, racism 25 Income / Workers unions Mindestlohn Arbeitsgeber Arbeitsnehmer Gewerkschaft Arbeitsbedingung 26 German left-wing history DDR Rosa Luxemburg NATO Geschichte Revolution 27 Thoughts Nachdenken Denkst Wahrheit Du Einfach 28 Family Frau Mann Mutter Familie Kinder 32 Homeland security Innenminister Polizei Ermittelt Justitz Kriminalita¨t 37 Against TTIP/CETA TTIP CETA Stopp unterschreiben Aktion Table 4. Topics overview for category 2: political actors’ activity. Topic content keywords 13 Political events Eingeladen Veranstaltung La¨dt Vortrag Diskussion 22 Greetings Gruss Liebe Freunde Melden Spenden 29 Die Gru¨nen Gru¨nen Bu¨ndnis Landtag Gru¨n-linke Sachsen 30 After political events Danke Besuch toll Fotos Impression 33 Congratulations Glu¨ckwunsch Herzlich Gratulieren Wahlgang Wiedergewa¨hlt 34 Schwesig (Politician) Schwesig Manuela Andrea Nahles Frau 35 Political Coalitions rot gru¨n Schwarz Gelb Koalition 38 Merkel Merkel Angela Kanzlerin CDU Union 39 Petry Petry Lucke Alternative Deutschland AfD 40 Election campaign Daum Druck Wahlkampf Stimmen Sonntag 41 Candidates Wahlkreis Kandidate Landesliste Nominiert Listenplatz 42 Wagenknecht Mannheim Wagenknecht sahra Linksjugend Freiburg 45 Twitter Twitter Schaut Teilen Mitmachen Abstimmen 46 Various politicians Gabriel Schulz Gauck Bundespresident Steinmeier 48 Debates/ TV live Aktuell TV gleich Fernsehen 49 FDP/ Rheinland Pfalz Rheinland Liberal FDP Liberte 50 Greetings/ Thank you Wu¨nsche Spass Gut frohe Feiertag 52 Lindner/ FDP Lindner Christian NRW Bundesvorsitzender Kubicki 53 Die LINKE die Linke linksfraktion Riexinger Kipping themen 55 German news media Focus Welt Spiegel Interview Zeitung Table 5. Topics overview for category 3: regional topics. Topic content keywords 117 NRW politicians Mu¨nster Bochum Bezirksvertreter Essen Ruhr 31 Hamburg Altona Hamburg Landesparteitag Bezirkversammlung Bu¨rgerschaft 36 Leipzig AfD Leipzig Kreisvorsitzende Kreisverband Vorstand Mitglied 43 Bayern Bayern Mu¨nchen Freistaat Wahlprogramm CSU 44 Baden-Wu¨rttemberg Baden Wu¨rttemberg Ministerpresident Stuttgart bw 47 Bielefeld/ Koblenz Bielefeld Koblenz Mainz Ru¨lke Theurer 51 Hamburg/ Schleswig-Holstein Schleswig Holstein Kiel Rostock Schwerin 54 Berlin Berlin Tempelhof Lichtenberg Scho¨neberg Bezirk

Journal

Big Data & SocietySAGE

Published: Nov 20, 2018

Keywords: Microtargeting; social media; Germany; influence; datafication; electorate

References