Official statistics and Big Data

Abstract

The rise of Big Data changes the context in which organisations producing official statistics operate. Big Data provides opportunities, but in order to make optimal use of Big Data, a number of challenges have to be addressed. This stimulates increased collaboration between National Statistical Institutes, Big Data holders, businesses and universities. In time, this may lead to a shift in the role of statistical institutes in the provision of high-quality and impartial statistical information to society. In this paper, the changes in context, the opportunities, the challenges and the way to collaborate are addressed. The collaboration between the various stakeholders will involve each partner building on and contributing different strengths. For national statistical offices, traditional strengths include, on the one hand, the ability to collect data and combine data sources with statistical products and, on the other hand, their focus on quality, transparency and sound methodology. In the Big Data era of competing and multiplying data sources, they continue to have a unique knowledge of official statistical production methods. And their impartiality and respect for privacy as enshrined in law uniquely position them as a trusted third party. Based on this, they may advise on the quality and validity of information of various sources. By thus positioning themselves, they will be able to play their role as key information providers in a changing society.

Keywords

Big Data, official statistics, European Statistical System

Statistics Netherlands, Heerlen, The Netherlands

Corresponding author:
Peter Struijs, Statistics Netherlands, P.O. Box 4481, Heerlen, 6401 CZ, The Netherlands.
Email: p.struijs@cbs.nl

Creative Commons CC-BY: This article is distributed under the terms of the Creative Commons Attribution 3.0 License (http://www.creativecommons.org/licenses/by/3.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (http://www.uk.sagepub.com/aboutus/openaccess.htm).

Introduction

The advent of Big Data is expected to have a big impact on organisations for which the production and analysis of data and information is core business. National Statistical Institutes (NSIs) are such organisations. They are responsible for official statistics, which are heavily used by policy-makers and other important players in society. Arguably, the way NSIs take up Big Data will eventually have implications for all of society.

Official statistics play a key role in modern society. NSIs aim at providing information on all important aspects of society in an impartial way, and according to the highest scientific standards. Information that fulfils these demands is used in public discussion, forms the basis of policy decisions, is required for business use, feeds scientific research, is used in education and so on. Official statistics can only meet this demand if they can be trusted. In advanced societies, official statistics are often taken for granted, but where trust is lacking, society misses an important pillar for informed discussion and evidence-based policy-making.

Professional standards play a vital role in securing trust in official statistics. Statisticians have their own ethics code (United Nations, 2013), which includes an absolute respect for the confidentiality of data provided by respondents. Data collected for statistical purposes may never be disclosed and may never be used for other purposes. At the level of the European Union (EU), quality norms have been codified in the so-called Statistics Code of Practice (Eurostat, 2014). The trust earned by respecting professional standards is also the basis for a privileged position of NSIs in respect of data acquisition. Many NSIs have access by law to government data sources and have the power to collect data from other parties, often without having to pay the provider. Moreover, for statistical purposes, many NSIs are allowed to link data from different sources.

Given this role for NSIs, what does the emergence of Big Data mean for official statistics? This question is addressed in this contribution, but as we will see, there are many reasons why the role of NSIs in the Big Data era is not ‘given’. In order to keep a sound and trusted basis of information for society to rely on, we argue that NSIs may have to adapt to the changing context in which they operate.

Official statistics in a changing context

In respect of information, society is changing rapidly. For example, there is an enormous growth of data that is gathered and recorded in myriad ways: from satellite and sensory data, to social network and transactional data and so on. The availability of data is also expanding and becoming the foundation of business models. Information is becoming more visual and interactive. Information and communication technology is becoming ever more advanced, processing power and data storage capacity is continuously rising, cloud solutions are emerging and applications are becoming more intelligent. These developments have been described in more depth and detail by many observers, such as Mayer-Schönberger and Cukier (2013).

These changes have many impacts on societies. For one, the increased gathering of data and the commercial and social possibilities of data usage influence public opinion on privacy. Some are concerned if their data are re-used without their consent, for commercial reasons or otherwise. Others do not mind so much, if this means that services are provided for free. Many people voluntarily share information on social networks without caring for privacy. People have less patience to fill in questionnaires, especially if the data requested have been registered somewhere else already. Government agencies are expected to be more forthcoming in providing data. Governments have reacted to the changes by formulating policies on, for instance, open data and availability of public sector information, also at the EU level (European Union, 2013).

How have NSIs responded? Until around the 1980s, data were essentially a scarce commodity with a high price. Before the era of Big Data, information was not readily available but had to be collected for a particular purpose. Official statistical information based on survey data had a unique value: there simply was no alternative. For example, population census data, collected door to door, was immensely valuable to policy-makers, researchers and other users. In the last few decades, data collected by public administrations have become increasingly accessible for statistical purposes, stimulated in part by IT developments. Statistical data collection by means of questionnaires was supplemented and increasingly replaced by administrative data sources. Nowadays, some countries do not conduct extensive population surveys anymore but compile census statistics by combining and analysing data from several administrative sources. NSIs became more integrated in the information architecture of the government. In this way, the burden on persons and businesses to respond to questionnaires was considerably reduced.

In the context of all of these developments, the information provided by NSIs still remained unique. In particular, the possibility of combining data from different sources made official statistics even more valuable, since in many countries no other organisation was positioned to do so. In parallel, efforts also increased to standardise and harmonise these various sources of official statistics, especially in the EU. Supported by legislation, official statistics in the EU are now considered a system, the so-called European Statistical System, or ESS.

However, Big Data is changing the environment of the NSIs once more as data scarcity is becoming less of an issue. For NSIs, there are potential benefits as new data sources and opportunities emerge. But it also makes the products of NSIs potentially less unique, since other players in the information market may start – and have actually started – producing statistics, for instance, on inflation, such as the Billion Prices Project of MIT.

Let us first look at the opportunities for NSIs offered by Big Data. There is a huge potential for new statistics (Daas et al., 2013). Location data for mobile phones could be used for almost instantaneous daytime population and tourism statistics (De Jonge et al., 2012). Social media messages could be used for several types of indicators, such as an early indicator of consumer confidence. Inflation figures could be derived from price information on the web, and so on. In addition, Big Data sources may be used to substitute or supplement more traditional data sources, such as questionnaire and administrative data. For instance, data collection by questionnaire on road use may not be necessary anymore if detailed traffic loop data, i.e. data from sensors in roads, become available (Struijs and Daas, 2013).

However, in order to realise these opportunities, a number of challenges have to be overcome, which are generally applicable to all uses of Big Data as an information source and as such are not unique to NSIs.

Challenges and issues

Some of the biggest challenges that statisticians face in their use of Big Data concern methodology. Many Big Data sources, such as social media messages, are composed of observational data and are not deliberately designed for data analysis, and thus do not have a well-defined target population, structure and quality. This makes it difficult to apply traditional statistical methods, based on sampling theory (Daas and Puts, 2014a). The unstructured nature of many Big Data sources makes it even more difficult to extract meaningful statistical information. For many Big Data sources, the interpretation of the data and its relationship with social phenomena of interest is far from obvious. For example, public Facebook messages in the Netherlands clearly reflect general sentiment in some sense, but it is far from clear how exactly (Daas and Puts, 2014b). Moreover, if such data are to be used as a source for a population sentiment indicator, one would like to know the relationship between the population of persons writing public messages on Facebook and the population at large. This is challenging without falling back to surveys. Furthermore, the population of persons using social media is likely to change over time, making a comparison to the population at large even more challenging.

For NSIs, a key question concerns how the quality of official statistics can be guaranteed if they are based on Big Data. To address this, new methodologies and forms of interpretation need to be developed. Take for example mobile phones. If data from mobile phone providers are used for statistics on, say, population mobility, the statistician has to interpret anonymised detailed call records from individual phones and derive information about the behaviour of the people using them. That means dealing with the fact that measurable phone activity may vary during the day, some persons may have multiple mobile phones or none, children carry mobile phones which are registered to their parents, phones may be switched off, etc. For social media, even more questions arise such as who is the author of a message. While some methodological remedies have already been developed to some extent, such as deriving the gender and age of a social media user by the known correlation between sex, age and choice of words, these still pose a challenge, as explained above.

Privacy and legal issues form another challenge. The prevention of the disclosure of the identity of individuals is an imperative, but this is difficult to guarantee when dealing with Big Data. Since legislation typically lags behind the emergence of new social phenomena, the legal situation for cases involving Big Data is not always clear. In such cases, one may have to fall back on ethical standards to decide on whether and how to use Big Data. Other legal issues relate to copyright and the ownership of data. Even if data may legally be used, this does not imply that it is wise or appropriate to do so. Of critical importance is the implication of any use of Big Data for the public perception of an NSI as this has a direct impact on trust in official statistics. These concerns have been heightened by the revelations that intelligence agencies are among the most active Big Data users. For NSIs, it is critical that these concerns be addressed through practices such as being transparent about what and how Big Data sources are used. Other mechanisms could also be developed. For example, in some cases it might be feasible to adopt informed consent approaches. Some mobile phone subscription contracts, for instance, offer an opt out to the subscriber for using their data for other purposes than providing the phone service. If the opt out rate is not too high, this does not seriously affect the usability of mobile phone data for statistical purposes.

Another obvious challenge is the processing, storage and transfer of large data sets. Technological advances like increases in computing power, larger storage facilities and high bandwidth data channels may partly solve these issues. Having data processed at the source, thus preventing the transfer of large data sets and the duplication of storage, may also be considered. These technological challenges include mechanisms for ensuring the security of data, which is of the utmost importance because of privacy and confidentiality concerns and makes, for example, cheap cloud-based solutions less attractive.

Another issue is the possible volatility of Big Data sources, given the fact that official statistics often take the form of time series analyses. For many users, the continuity of these series is of the utmost importance. Still another issue is the skills required for dealing with Big Data. Modern data scientists may be better equipped than traditionally trained statisticians. Probably more important is the need for a different mind-set as the use of Big Data may imply a paradigm shift, including an increased and modified use of modelling techniques (Daas and Puts, 2014a; Struijs and Daas, 2013).

Collaboration

Faced with these challenges, NSIs have recognised the necessity of not working in isolation but collaborating with each other and others outside the community of official statistics. This collaboration is often exploratory and may be aimed at sharing knowledge and experiences, but there are already examples of collaboration that go further.

From the perspective of NSIs, several types of partners are of interest. First of all, the potential providers of Big Data are essential partners: if they do not grant access to their data, the story is over before it starts. Data owners have their own concerns and, like NSIs, they are subject to privacy rules. This may complicate collaboration even if they have a positive outlook and approach. But since Big Data sources are not designed for statistical use, such collaboration is also essential in order to obtain good knowledge of the provenance of such sources. Additionally, for statistical production, it may be more efficient to have data processed at the site of collection and storage. In such cases, the assumption that data can be provided for free may no longer hold. On the other hand, statisticians also have much to offer such as providing analytic insights that may help data owners understand their data better. Doing complex statistical analyses is core business for NSIs, but not for, say, a mobile phone company. In these and other ways, the relationship with data providers could potentially become true partnerships. For example, one specific role that NSIs could play is that of a trusted third party. In a competitive market, competitors will be reluctant to share sensitive data among each other. But they might be willing to share it with an NSI who compiles statistical information that is beneficial to all.

Collaboration between NSIs and academia may grow as well. Universities have historically been natural partners for NSIs. It stands to reason that such collaboration will extend to the field of Big Data, for instance, in solving methodological problems, developing technical solutions and training future data scientists. Such collaboration is also being supported by public funders who are facilitating research and innovation partnerships through targeted grants. By working in partnership, researchers in universities and NSIs could better leverage such opportunities.

Furthermore, there are many commercial partners with which NSIs could collaborate. Google and Facebook are two examples for which Big Data forms the core of their business model. Their knowledge and the data to which they have access may be very relevant to NSIs. IT companies also possess relevant knowledge on Big Data processing and storage, security, cloud processing, etc. Apart from the provision of paid services, collaboration may be of interest to them with a view to obtaining statistical expertise and for benchmarking or validating their information products.

Collaboration between NSIs in the field of Big Data has already started. Big Data has become a prominent subject at many statistical meetings and conferences in Europe, such as the 2013 New Techniques and Technologies for Statistics (NTTS) conference, a scientific conference organised by Eurostat, and the ‘ESS Big Data event 2014’ in Rome. The directors-general of all European NSIs met in Scheveningen in September 2013 to learn about Big Data and adopted the Scheveningen Memorandum (DGINS, 2013). This memorandum calls for an international strategic approach to Big Data and plans for the adoption of an action plan and roadmap by mid-2014.

For some time already, Big Data has been an important topic for the UNECE, the United Nations Economic Commission for Europe. Collaboration at that level resulted in an overview paper about the implications of Big Data for official statistics (UNECE, 2013a). Seminars have been held, facilitating the exchange of knowledge, for instance, on statistical data collection. In 2014, the UNECE went one step further in facilitating cross-national work through a project with the following stated objectives:

a. to identify, examine and provide guidance for statistical organisations to act upon the main strategic and methodological issues that Big Data poses for the official statistics industry;
b. to demonstrate the feasibility of efficient production of both novel products and ‘mainstream’ official statistics using Big Data sources, and the possibility to replicate these approaches across different national contexts;
c. to facilitate the sharing across organisations of knowledge, expertise, tools and methods for the production of statistics using Big Data sources (UNECE, 2013b).

The future of official statistics

What does the advent of Big Data mean for official statistics? As we have argued, it provides many opportunities. But in order to make optimal use of Big Data, a number of issues have to be addressed. This calls for increased collaboration with private and academic partners who have access to specific Big Data sources and knowledge, but also between NSIs. The relationship between the various stakeholders will involve each partner building on and contributing different strengths and will likely result in flexible networks. Such networks are flexible in the sense that membership of the network and the contribution of partners depend on actual needs instead of being fixed in advance for a long time.

Seen from the viewpoint of NSIs, there are also potential risks. Official statistics are facing more competition. In a time of growing data abundance, generating statistical information that is potentially relevant to society is no longer an activity intrinsically restricted to NSIs. And even the traditional advantage of NSIs, being legally allowed to collect data and combine data sources, is eroding. It may not be possible to combine survey data and administrative data with Big Data sources at the micro-level, which reduces the relative disadvantage traditionally faced by the competition.

For some statistics, Big Data sources cannot be easily envisaged as alternatives to more traditional data sources. This certainly holds for official figures on government finance and economic growth, which are heavily used for decision-making at both the national and international level. But, given the increasing competition that data generated by other sources is presenting to the role of NSIs as bearers of official statistics, a strategic reassessment is needed. This could include fundamental questions such as whether statistics based on Big Data sources should be a core activity of NSIs, or if some data and information should be provided by other market actors, or if NSIs can or should provide new services in this context.

But by posing these questions, we return to the basic premise that society’s access to impartial statistical information must be maintained at all times, either by NSIs or other parties. In choosing a position, NSIs could build on and promote their strengths and unique position. Especially at a time of competing and multiplying data sources, their impartiality and respect for privacy as enshrined in law uniquely position them as a trusted third party. They also have unique knowledge of official statistical production methods. Finally, they continue to have privileged access to government data sources that provide unique information and knowledge and have the authority to collect data for statistical purposes that because of privacy considerations will never be available to businesses.

As a consequence, in the context of the challenges of Big Data sources, NSIs will remain important providers of official statistics. And where other organisations are able to provide statistical information to the public, rather than competing, NSIs could build on their position as an impartial, trusted third party and their expertise to advise on the quality and validity of information of these various sources. Possibly, then, providers of Big Data may even seek validation of their data from NSIs, thereby opening up yet another possibility for new partnerships.

The future of official statistics in the age of Big Data is still a matter of some deliberation and experimentation. But what is clear already is that the international statistical community needs to adapt to a new reality and respond to the opportunities and challenges it provides. To do so calls for greater collaboration with players inside and outside the statistical community, through the formation of flexible networks that can forge new ways of generating statistical data. For all engaged with statistics, we think the Big Data era is a most exciting time.

Acknowledgements

The views expressed in this contribution are those of the authors and do not necessarily reflect the position of Statistics Netherlands. The authors wish to thank the editors for their valuable suggestions for improvements.

Declaration of conflicting interest

The authors declare that there is no conflict of interest.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Notes

1. http://epp.eurostat.ec.europa.eu/portal/page/portal/pgp_ess/ess/ess_news
2. http://bpp.mit.edu/
3. The current EU framework programme for research and innovation, Horizon 2020, is an example (European Commission, 2013), which mentions Big Data specifically.
4. http://www.cros-portal.eu/content/ntts-2013
5. http://www.cros-portal.eu/content/big-data-event-2014
6. http://www.unece.org/stats/documents/2013.09.coll.html

References

Daas PJH and Puts MJH (2014a) Big Data as a source of statistical information. The Survey Statistician 69: 22–31. Available at: http://isi.cbs.nl/iass/N69.pdf (accessed 22 May 2014).

Daas PJH and Puts MJH (2014b) Social media sentiment and consumer confidence. Paper for the workshop on using Big Data for forecasting and statistics, Frankfurt, Germany, 7–8 April. Available at: http://www.ecb.europa.eu/events/pdf/conferences/140407/Daas_Puts_Sociale_media_cons_conf_Stat_Neth.pdf?409d61b733fc259971ee5beec7cedc61 (accessed 22 May 2014).

Daas PJH, Puts MJ, Buelens B, et al. (2013) Big Data and official statistics. Paper for the 2013 NTTS conference, Brussels, Belgium, 5–7 March. Available at: http://www.cros-portal.eu/sites/default/files/NTTS2013fullPaper_76.pdf (accessed 22 May 2014).

De Jonge E, Van Pelt M and Roos M (2012) Time patterns, geospatial clustering and mobility statistics based on mobile phone network data. Discussion paper 201214, Statistics Netherlands. Available at: http://www.cbs.nl/NR/rdonlyres/010F11EC-AF2F-4138-8201-2583D461D2B6/0/201214x10pub.pdf (accessed 22 May 2014).

DGINS (2013) Scheveningen memorandum on Big Data and official statistics. Available at: http://epp.eurostat.ec.europa.eu/portal/page/portal/pgp_ess/0_DOCS/estat/SCHEVENINGEN_MEMORANDUM%20Final%20version.pdf (accessed 22 May 2014).

European Commission (2013) Horizon 2020, the EU framework programme for research and innovation. Available at: http://ec.europa.eu/programmes/horizon2020/ (accessed 22 May 2014).

European Union (2013) Directive 2013/37/EU of the European Parliament and of the Council of 26 June 2013, amending Directive 2003/98/EC on the re-use of public sector information. Available at: http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2013:175:0001:0008:EN:PDF (accessed 22 May 2014).

Eurostat (2014) European statistics code of practice. Available at: http://epp.eurostat.ec.europa.eu/portal/page/portal/quality/code_of_practice (accessed 22 May 2014).

Mayer-Schönberger V and Cukier K (2013) Big Data: A Revolution that Will Transform How We Live, Work, and Think. London: John Murray Publishers.

Struijs P and Daas PJH (2013) Big Data, big impact? Paper presented at the seminar on statistical data collection, Geneva, Switzerland, 25–27 September 2013. Available at: http://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.44/2013/mgt1/WP31.pdf (accessed 22 May 2014).

UNECE (2013a) What does ‘Big Data’ mean for official statistics? Paper prepared on behalf of the high-level group for the modernisation of statistical production and services, 10 March. Available at: http://www1.unece.org/stat/platform/pages/viewpage.action?pageId=77170614 (accessed 22 May 2014).

UNECE (2013b) The role of Big Data in the modernisation of statistical production. Project plan. Available at: http://www1.unece.org/stat/platform/display/msis/Final+project+proposal%3A+The+Role+of+Big+Data+in+the+Modernisation+of+Statistical+Production (accessed 22 May 2014).

United Nations (2013) Fundamental principles of official statistics. Available at: http://unstats.un.org/unsd/dnss/gp/FP-New-E.pdf (accessed 22 May 2014).

Struijs et al., Big Data & Society, SAGE

Publisher
SAGE
Copyright
Copyright © 2022 by SAGE Publications Ltd, unless otherwise noted. Manuscript content on this site is licensed under Creative Commons Licenses.
ISSN
2053-9517
eISSN
2053-9517
DOI
10.1177/2053951714538417

Many Big collection by means of questionnaires was Data sources, such as social media messages, are Struijs et al. 3 composed of observational data and are not deliber- has a direct impact on trust in official statistics. These ately designed for data analysis, and thus do not have concerns have been heightened by the revelations that a well-defined target population, structure and quality. intelligence agencies are among the most active Big This makes it difficult to apply traditional statistical Data users. For NSIs, it is critical that these concerns methods, based on sampling theory (Daas and Puts, be addressed through practices such as being transpar- 2014a). The unstructured nature of many Big Data ent about what and how Big Data sources are used. sources makes it even more difficult to extract meaning- Other mechanisms could also be developed. For exam- ful statistical information. For many Big Data sources, ple, in some cases it might be feasible to adopt informed the interpretation of the data and its relationship with consent approaches. Some mobile phone subscription social phenomena of interest is far from obvious. For contracts, for instance, offer an opt out to the sub- example, public Facebook messages in the Netherlands scriber for using their data for other purposes than clearly reflect general sentiment in some sense, but it is providing the phone service. If the opt out rate is not far from clear how exactly (Daas and Puts, 2014b). too high, this does not seriously affect the usability of Moreover, if such data are to be used as a source for mobile phone data for statistical purposes. a population sentiment indicator, one would like to Another obvious challenge is the processing, storage know the relationship between the population of per- and transfer of large data sets. Technological advances sons writing public messages on Facebook and the like increases in computing power, larger storage facil- population at large. 
This is challenging without falling ities and high bandwidth data channels may partly back to surveys. Furthermore, the population of per- solve these issues. Having data processed at the sons using social media is likely to change over time, source, thus preventing the transfer of large data sets making a comparison to the population at large even and the duplication of storage, may also be considered. more challenging. These technological challenges include mechanisms for For NSIs, a key question concerns how the quality ensuring the security of data, which is of the utmost of official statistics can be guaranteed if they are based importance because of privacy and confidentiality con- on Big Data. To address this, new methodologies and cerns and makes, for example, cheap cloud-based solu- forms of interpretation need to be developed. Take for tions less attractive. example mobile phones. If data from mobile phone Another issue is the possible volatility of Big Data sources, given the fact that official statistics often take providers are used for statistics on, say, population mobility, the statistician has to interpret anonymised the form of time series analyses. For many users, the detailed call records from individual phones and continuity of these series is of the utmost importance. derive information about the behaviour of the people Still another issue is the skills required for dealing with using them. That means dealing with the fact that mea- Big Data. Modern data scientists may be better sureable phone activity may vary during the day, some equipped than traditionally trained statisticians. persons may have multiple mobile phones or none, chil- Probably more important is the need for a different dren carry mobile phones which are registered to their mind-set as the use of Big Data may imply a paradigm parents, phones may be switched off, etc. 
For social shift, including an increased and modified use of mod- media, even more questions arise such as who is the elling techniques (Daas and Puts, 2014a; Struijs and author of a message. While some methodological reme- Daas, 2013). dies have already been developed to some extent, such as deriving the gender and age of a social media user by Collaboration the known correlation between sex, age and choice of words, these still pose a challenge, as explained above. Faced with these challenges, NSIs have recognised the Privacy and legal issues form another challenge. The necessity of not working in isolation but collaborating prevention of the disclosure of the identity of individ- with each other and others outside the community of uals is an imperative, but this is difficult to guarantee official statistics. This collaboration is often exploratory when dealing with Big Data. Since legislation typically and may be aimed at sharing knowledge and experi- lags behind the emergence of new social phenomena, ences, but there are already examples of collaboration the legal situation for cases involving Big Data is not that go further. always clear. In such cases, one may have to fall back From the perspective of NSIs, several types of part- on ethical standards to decide on whether and how to ners are of interest. First of all, the potential providers use Big Data. Other legal issues relate to copyright and of Big Data are essential partners: if they do not grant the ownership of data. Even if data may legally be used, access to their data, the story is over before it starts. this does not imply that it is wise or appropriate to do Data owners have their own concerns and, like NSIs, so. Of critical importance is the implication of any use they are subject to privacy rules. This may complicate of Big Data for the public perception of an NSI as this collaboration even if they have a positive outlook and 4 Big Data & Society approach. 
But since Big Data sources are not designed For some time already, Big Data has been an for statistical use, such collaboration is also essential in important topic for the UNECE, the United Nations order to obtain good knowledge of the provenance of Economic Commission for Europe. Collaboration at such sources. Additionally, for statistical production, it that level resulted in an overview paper about the may be more efficient to have data processed at the site implications of Big Data for official statistics of collection and storage. In such cases, the assumption (UNECE, 2013a). Seminars have been held, facilitat- that data can be provided for free may no longer hold. ing the exchange of knowledge, for instance, on stat- On the other hand, statisticians also have much to offer istical data collection. In 2014, the UNECE went one such as providing analytic insights that may help data step further in facilitating cross-national work through owners understand their data better. Doing complex a project with the following stated objectives: statistical analyses is core business for NSIs, but not for, say, a mobile phone company. In these and other a. to identify, examine and provide guidance for stat- ways, the relationship with data providers could poten- istical organisations to act upon the main strategic tially become true partnerships. For example, one spe- and methodological issues that Big Data poses for cific role that NSIs could play is that of a trusted third the official statistics industry; party. In a competitive market, competitors will be b. to demonstrate the feasibility of efficient production reluctant to share sensitive data among each other. of both novel products and ‘mainstream’ official But they might be willing to share it with an NSI statistics using Big Data sources, and the possibility who compiles statistical information that is beneficial to replicate these approaches across different to all. 
national contexts; Collaboration between NSIs and academia may c. to facilitate the sharing across organisations of grow as well. Universities have historically been natural knowledge, expertise, tools and methods for the partners for NSIs. It stands to reason that such collab- production of statistics using Big Data sources oration will extend to the field of Big Data, for instance, (UNECE, 2013b). in solving methodological problems, developing tech- nical solutions and training future data scientists. Such collaboration is also being supported by public The future of official statistics funders who are facilitating research and innovation partnerships through targeted grants. By working in What does the advent of Big Data mean for official partnership, researchers in universities and NSIs could statistics? As we have argued, it provides many oppor- better leverage such opportunities. tunities. But in order to make optimal use of Big Furthermore, there are many commercial partners Data, a number of issues have to be addressed. This with which NSIs could collaborate. Google and calls for increased collaboration with private and aca- Facebook are two examples for which Big Data forms demic partners who have access to specific Big Data the core of their business model. Their knowledge and sources and knowledge, but also between NSIs. The the data to which they have access may be very relevant relationship between the various stakeholders will to NSIs. IT companies also possess relevant knowledge involve each partner building on and contributing dif- on Big Data processing and storage, security, cloud pro- ferent strengths and will likely result in flexible net- cessing, etc. Apart from the provision of paid services, works. 
Such networks are flexible in the sense that collaboration may be of interest to them with a view to membership of the network and the contribution of obtaining statistical expertise and for benchmarking or partners depend on actual needs instead of being validating their information products. fixed in advance for a long time. Collaboration between NSIs in the field of Big Data Seen from the viewpoint of NSIs, there are also has already started. Big Data has become a prominent potential risks. Official statistics are facing more com- subject at many statistical meetings and conferences in petition. In a time of growing data abundance, generat- Europe, such as the 2013 New Techniques and ing statistical information that is potentially relevant to Technologies for Statistics (NTTS) conference, a sci- society is no longer an activity intrinsically restricted to entific conference organised by Eurostat, and the ‘ESS NSIs. And even the traditional advantage of NSIs, Big Data event 2014’ in Rome. The directors-general being legally allowed to collect data and combine of all European NSIs met in Scheveningen in data sources, is eroding. It may not be possible to com- September 2013 to learn about Big Data and adopted bine survey data and administrative data with Big Data the Scheveningen Memorandum (DGINS, 2013). This sources at the micro-level, which reduces the relative memorandum calls for an international strategic disadvantage traditionally faced by the competition. approach to Big Data and plans for the adoption of For some statistics, Big Data sources cannot be eas- an action plan and roadmap by mid-2014. ily envisaged as alternatives to more traditional Struijs et al. 5 Statistics Netherlands. The authors wish to thank the editors data sources. This certainly holds for official figures on for their valuable suggestions for improvements. 
government finance and economic growth, which are heavily used for decision-making at both the national Declaration of conflicting interest and international level. But, given the increasing com- petition that data generated by other sources is present- The authors declare that there is no conflict of interest. ing to the role of NSIs as bearers of official statistics, a strategic reassessment is needed. This could include Funding fundamental questions such as whether statistics This research received no specific grant from any based on Big Data sources should be a core activity funding agency in the public, commercial, or not-for-profit of NSIs, or if some data and information should be sectors. provided by other market actors, or if NSIs can or should provide new services in this context. Notes But by posing these questions, we return to the basic 1. http://epp.eurostat.ec.europa.eu/portal/page/portal/ premise that society’s access to impartial statistical pgp_ess/ess/ess_news information must be maintained at all times, either by 2. http://bpp.mit.edu/ NSIs or other parties. In choosing a position, NSIs 3. The current EU framework programme for research and innovation, Horizon 2020, is an example (European could build on and promote their strengths and Commission, 2013), which mentions Big Data unique position. Especially at a time of competing specifically. and multiplying data sources, their impartiality and 4. http://www.cros-portal.eu/content/ntts-2013 respect for privacy as enshrined in law uniquely pos- 5. http://www.cros-portal.eu/content/big-data-event-2014 ition them as a trusted third party. They also have 6. http://www.unece.org/stats/documents/2013.09.coll.html unique knowledge of official statistical production methods. Finally, they continue to have privileged References access to government data sources that provide Daas PJH and Puts MJH (2014a) Big Data as a source of unique information and knowledge and have the statistical information. 
The Survey Statistician 69: 22–31. authority to collect data for statistical purposes that Available at: http://isi.cbs.nl/iass/N69.pdf (accessed 22 because of privacy considerations will never be avail- May 2014). able to businesses. Daas PJH and Puts MJH (2014b) Social media sentiment and As a consequence, in the context of the challenges of consumer confidence. Paper for the workshop on using Big Big Data sources, NSIs will remain important providers Data for forecasting and statistics, Frankfurt, Germany, 7– of official statistics. And where other organisations are 8 April. Available at: http://www.ecb.europa.eu/events/ able to provide statistical information to the public, pdf/conferences/140407/Daas_Puts_Sociale_media_ rather than competing, NSIs could build on their pos- cons_conf_Stat_Neth.pdf?409d61b733fc259971ee5beec7 cedc61 (accessed 22 May 2014). ition as an impartial, trusted third party and their Daas PJH, Puts MJ, Buelens B, et al. (2013) Big Data and expertise to advise on the quality and validity of infor- official statistics. Paper for the 2013 NTTS mation of these various sources. Possibly, then, pro- conference, Brussels, Belgium, 5–7 March. Available at: viders of Big Data may even seek validation of their http://www.cros-portal.eu/sites/default/files/ data from NSIs, thereby opening up yet another possi- NTTS2013fullPaper_76.pdf (accessed 22 May 2014). bility for new partnerships. De Jonge E, Van Pelt M and Roos M (2012) Time patterns, The future of official statistics in the age of Big Data geospatial clustering and mobility statistics based on mobile is still a matter of some deliberation and experimenta- phone network data. Discussion paper 201214, Statistics tion. But what is clear already is that the international Netherlands. 
Available at: http://www.cbs.nl/NR/rdonlyres/ statistical community needs to adapt to a new reality 010F11EC-AF2F-4138-8201-2583D461D2B6/0/201214x and respond to the opportunities and challenges it pro- 10pub.pdf (accessed 22 May 2014). DGINS (2013) Scheveningen memorandum on Big Data vides. To do so calls for greater collaboration with and official statistics. Available at: http://epp.eurostat. players inside and outside the statistical community, ec.europa.eu/portal/page/portal/pgp_ess/0_DOCS/estat/ through the formation of flexible networks that can SCHEVENINGEN_MEMORANDUM%20Final% forge new ways of generating statistical data. For all 20version.pdf (accessed 22 May 2014). engaged with statistics, we think the Big Data era is a European Commission (2013) Horizon 2020, the EU most exciting time. framework programme for research and innovation. Available at: http://ec.europa.eu/programmes/horizon 2020/ (accessed 22 May 2014). Acknowledgements European Union (2013) Directive 2013/37/EU of the The views expressed in this contribution are those of the European Parliament and of the Council of 26 June authors and do not necessarily reflect the position of 2013, amending Directive 2003/98/EC on the re-use of 6 Big Data & Society public sector information. Available at: http://eur-lex. UNECE (2013a) What does ‘Big Data’ mean for official stat- europa.eu/LexUriServ/LexUriServ.do?uri¼OJ:L:2013:175: istics? Paper prepared on behalf of the high-level group for 0001:0008:EN:PDF (accessed 22 May 2014). the modernisation of statistical production and services,10 Eurostat (2014) European statistics code of practice. March. Available at: http://www1.unece.org/stat/plat- Available at: http://epp.eurostat.ec.europa.eu/portal/ form/pages/viewpage.action?pageId¼77170614 (accessed page/portal/quality/code_of_practice (accessed 22 May 22 May 2014). 2014). 
UNECE (2013b) The role of Big Data in the modernisation of Mayer-Schonberger V and Cukier K (2013) Big Data: A statistical production. Project plan. Available at: http:// Revolution that Will Transform How We Live, Work, and www1.unece.org/stat/platform/display/msis/Finalþprojectþ Think. London: John Murray Publishers. proposal%3AþTheþRoleþofþBigþDataþinþtheþ Struijs P and Daas PJH (2013) Big Data, big impact? Paper ModernisationþofþStatisticalþProduction (accessed presented at the seminar on statistical data collection, 22 May 2014). Geneva, Switzerland, 25–27 September 2013. Available United Nations (2013) Fundamental principles of official at: http://www.unece.org/fileadmin/DAM/stats/documents/ statistics. Available at: http://unstats.un.org/unsd/dnss/ ece/ces/ge.44/2013/mgt1/WP31.pdf (accessed 22 May 2014). gp/FP-New-E.pdf (accessed 22 May 2014).

Journal

Big Data & Society, SAGE

Published: Apr 1, 2014

Keywords: Big Data; official statistics; European Statistical System
