Comparing the Quality of Crowdsourced Data Contributed by Expert and Non-Experts

Comparing the Quality of Crowdsourced Data Contributed by Expert and Non-Experts There is currently a lack of in-situ environmental data for the calibration and validation of remotely sensed products and for the development and verification of models. Crowdsourcing is increasingly being seen as one potentially powerful way of increasing the supply of in-situ data but there are a number of concerns over the subsequent use of the data, in particular over data quality. This paper examined crowdsourced data from the Geo-Wiki crowdsourcing tool for land cover validation to determine whether there were significant differences in quality between the answers provided by experts and non- experts in the domain of remote sensing and therefore the extent to which crowdsourced data describing human impact and land cover can be used in further scientific research. The results showed that there was little difference between experts and non-experts in identifying human impact although results varied by land cover while experts were better than non- experts in identifying the land cover type. This suggests the need to create training materials with more examples in those areas where difficulties in identification were encountered, and to offer some method for contributors to reflect on the information they contribute, perhaps by feeding back the evaluations of their contributed data or by making additional training materials available. Accuracies were also found to be higher when the volunteers were more consistent in their responses at a given location and when they indicated higher confidence, which suggests that these additional pieces of information could be used in the development of robust measures of quality in the future. Citation: See L, Comber A, Salk C, Fritz S, van der Velde M, et al. (2013) Comparing the Quality of Crowdsourced Data Contributed by Expert and Non- Experts. PLoS ONE 8(7): e69958. doi:10.1371/journal.pone.0069958 Editor: Tobias Preis, University of Warwick, United Kingdom Received February 14, 2013; Accepted June 12, 2013; Published July 31, 2013 Copyright:  2013 See et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: Funding from the Austrian Agency for the Promotion of Science via the project LandSpotting (No. 828332). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: The authors have declared that no competing interests exist. * E-mail: see@iiasa.ac.at advantages of VGI is the potential increase in the volumes of Introduction data about all kinds of spatially referenced phenomena. Such data The proliferation of Web2.0 technology over the last decade has can be collated and used for many different scientific activities: resulted in changes in the way that data are created. Individual from the calibration of scientific models (e.g. economic prediction citizens now provide vast amounts of information to websites and models that require information about land use) to the validation online databases, much of which is spatially referenced. The of existent data (e.g. maps derived through Earth Observation). 
analysis and exploitation of this georeferenced subset of crowd- With improved connectivity via mobile phones and the use of sourced data, or what is more commonly referred to as low cost, ubiquitous sensors (e.g. those which directly and volunteered geographic information (VGI) [1,2], has the potential instantaneously capture data about their immediate environment), to fundamentally change the nature of scientific investigation. the opportunities to exploit such rich veins of VGI are many and Citizens have a long history of being involved in scientific research varied. However, whilst one of the pressing challenges concerns or the more recently coined ‘citizen science’ [3]. There are many how to manage large data volumes in terms of processing and successful examples of citizen science that have led to new storage, a number of yet unaddressed issues persist. These include scientific discoveries, including unravelling protein structures [4] how to handle data privacy, how to ensure adequate security, and and discovering new galaxies [5], as well as websites for public critically, how to assess VGI data quality. Data quality is an area reporting of illegal logging/deforestation [6] and waste dumping that has attracted increasing attention in the literature [1,11–13]: [7], which have demonstrated how citizens can have a visible quantifying VGI data quality underpins its usefulness (that is, its impact upon the environment and local governance. Analysis of reliability and credibility) and potential for incorporation into more passive sources of geo-tagged data from the crowd from scientific analyses. The critical issue is whether ordinary citizens search engines such as Google has also revealed interesting can provide information that is of high enough quality to be used scientific trends, e.g. the relationship between GDP and searches in formal scientific investigations. about the future [8], trends in influenza [9] and the ability to With open access to high resolution satellite imagery through characterize crop planting dates [10]. One of the critical providers such as Google Earth and Bing Maps, it is possible to PLOS ONE | www.plosone.org 1 July 2013 | Volume 8 | Issue 7 | e69958 Quality of Crowdsourced Data Table 1. The spectrum of human impact. Human Impact Description 0% No evidence of any human activity visible 1 to 50% Some visible evidence of human activities such as tracks/roads; evidence of managed forests; some evidence of deforestation; some scattered human dwellings, some scattered agricultural fields; some evidence of grazing 51% to 80% Increasing density of agriculture from subsistence on the lower end to intensive, commercial agriculture with large field sizes on the upper end 81% to 99% Urban areas with decreasing amounts of green space and increasing density of housing 100% A built up urban area with no green space, typically the business district of a city doi:10.1371/journal.pone.0069958.t001 collect vast amounts of volunteered information about the Earth’s which VGI can be trusted as a source of training and validation surface such as land cover and land use. The collection of data in remote sensing. However, by investigating generic research crowdsourced land cover data is the main aim of the Geo-Wiki questions related to the quality and reliability of information project [14,15] in what is currently a contributory approach to contributed by citizens with different levels of domain expertise, citizen science [16]. 
Geo-Wiki is a web-based geospatial portal this research should also be of interest to the broader field of (http://www.geo-wiki.org) with an interface linked to Google citizen science. The next section describes data collection via the Earth. It can be used to visualize and validate global land cover human impact Geo-Wiki campaign and the analysis of volunteer datasets such as GLC-2000, MODIS and GlobCover [12] which and volunteered data quality. Following the results, some frequently disagree over the land cover they record at any given discussion is provided regarding the implications of incorporating location [17–19]. Since its inception, a number of Geo-Wiki VGI in scientific research including recommendations for further branches have been initiated, each one specifically devoted to research before conclusions are drawn in the final section. gathering different types of information such as agriculture (agriculture.geo-wiki.org), urban areas (cities.geo-wiki.org), bio- Materials and Methods mass (biomass.geo-wiki.org) and more recently human impact Data from the Human Impact Competition (humanimpact.geo-wiki.org). Crowdsourced data on land cover were collected using a branch The general aim of this paper is to determine whether there are of Geo-Wiki called Human Impact (http://humanimpact.geo- significant differences in quality in the information contributed by wiki.org) and the data were subsequently used to validate a map of experts and non-experts. This is explored through a land cover land availability for biofuel production [20]. The volunteers were case study with obvious implications for the domains of remote presented with pixel outlines of 1 km resolution (at the equator) sensing and landscape analyses and investigation of the extent to Figure 1. Number of pixels classified per day by the volunteers. These are daily totals from the start of the competition on day 1 to the end at just over 50 days, which shows a clear acceleration as the competition progressed. doi:10.1371/journal.pone.0069958.g001 PLOS ONE | www.plosone.org 2 July 2013 | Volume 8 | Issue 7 | e69958 Quality of Crowdsourced Data projected onto Google Earth (where pixels in this context refer to with students, and through social media. Background information the smallest area for which information is collected) and were then on the competitors was collected through the registration asked to determine the percentage of human impact and the land procedure. The competition ran for just under 2 months in the cover type at each location from the following list: (1) Tree cover, autumn of 2011 [22]. The top ten volunteers were offered co- (2) Shrub cover, (3) Herbaceous vegetation/Grassland, (4) authorship on a paper resulting from the competition [20] as well Cultivated and managed, (5) Mosaic of cultivated and managed/ as Amazon vouchers as an incentive. Other incentives included natural vegetation, (6) Flooded/wetland, (7) Urban, (8) Snow and inviting friends, which resulted in extra points, a leader board so ice, (9) Barren and (10) Open Water. The concept of ‘human that competitors could gauge the competition, and appealing to impact’ was defined as the amount of evidence of human activity the environmental motivation of individuals through the biofuel visible in the Google Earth images. A spectrum of these intensities theme. is shown in Table 1, which is loosely based on the ideas of A set of 299 ‘control’ points was used to determine quality Theobald [21]. 
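As a concrete illustration of the task just described, the following is a minimal sketch of a single contribution. The record layout is hypothetical rather than the actual Geo-Wiki schema, but the ten land cover classes and the human impact bands follow Table 1 and the field list above (class, impact percentage, confidence, high resolution flag).

```python
# Minimal sketch of one volunteer contribution; field names are illustrative
# and not the actual Geo-Wiki data model.
from dataclasses import dataclass

LAND_COVER_CLASSES = {
    1: "Tree cover",
    2: "Shrub cover",
    3: "Herbaceous vegetation/Grassland",
    4: "Cultivated and managed",
    5: "Mosaic of cultivated and managed/natural vegetation",
    6: "Flooded/wetland",
    7: "Urban",
    8: "Snow and ice",
    9: "Barren",
    10: "Open Water",
}

def impact_band(pct: float) -> str:
    """Map a human impact percentage onto the descriptive bands of Table 1."""
    if pct == 0:
        return "No evidence of any human activity visible"
    if pct <= 50:
        return "Some visible evidence of human activity"
    if pct <= 80:
        return "Increasing density of agriculture"
    if pct <= 99:
        return "Urban area with decreasing green space"
    return "Built-up urban area with no green space"

@dataclass
class Contribution:
    pixel_id: int            # 1-km pixel outline shown on Google Earth
    land_cover: int          # key into LAND_COVER_CLASSES
    human_impact_pct: float  # 0-100, interpreted via Table 1
    confidence: str          # e.g. "sure", "quite sure", "less sure", "unsure"
    high_res_imagery: bool   # whether high resolution imagery was available

print(impact_band(35))  # falls in the 1-50% band of Table 1
```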
Volunteers were also asked to indicate their where three experts with backgrounds in physical geography, confidence in the class type and the impact score, whether they geospatial sciences, remote sensing and image classification agreed had used high resolution imagery and the date of the image. upon the land cover at each location. The first 99 control points Volunteers were recruited by emails sent to registered Geo-Wiki were provided to the volunteers at the start of the competition, the volunteers, relevant mailing lists and contacts, in particular those next 100 were provided three-quarters of the way through and the Figure 2. Global distribution of pixels collected by the volunteers. The distribution is shown by (a) human impact and (b) land cover type. doi:10.1371/journal.pone.0069958.g002 PLOS ONE | www.plosone.org 3 July 2013 | Volume 8 | Issue 7 | e69958 Quality of Crowdsourced Data Figure 3. Median response time of the volunteers. The response time is in seconds measured from the start of the competition until the end at just over 50 days. doi:10.1371/journal.pone.0069958.g003 final 100 were provided at the end, where the latter were drawn following characteristics. Experts evaluated an average of 64.8 control data points each (s.d. 108.1) and non-experts 57.2 (s.d. from higher resolution imagery. The volunteers were then ranked by an index that combined quality and quantity through equal 95.1). Although there is the potential for a few individuals to have a disproportionately large impact on data quality and composition, weighting, and the top ten were declared the winners. Interest- in this case, of the 29 experts, 18 contributed more than 50 ingly, there were some minor changes in the top ten once quality evaluations, and of the 33 non-experts, 19 evaluated more than 50 was considered. data points. The volunteers’ demographics (age, gender, socio- A total of ,53,000 locations were validated by more than 60 economic status etc.) were not captured as part of the contributor individuals and Figure 1 shows the rapid increase in contributions registration. This is unfortunate, because although a proxy for in the last 20 days of the competition, with a particularly large previous experience is evaluated in this paper, it is well recognised spike at the end. Figure 2 illustrates the spatial distribution of the that such factors can influence contributor responses. Such data ,53,000 points collected expressed as measures of human impact will be collected in future campaigns. and land cover. Note that the crowdsourced data can be freely downloaded from http://www.geo-wiki.org. Analysis of Human Impact Of these ,53,000 validations, 7657 were at the control To determine how well the answers provided by the volunteers locations, which were then used to assess quality. The data were matched the control data in terms of the degree of human impact, then filtered for ‘unknown’ expertise resulting in 4020 control data a linear regression was fit as follows: points scored by 29 Expert volunteers and 3548 control data points scored by 33 Non-expert volunteers. Experts were considered to be individuals with a background in remote Y ~azbX ze ð1Þ i i i sensing/spatial sciences versus non-experts who were new to this discipline or had some self-declared limited background. The where Y is the degree of human impact from the control data, X is i i control data, whose analysis forms the basis of the paper, have the the degree of human impact from the volunteers, a and b are Table 2. 
A confusion matrix for the comparison of controls with responses from the crowd. Class 1 (control j) Class 2 (control j) … Class n (control j) Class 1 (volunteer i) x x …x 1,1 1,2 n,1 Class 2 (volunteer i) x x …x 2,1 2,2 n,2 … …… … … Class n (volunteer i) x x … n,1 n,2 doi:10.1371/journal.pone.0069958.t002 PLOS ONE | www.plosone.org 4 July 2013 | Volume 8 | Issue 7 | e69958 Quality of Crowdsourced Data Table 3. Regression analysis for the model Y =a+bX +e , Table 4. Extending the regression to include an indicator of i i i where Y is the degree of human impact from the control data, expertise, where b is the regression coefficient for this i E X is the degree of human impact from the participants. indicator and b is the regression coefficient for participant i X human impact scores. Estimate Std. Error t value Pr(.|t|) Estimate Std. Error t value Pr(.|t|) a 11.300 0.363 31.16 0.000 a 9.009 0.432 20.85 0.000 b 0.699 0.006 122.43 0.000 b 0.705 0.006 123.49 0.000 doi:10.1371/journal.pone.0069958.t003 b 4.251 0.442 9.62 0.000 coefficients of the linear regression equation and e is a normally doi:10.1371/journal.pone.0069958.t004 distributed random error term for each observation i. the response times, with a and b representing coefficients of the Each volunteer provided information on expertise during linear regression, and e the error term for each observation i. registration. Equation 1 was extended to include an indicator of The last 100 control points provided to the volunteers at the end respondent expertise in the regression model: of the competition were locations of cropland or agricultural land covers (the classes of Cultivated and managed and Mosaic of cultivated and managed/natural vegetation) and where high resolution images Y ~azb X zb E ze ð2Þ i X i E i i existed. In order to evaluate how volunteer performance changed where, in addition to the previously defined variables, b is the with experience, only control points with agricultural land cover regression coefficient for volunteer human impact, E is the and where high resolution images were available were selected expertise indicator variable for observation i (0 for Non-Expert, 1 from the first 199 control points. The average accuracy in human for Expert), and b is the regression coefficient for this variable. E impact across the first two control sets was then compared to the Thus, this coefficient is a measure of the difference in human average accuracy of the third set using a t-test to determine impact (on aggregate) between the Non-Expert and Expert whether there were any significant differences. contributions. This model implicitly assumes human impact is equally predicted by experts and non-experts (i.e. is uniform), and Analysis of Land Cover assumes a uniformity of the intercept term within each expert As in the analysis of human impact scores above, control points group, if the intercept is considered to be a for the non-expert were used to evaluate volunteer accuracy in terms of the land group, and a+b for the expert one. cover they indicated. An error or confusion matrix was populated The data provided by the volunteers were then analysed for for all contributors (Table 2) and the overall accuracy was consistency, which is a known issue in ground truthing [23]. After calculated as follows: every 50 points, the volunteers were provided with a point they had previously validated. 
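Equations 1 and 2 above are ordinary least-squares fits of the control human impact score on the volunteer score, without and with an expertise dummy. The sketch below shows one way such fits could be run; the arrays are synthetic placeholders rather than the competition records, and the values used to simulate them only echo the general pattern reported in the results (intercept near 10, slope below 1).

```python
# Sketch of the human impact regressions (Equations 1 and 2) on synthetic
# placeholder data; not the competition records.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 1000
volunteer_hi = rng.uniform(0, 100, n)        # X_i: volunteer human impact (%)
expert = rng.integers(0, 2, n)               # E_i: 1 = expert, 0 = non-expert
control_hi = 10 + 0.7 * volunteer_hi + 4 * expert + rng.normal(0, 15, n)  # Y_i

# Equation 1: Y_i = a + b * X_i + e_i
fit = stats.linregress(volunteer_hi, control_hi)
print(f"a = {fit.intercept:.2f}, b = {fit.slope:.2f}")   # b < 1 implies underestimation

# Equation 2: Y_i = a + b_X * X_i + b_E * E_i + e_i, with expertise as a dummy variable
design = np.column_stack([np.ones(n), volunteer_hi, expert])
(a, b_x, b_e), *_ = np.linalg.lstsq(design, control_hi, rcond=None)
print(f"a = {a:.2f}, b_X = {b_x:.2f}, b_E = {b_e:.2f}")
```

Splitting the regression into the two simultaneous models reported later (Table 5) amounts to fitting Equation 1 separately on the expert and non-expert subsets.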
The average, median and standard deviation of the maximum difference between the volunteers and i,j i,j~1 the controls were calculated for all control points, by expertise, by 07 Accuracy~  100 ð3Þ n n P P volunteer consistency in the land cover they recorded, and by i,j confidence. i~1 j~1 Finally, the response times of the volunteers were calculated between each successive data point they scored. The median where i is the volunteer class, j is the control class and n is the total response time was 55 secs with a first and third quartile of 32 and number of classes. 100 secs respectively. The average response time was 5,226 secs, In addition, two other measures of accuracy were calculated, indicating a highly skewed distribution, which reflects large pauses specific to each land cover class: user’s and producer’s accuracies. in contributions, e.g. at the end of a validation session. Figure 3 User’s accuracy describes errors of commission or Type I errors. shows the median response time per day over the course of the For example, the user’s accuracy for the forest class indicates the competition. There is a general trend towards shorter response likelihood that what was labeled as forest by the volunteers really is times as the competition unfolded with the shortest response times forest. Producer’s accuracy reflects errors of omission or Type II between successive validations occurring at the end of the errors. Using the forest example again, this measure reflects how competition. Thus, we were interested in understanding the well the forest cover control pixels were classified by the relationship between response time and quality of the human impact responses overall and whether there was any difference in Table 5. The regression analysis of predicting the degree of quality towards the end of the competition. human impact by expert and non-expert groups, when the The response time data were first pre-processed in two ways. regression is split into 2 simultaneous models. First, all response times greater than 5 minutes were removed as these were deemed unrepresentative of typical behavior. This was based on visual inspection of the distribution. However, 5 minutes Estimate Std. Error t value Pr(.|t|) th also represents the 92.5 percentile and therefore includes the a (Expert) 7.960 0.527 15.12 0.000 majority of the data. Second, response times were log transformed due to the skewness of the distribution. A linear regression a (Non-expert) 14.200 0.494 28.74 0.000 equation of the form given in (1) was fit to the entire dataset where b (Expert) 0.725 0.008 91.06 0.000 the dependent variable, Y , was the absolute difference in the b (Non-expert) 0.685 0.008 83.61 0.000 answers for human impact between the control data and the volunteers’ scores, and the independent variable, X , was the log of doi:10.1371/journal.pone.0069958.t005 PLOS ONE | www.plosone.org 5 July 2013 | Volume 8 | Issue 7 | e69958 Quality of Crowdsourced Data Figure 4. The distribution of human impact by land cover class. The distribution is shown for (a) the control pixels and (b) the volunteers, where the latter show a much wider range of answers. doi:10.1371/journal.pone.0069958.g004 volunteers. These two measures are calculated as follows: where the probability (P ) that the land cover is correctly identified is expressed as a function of response time, X . 
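The accuracy measures described above all derive from the confusion matrix of Table 2, with rows indexed by the volunteer class i and columns by the control class j: overall accuracy (Equation 3) is the diagonal sum over the total, user's accuracy divides each diagonal cell by its row total, and producer's accuracy divides it by its column total. A minimal sketch with an illustrative three-class matrix, not the competition results:

```python
# Sketch of overall, user's and producer's accuracies from a confusion matrix
# laid out as in Table 2; the counts are illustrative only.
import numpy as np

conf = np.array([
    [50,  5,  3],   # volunteer said class 1
    [ 4, 40,  6],   # volunteer said class 2
    [ 2,  8, 30],   # volunteer said class 3
])                  # columns: control class 1, 2, 3

overall = np.trace(conf) / conf.sum() * 100          # Equation 3
users = np.diag(conf) / conf.sum(axis=1) * 100       # commission errors (row totals)
producers = np.diag(conf) / conf.sum(axis=0) * 100   # omission errors (column totals)

print(f"overall accuracy: {overall:.1f}%")
for k, (u, p) in enumerate(zip(users, producers), start=1):
    print(f"class {k}: user's {u:.1f}%, producer's {p:.1f}%")
```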
The effect of response time on accuracy in the final set of i,i User s Accuracy(by classi)~  100 ð4Þ P controls was compared with the first and second set to determine i,j whether contributors were more interested in scoring a greater j~1 number of points and spent less time on each data point towards the end of the competition. A two-tailed binomial test was used to test whether the number of correct classifications at the end of the j,j Producer s Accuracy(by classj)~  100 ð5Þ competition was greater than expected based on the total number i,j of classifications performed and the probability of correct i~1 classification in the earlier part of the competition. where i is the volunteer class, j is the control class and n is the total Results and Discussion number of classes. Separate accuracy measures were calculated for the three sets of Human Impact control pixels (to determine whether accuracies change over time) The result of the regression described in Equation 1 to for locations where the volunteers were the most confident and to determine how well the degree of human impact can be predicted compare experts and non-experts. by the contributors based on the control points is provided in Contributor consistency in land cover labeling was then Table 3. This shows that b differs significantly from zero and is analysed by determining the proportion of times when the same positive but less than 1 suggesting that there is evidence that the land cover type was chosen when presented with the same data users underestimated the degree of human impact by roughly 30 point. This was calculated for all points, by expertise, and by percent. various degrees of confidence. The results of including an indicator variable describing Finally, the impact of response time on the quality of land cover respondent expertise (Equation 2) are shown in Table 4. The validations was analysed using logistic regression of the following slopes are still positive and suggest that allowing for expertise even form: in a simple way changes the results of relating to the slope term. To investigate this further, Equation 1 was extended to include Logit(P )~azbX ð6Þ i i variables describing expertise. Although computed together, this effectively splits the regression into two models - one for each of PLOS ONE | www.plosone.org 6 July 2013 | Volume 8 | Issue 7 | e69958 Quality of Crowdsourced Data Figure 5. The relationship between the volunteer responses and the controls for human impact by land cover type. The lines show the coefficient slopes when each control land cover class is evaluated in turn. Note that the data points have had a small random noise component added to allow their density to be visualised. doi:10.1371/journal.pone.0069958.g005 the expert groups - and the results are shown in Table 5. These Figure 4 shows the distribution of human impact scores for the results indicate that there is little variation in the degree to which control pixels and the contributor data by land cover class. It shows a general trend for contributors to underestimate the degree the expert and non-expert group underestimated the degree of human impact. of human impact across the different land cover types with the exception of (5) Mosaic of cultivated and managed/natural vegetation. A further analysis explored how human impact scores varied with land cover class. The standard regression described in Equation 1 was extended to include indicators for the land cover Table 6. Regression analysis for the degree of human impact. classes. 
Since there was only a small number of data points classified as Open water, Barren or Urban, these classes were excluded from the regression analysis. The results for the remaining five Estimate Std. Error t value Pr(.|t|) land cover types are shown in Table 6 and Figure 5 plots the a (Tree cover) 7.264 0.343 21.16 0.000 contributed against the control human impact scores with the a (Shrub cover) 4.284 0.520 8.24 0.000 regression coefficients for different land cover classes. a (Herb./Grass) 6.567 0.504 13.03 0.000 The results show that the prediction of the degree of human impact varies with land cover classes. The coefficients for the a (Cultivated) 73.669 0.857 86.01 0.000 Herbaceous vegetation/Grassland class most strongly predict human a (Cult./nat mosaic) 36.046 0.485 74.32 0.000 impact, the coefficients for the Shrub cover class are the weakest b (Tree cover) 0.220 0.012 18.52 0.000 predictors and all classes underestimate human impact. This b (Shrub cover) 0.089 0.021 4.34 0.000 indicates that the conceptualizations of these classes may need to b (Herb./Grass) 0.366 0.015 24.62 0.000 be more clearly defined and perhaps more training examples used to illustrate the different degrees of human impact by land cover b (Cultivated) 0.098 0.010 10.06 0.000 type. b (Cult./nat mosaic) 0.273 0.008 33.58 0.000 Table 7 shows the results of the consistency analysis. Overall the doi:10.1371/journal.pone.0069958.t006 contributors were consistent in their answers regarding the degree PLOS ONE | www.plosone.org 7 July 2013 | Volume 8 | Issue 7 | e69958 Quality of Crowdsourced Data Table 7. Consistency of response to degree of human impact. Disaggregation Category Average HI (%) Median HI (%) Std Dev (%) All All points 9.60 0.00 17.43 Expertise Experts 10.90 5.00 18.50 Non-experts 7.95 0.00 15.82 Land cover consistency Agree on land cover between points 7.20 0.00 14.55 Disagree on land cover between points 17.25 10.00 22.80 Confidence Sure 7.92 0.00 15.68 Sure+Quite sure 9.13 0.00 16.93 Quite sure+Less sure+Unsure 22.08 15.00 23.65 Less sure+Unsure 25.92 15.00 25.16 doi:10.1371/journal.pone.0069958.t007 of human impact, with an average deviation of less than 10% (i.e. Land Cover 9.6%) although the spread of answers was higher at 17.4%. When The overall accuracies for the three sets of control points labeled expertise was considered, non-experts had a lower average C1, C2 and C3 are presented in Table 9 for the full dataset, deviation than the experts by just under 3%. When the consistency considering only those contributions where confidence was high was extended to land cover, those pixels which showed consistent (i.e. ‘sure’ on the slider bar) and then disaggregated by expertise choices in land cover had a lower average deviation in human (i.e. experts or non-experts). impact by 8.3% compared to those which showed inconsistency in Considering all three sets of control data, accuracy varies land cover choice. This reflects pixels that were clearly more between 66 and 76%. There is little difference between the first difficult to identify. Finally, when contributors were the most and second set of controls but there is a marked increase in confident in their choice of human impact, they were also more accuracy for the final set (C3) with 76%. This is unsurprising since consistent (average deviation of 7.9%), with consistency decreasing the final control sample was drawn from high resolution imagery. 
as confidence decreased resulting in an average deviation of as When taking only those answers where the volunteers indicated much as 25.9% for the least confident category. This analysis of high confidence (or ‘sure’ on the slider bar), there was around a consistency serves to highlight the need to examine those pixels 3% increase in the accuracy to 69%. Unlike with human impact, which were not consistently labeled and which are probably more experts were more accurate than non-experts, e.g. 62% for non- difficult to judge in terms of both human impact and land cover, experts and 69% for experts for C1 with even larger differences observed for C2 and C3. This suggests that extra training should which can then be used to help train the volunteers. The results of the regression analyzing the effect of response be provided to those individuals with a non-expert background. As training manuals are often unread or rarely consulted, a more times are shown in Table 8 and indicate that the agreement interactive approach could be introduced such that the volunteers between the volunteers and the control pixels increased signifi- are made aware of their errors as they progress through a cantly with a faster response time for human impact, although the competition. In addition, a forum could be set up to discuss pixels effects were small. For each increase in magnitude in response that present difficulties in identification, particularly for non- time, the agreement between the crowd and the control pixels experts. increased in accuracy by 1.4%. The average deviation in human Table 10 shows the user’s and producer’s accuracies for the five impact for pixels of (4) Cultivated and managed and (5) Mosaic of most common land cover types in the dataset. Overall the results cultivated and managed/natural vegetation and high resolution imagery show that there is generally an increase in the accuracy across from the first two control sets was 17.1%. This was compared to control sets although C3 should only really be considered for the third set of control data points (consisting of only these pixel cropland and mosaic classes. The lowest accuracies are in shrub types) and the average deviation in human impact was lower, cover, grassland/herbaceous and the mosaic cropland class, which decreasing to 14.7%. A t-test confirmed that the means are significantly different from one another (p,0.0001; t =24.8533; degrees of freedom = 3326.222) and showed that accuracy in human impact actually increased at the end of the competition. Table 9. Accuracy of land cover (in %) based on comparison of volunteer response with three sets of controls. Thus, these analyses indicate that there are no particular concerns over quality in relation to response time. No allowance for confusion between classes Dataset used Table 8. Regression analysis for the model Y =a+bX +e i i i C1 C2 C3 where X is response time and Y is human impact. i i Full dataset 66.4 66.5 76.2 Confidence rating 69.4 69.3 78.9 Estimate Std. Error t value Pr(.|t|) of sure a 12.9915 1.0706 12.135 0.000 Experts 69.2 72.3 84.6 b 1.4110 0.6157 2.291 0.022 Non-experts 62.4 61.9 65.9 doi:10.1371/journal.pone.0069958.t008 doi:10.1371/journal.pone.0069958.t009 PLOS ONE | www.plosone.org 8 July 2013 | Volume 8 | Issue 7 | e69958 Quality of Crowdsourced Data Table 10. User’s and producer’s accuracies for the five main Table 11. Consistency of response in choosing the land land cover types and for different subsets of the data cover type. 
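The comparison above, between the average human-impact deviation on the agricultural high-resolution points of the first two control sets and that of the final set, is a two-sample t-test. A sketch under the assumption of a Welch test (consistent with the fractional degrees of freedom quoted), on synthetic placeholder deviations rather than the real ones:

```python
# Sketch of the t-test comparing absolute human-impact deviations between the
# first two control sets and the final set; the arrays are synthetic placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
dev_early = np.abs(rng.normal(17.1, 15, size=2000))  # |control - volunteer| in C1 + C2
dev_final = np.abs(rng.normal(14.7, 14, size=1500))  # |control - volunteer| in C3

t_stat, p_value = stats.ttest_ind(dev_early, dev_final, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.2g}")
```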
including confidence and expertise. Disaggregation Category Consistent Percentage Land None Full dataset Y 76.1 cover No confusion Data set type N 23.9 User’s accuracy Producer’s accuracy Expertise Expert Y 75.7 N 24.3 C1 C2 C3 C1 C2 C3 Non-Expert Y 76.7 Full 1 75.9 77.4 43.6 67.1 69.6 100.0 N 23.3 2 52.1 46.5 0.0 61.7 67.2 N/A Confidence Sure Y 77.6 3 45.1 56.3 6.0 51.3 56.3 30.0 N 22.4 4 78.9 88.8 95.2 74.2 72.8 76.0 Quite sure+Less Y 76.4 5 71.5 68.8 64.6 62.2 60.7 76.4 sure+Unsure Sure 1 78.7 82.4 53.1 68.0 70.2 100.0 N 23.6 2 50.8 48.6 0.0 64.4 71.2 N/A Less sure+Unsure Y 66.7 3 43.9 52.4 10.7 47.7 53.7 50.0 N 33.3 4 81.0 89.6 95.2 76.5 75.0 78.7 doi:10.1371/journal.pone.0069958.t011 5 72.4 68.2 63.7 66.8 65.8 78.8 Expert 1 78.4 83.5 52.6 73.0 68.8 100.0 Similar to human impact, a further analysis was then 2 54.8 45.7 0.0 63.8 65.1 N/A undertaken on a subset of the data where the volunteers were 3 50.9 65.6 7.1 52.4 65.2 33.3 provided with the same pixels at different times in the competition 4 77.1 90.5 95.5 82.6 80.5 86.5 (Table 11). The results show that the volunteers were consistent in their response just over 76.1% of the time where this was slightly 5 76.5 75.7 78.1 59.3 71.8 80.2 lower for experts (75.7%) and slightly higher for non-experts Non- 1 71.9 73.6 35.0 58.6 70.2 100.0 (76.7%). A very minor increase to 77.6% was observed when expert considering only those pixels where the volunteer was sure but 2 48.5 47.2 0.0 58.9 68.8 N/A when the volunteers were less sure or unsure about their responses, 3 38.0 48.7 5.6 49.5 48.9 28.6 their consistency in response decreased to 66.7%. 4 82.8 87.0 94.6 61.2 66.3 63.0 The final analysis concerned the relationship between quality in 5 66.1 62.4 52.5 66.3 51.6 71.8 land cover classification and response time. The results showed that the crowd was 40% more likely to disagree with the control 1 = Tree cover; 2 = Shrub cover; 3 = Herbaceous vegetation/Grassland; 4 = for each order of magnitude increase in response time (p,.0001) Cultivated and managed; 5 = Mosaic of cultivated and managed/natural vegetation. as shown in Table 12 and indicated by the value of b. doi:10.1371/journal.pone.0069958.t010 Considering the issue of whether quality in land cover validation (and therefore accuracy) decreased near the end of the competi- indicates the need to provide more examples of how these classes tion, we compared the probability that the volunteers agreed with appear on Google Earth within the training materials as the the control pixels for land cover types (4) Cultivated and managed and volunteers are confusing these classes more often than others. (5) Mosaic of cultivated and managed/natural vegetation at the end of the When considering points where the volunteer had a high competition (75.9%) with that from the early to middle part of the confidence, the patterns are similar and there is generally an competition (70.6%). This difference was determined to be highly increase in accuracy although the mosaic cropland class continues significant (p,.0001; number of trials = 1500; number of to be more problematic, with a decrease in the user’s accuracy successes = 1139) using a binomial test and therefore the accuracy across control sets. Finally, the effect of expertise on land cover in estimating land cover actually increased in the final stages of the classification accuracy produced variable results depending upon competition. Thus for both human impact and land cover, there the land cover type and the control set considered. 
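The relationship between response time and land cover accuracy reported above corresponds to the logistic regression of Equation 6, with agreement with the control as the outcome and the log of the response time as the predictor. A sketch on synthetic placeholder data follows; statsmodels is an assumed dependency, and the simulated coefficients only echo the sign and rough magnitude of Table 12.

```python
# Sketch of the logistic regression in Equation 6: probability of agreeing with
# the control as a function of log response time. Synthetic placeholder data;
# statsmodels is an assumed dependency.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 3000
response_time = rng.lognormal(mean=4.0, sigma=1.0, size=n)   # seconds, median ~55 s
log_rt = np.log10(response_time)                             # X_i
p_correct = 1.0 / (1.0 + np.exp(-(1.5 - 0.4 * log_rt)))      # decreasing with X_i
correct = rng.binomial(1, p_correct)                         # 1 = matches the control

result = sm.Logit(correct, sm.add_constant(log_rt)).fit(disp=False)
print(result.params)   # [a, b]; a negative b mirrors the direction reported in Table 12
```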
For the forest are no concerns about the quality decreasing near the end of the class, the non-experts improved in their ability to correctly identify competition with a faster response time. forest by the second set of controls, while the non-experts actually showed a decrease in the producer’s accuracy. Similarly, for the shrub class, the non-expert showed a greater level of improvement Table 12. Logistic regression analysis for the model Logit in the second set of controls compared to the expert and (P ) =a+bX where X is the log of the response time and P is i i i i outperformed them in terms of both user’s and producer’s the probability that the land cover is correctly identified. accuracy in C2. The experts were better than non-experts at identifying herbaceous, cropland and mosaic but once again there were differences in the user’s and producer’s accuracies. By Estimate Std. Error t value Pr(.|t|) building up a picture of where experts and non-experts have a 1.46573 0.13955 10.504 0.000 differing performance by land cover class, we can tailor the kinds b 20.40005 0.07957 25.027 0.000 of training materials provided to the volunteers, focusing on areas where greater problems in identification lie. doi:10.1371/journal.pone.0069958.t012 PLOS ONE | www.plosone.org 9 July 2013 | Volume 8 | Issue 7 | e69958 Quality of Crowdsourced Data advantages of citizen science. For example, recent activity such as Conclusions the umbrella Zooniverse project (http://www.zooniverse.org) This paper assessed the quality of crowdsourced data collected promotes collaborative projects in many areas of social and through a Geo-Wiki competition. Volunteers identified the degree physical science research. Currently, registration to its projects of human impact and classified land cover at random locations using Google Earth images. Quality was assessed by comparing captures no information about the contributor, their training or their socio-economic context. Approaches that include informa- volunteer results with results agreed by experts at a number of control points. Control points were provided to volunteers at the tion about participant background, control points, reflection, beginning, middle and end of the competition. The results showed repetition, etc. have broad potential for other citizen science that there is little difference between experts and non-experts in projects that involve classification or identification, e.g. [24,5] identifying human impact while experts were better than non- where experts can be used to build a database of controls for experts in identifying land cover. However, the results for both monitoring and learning purposes. varied by land cover type and through the competition. For The next step in this research is to develop robust measures of example, experts were better than non-experts at identifying shrub quality for each location in the crowdsourced database based on land cover at the start of the competition but non-experts rules that take into account the number of times that contributors improved more than experts and then outperformed them in have provided information at a given location along with the shrub cover identification by the middle of the competition, consensus in the answers, their expertise and the confidence in the indicating that volunteers were learning over time. The volunteers answers provided. 
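The end-of-competition check discussed above is a binomial test: the number of correct cropland and mosaic classifications in the final phase is compared against the rate observed in the early-to-middle phase. A sketch using the figures quoted in the text (1,139 successes out of 1,500 trials against an earlier rate of 70.6%); scipy 1.7 or later is assumed for binomtest.

```python
# Sketch of the binomial test on end-of-competition agreement, using the counts
# reported in the text; requires scipy >= 1.7 for stats.binomtest.
from scipy.stats import binomtest

result = binomtest(k=1139, n=1500, p=0.706, alternative="two-sided")
print(f"observed proportion: {1139 / 1500:.3f}")   # ~0.759
print(f"p-value: {result.pvalue:.3g}")             # small p: agreement rose at the end
```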
However, the results from this study suggest the were shown to be reasonably consistent in their characterizations need for more nuanced approaches than a simple Linus Law or of human impact and land cover with non-experts outperforming mass of evidence approach (which have been previously suggested the experts in terms of human impact and vice versa for land in this domain) for determining when to believe the crowd and cover. Moreover, when contributors were confident in their choice therefore when the information they provide can be used with of human impact, they were also more consistent, and unsurpris- confidence. Formal methods for combining evidence such as ingly, consistency decreased as confidence decreased. Finally, Bayesian probability, Dempster-Shafer theory of evidence, Possi- increased response times (as observed towards the end of the bility Theory and Endorsement theory provide different ways for competition) did not have a negative impact on quality, and combining or partitioning evidence. They allow measures of volunteers were therefore not sacrificing quality for the desire to certainty and uncertainty to be generated and provide different complete more locations and thereby win the Geo-Wiki compe- measures of confidence in aggregated information and for tition. Thus overall, the non-experts were as reliable in what they determining when the weight of evidence indicates that crowd- identified as the experts were for certain, identifiable situations, sourced data or VGI are ‘believable’. Since the relationship and the reliability of the information provided by non-experts between reliability and confidence was found to be strong in this improved faster and to a greater degree than experts. Thus, better, research, this also suggests that future activities seeking to targeted training materials and a continual learning process built incorporate crowdsourced data should capture measures of into the competition might help address these issues. Also, allowing contributor confidence in the information they provide. Ongoing volunteers to reflect on the information they contribute, for research by the authors will investigate these areas in more detail. example by regularly feeding back evaluations of their data through the use of control points or by making additional material Author Contributions available to them, would also potentially decrease differences between experts and non-experts, particularly in the classification Conceived and designed the experiments: LS SF MV CP C. Schill IM. Performed the experiments: LS SF MV CP C. Schill IM. Analyzed the of land cover. The findings of this research relating to the data: LS AC C. Salk SF MV. Wrote the paper: LS AC C. Salk SF MV CP differences between expert and non-expert citizens are also C. Schill IM FK MO. relevant to other areas of research that seek to benefit from the References 1. Goodchild MF, Li L (2012) Assuring the quality of volunteered geographic 10. Van der Velde M, See L, Fritz S, Verheijen FGA, Khabarov N, et al. (2012) Generating crop calendars with Web search data. Environmental Research information. Spatial Statistics 1: 110–120. doi:10.1016/j.spasta.2012.03.002. Letters 7: 024022. doi:10.1088/1748-9326/7/2/024022. 2. Schuurman N (2009) The new Brave NewWorld: geography, GIS, and the 11. Haklay M, Basiouka S, Antoniou V, Ather A (2010) How Many Volunteers emergence of ubiquitous mapping and data. Environment and Planning D: Does it Take to Map an Area Well? 
The Validity of Linus’ Law to Volunteered Society and Space 27: 571–580. Geographic Information. The Cartographic Journal 47: 315–322. 3. Miller-Rushing A, Primack R, Bonney R (2012) The history of public 12. Comber A, See L, Fritz S, Van der Velde M, Perger C, et al. (2013) Using participation in ecological research. Frontiers in Ecology and the Environment control data to determine the reliability of volunteered geographic information 10: 285–290. doi:10.1890/110278. about land cover. International Journal of Applied Earth Observation and 4. Khatib F, DiMaio F, Group FC, Group FVC, Cooper S, et al. (2011) Crystal Geoinformation 23: 37–48. doi:10.1016/j.jag.2012.11.002. structure of a monomeric retroviral protease solved by protein folding game 13. Foody GM, Boyd D (2012) Using volunteered data in land cover map validation: players. Nature Structural & Molecular Biology 18: 1175–1177. doi:10.1038/ Mapping tropical forests across West Africa 2368–2371. nsmb.2119. 14. Fritz S, McCallum I, Schill C, Perger C, Grillmayer R, et al. (2009) Geo- 5. Clery D (2011) Galaxy Zoo Volunteers Share Pain and Glory of Research. Wiki.Org: The Use of Crowdsourcing to Improve Global Land Cover. Remote Science 333: 173–175. doi:10.1126/science.333.6039.173. Sensing 1: 345–354. doi:10.3390/rs1030345. 6. Nayar A (2009) Model predicts future deforestation. Nature News. Available: 15. Fritz S, McCallum I, Schill C, Perger C, See L, et al. (2012) Geo-Wiki: An online http://www.nature.com/news/2009/091120/full/news.2009.1100.html. Ac- platform for improving global land cover. Environmental Modelling & Software cessed 11 February 2013. 31: 110–123. doi:10.1016/j.envsoft.2011.11.015. 7. Milc ˇinski G (2011) The rise of crowd-sourcing: how valuable data can we get out 16. Bonney R, Ballard H, Jordan R, McCallie E, Phillips T, et al. (2009) Public of VGI Amsterdam, Netherlands. Participation in Scientific Research: Defining the Field and Assessing its 8. Preis T, Moat HS, Stanley HE, Bishop SR (2012) Quantifying the advantage of Potential for Informal Science Education. A CAISE Inquiry Group Report. looking forward. Scientific Reports 2. Available: http://www.nature.com/ Washington DC: Center for Advancement of Informal Science Education doifinder/10.1038/srep00350. Accessed 18 May 2013. (C AISE ). Avai lable: http ://cai se. i nsci. o r g /u ploads/docs / 9. Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS, et al. (2009) PPSR%20report%20FINAL.pdf. Detecting influenza epidemics using search engine query data. Nature 457: 17. Fritz S, See L (2005) Comparison of land cover maps using fuzzy agreement. 1012–1014. doi:10.1038/nature07634. International Journal of Geographical Information Science 19: 787–807. doi:10.1080/13658810500072020. PLOS ONE | www.plosone.org 10 July 2013 | Volume 8 | Issue 7 | e69958 Quality of Crowdsourced Data 18. See LM, Fritz S (2006) A method to compare and improve land cover datasets: 22. Perger C, Fritz S, See L, Schill C, Van der Velde M, et al. (2012) A campaign to application to the GLC-2000 and MODIS land cover products. IEEE collect volunteered geographic Information on land cover and human impact. Transactions on Geoscience and Remote Sensing 44: 1740–1746. In: Jekel T, Car A, Strobl J, Griesebner G, editors. GI_Forum 2012: doi:10.1109/TGRS.2006.874750. Geovisualisation, Society and Learning. Berlin/Offenbach: Herbert Wichmann 19. Fritz S, See L, McCallum I, Schill C, Obersteiner M, et al. (2011) Highlighting Verlag. 83–91. 
continued uncertainty in global land cover maps for the user community. Environmental Research Letters 6: 044005. doi:10.1088/1748-9326/6/4/044005.
20. Fritz S, See L, Van der Velde M, Nalepa RA, Perger C, et al. (2013) Downgrading recent estimates of land available for biofuel production. Environ Sci Technol 47: 1688–1694. doi:10.1021/es303141h.
21. Theobald DM (2004) Placing exurban land-use change in a human modification framework. Frontiers in Ecology and the Environment 2: 139–144. doi:10.1890/1540-9295(2004)002[0139:PELCIA]2.0.CO;2.
23. Lopresti D, Nagy G (2002) Issues in ground-truthing graphic documents. In: Blostein D, Kwon Y-B, editors. GREC 2001. LNCS. Heidelberg: Springer, Vol. 2390. 46–66.
24. Bonter DN, Cooper CB (2012) Data validation in citizen science: a case study from Project FeederWatch. Frontiers in Ecology and the Environment 10: 305–307. doi:10.1890/110273.

Comparing the Quality of Crowdsourced Data Contributed by Expert and Non-Experts

Loading next page...
 
/lp/pubmed-central/comparing-the-quality-of-crowdsourced-data-contributed-by-expert-and-J7rGH90nNm

References (50)

Publisher
Pubmed Central
Copyright
© 2013 See et al
ISSN
1932-6203
eISSN
1932-6203
DOI
10.1371/journal.pone.0069958
Publisher site
See Article on Publisher Site

Abstract

There is currently a lack of in-situ environmental data for the calibration and validation of remotely sensed products and for the development and verification of models. Crowdsourcing is increasingly being seen as one potentially powerful way of increasing the supply of in-situ data but there are a number of concerns over the subsequent use of the data, in particular over data quality. This paper examined crowdsourced data from the Geo-Wiki crowdsourcing tool for land cover validation to determine whether there were significant differences in quality between the answers provided by experts and non- experts in the domain of remote sensing and therefore the extent to which crowdsourced data describing human impact and land cover can be used in further scientific research. The results showed that there was little difference between experts and non-experts in identifying human impact although results varied by land cover while experts were better than non- experts in identifying the land cover type. This suggests the need to create training materials with more examples in those areas where difficulties in identification were encountered, and to offer some method for contributors to reflect on the information they contribute, perhaps by feeding back the evaluations of their contributed data or by making additional training materials available. Accuracies were also found to be higher when the volunteers were more consistent in their responses at a given location and when they indicated higher confidence, which suggests that these additional pieces of information could be used in the development of robust measures of quality in the future. Citation: See L, Comber A, Salk C, Fritz S, van der Velde M, et al. (2013) Comparing the Quality of Crowdsourced Data Contributed by Expert and Non- Experts. PLoS ONE 8(7): e69958. doi:10.1371/journal.pone.0069958 Editor: Tobias Preis, University of Warwick, United Kingdom Received February 14, 2013; Accepted June 12, 2013; Published July 31, 2013 Copyright:  2013 See et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: Funding from the Austrian Agency for the Promotion of Science via the project LandSpotting (No. 828332). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: The authors have declared that no competing interests exist. * E-mail: see@iiasa.ac.at advantages of VGI is the potential increase in the volumes of Introduction data about all kinds of spatially referenced phenomena. Such data The proliferation of Web2.0 technology over the last decade has can be collated and used for many different scientific activities: resulted in changes in the way that data are created. Individual from the calibration of scientific models (e.g. economic prediction citizens now provide vast amounts of information to websites and models that require information about land use) to the validation online databases, much of which is spatially referenced. The of existent data (e.g. maps derived through Earth Observation). analysis and exploitation of this georeferenced subset of crowd- With improved connectivity via mobile phones and the use of sourced data, or what is more commonly referred to as low cost, ubiquitous sensors (e.g. 
those which directly and volunteered geographic information (VGI) [1,2], has the potential instantaneously capture data about their immediate environment), to fundamentally change the nature of scientific investigation. the opportunities to exploit such rich veins of VGI are many and Citizens have a long history of being involved in scientific research varied. However, whilst one of the pressing challenges concerns or the more recently coined ‘citizen science’ [3]. There are many how to manage large data volumes in terms of processing and successful examples of citizen science that have led to new storage, a number of yet unaddressed issues persist. These include scientific discoveries, including unravelling protein structures [4] how to handle data privacy, how to ensure adequate security, and and discovering new galaxies [5], as well as websites for public critically, how to assess VGI data quality. Data quality is an area reporting of illegal logging/deforestation [6] and waste dumping that has attracted increasing attention in the literature [1,11–13]: [7], which have demonstrated how citizens can have a visible quantifying VGI data quality underpins its usefulness (that is, its impact upon the environment and local governance. Analysis of reliability and credibility) and potential for incorporation into more passive sources of geo-tagged data from the crowd from scientific analyses. The critical issue is whether ordinary citizens search engines such as Google has also revealed interesting can provide information that is of high enough quality to be used scientific trends, e.g. the relationship between GDP and searches in formal scientific investigations. about the future [8], trends in influenza [9] and the ability to With open access to high resolution satellite imagery through characterize crop planting dates [10]. One of the critical providers such as Google Earth and Bing Maps, it is possible to PLOS ONE | www.plosone.org 1 July 2013 | Volume 8 | Issue 7 | e69958 Quality of Crowdsourced Data Table 1. The spectrum of human impact. Human Impact Description 0% No evidence of any human activity visible 1 to 50% Some visible evidence of human activities such as tracks/roads; evidence of managed forests; some evidence of deforestation; some scattered human dwellings, some scattered agricultural fields; some evidence of grazing 51% to 80% Increasing density of agriculture from subsistence on the lower end to intensive, commercial agriculture with large field sizes on the upper end 81% to 99% Urban areas with decreasing amounts of green space and increasing density of housing 100% A built up urban area with no green space, typically the business district of a city doi:10.1371/journal.pone.0069958.t001 collect vast amounts of volunteered information about the Earth’s which VGI can be trusted as a source of training and validation surface such as land cover and land use. The collection of data in remote sensing. However, by investigating generic research crowdsourced land cover data is the main aim of the Geo-Wiki questions related to the quality and reliability of information project [14,15] in what is currently a contributory approach to contributed by citizens with different levels of domain expertise, citizen science [16]. Geo-Wiki is a web-based geospatial portal this research should also be of interest to the broader field of (http://www.geo-wiki.org) with an interface linked to Google citizen science. The next section describes data collection via the Earth. 
It can be used to visualize and validate global land cover human impact Geo-Wiki campaign and the analysis of volunteer datasets such as GLC-2000, MODIS and GlobCover [12] which and volunteered data quality. Following the results, some frequently disagree over the land cover they record at any given discussion is provided regarding the implications of incorporating location [17–19]. Since its inception, a number of Geo-Wiki VGI in scientific research including recommendations for further branches have been initiated, each one specifically devoted to research before conclusions are drawn in the final section. gathering different types of information such as agriculture (agriculture.geo-wiki.org), urban areas (cities.geo-wiki.org), bio- Materials and Methods mass (biomass.geo-wiki.org) and more recently human impact Data from the Human Impact Competition (humanimpact.geo-wiki.org). Crowdsourced data on land cover were collected using a branch The general aim of this paper is to determine whether there are of Geo-Wiki called Human Impact (http://humanimpact.geo- significant differences in quality in the information contributed by wiki.org) and the data were subsequently used to validate a map of experts and non-experts. This is explored through a land cover land availability for biofuel production [20]. The volunteers were case study with obvious implications for the domains of remote presented with pixel outlines of 1 km resolution (at the equator) sensing and landscape analyses and investigation of the extent to Figure 1. Number of pixels classified per day by the volunteers. These are daily totals from the start of the competition on day 1 to the end at just over 50 days, which shows a clear acceleration as the competition progressed. doi:10.1371/journal.pone.0069958.g001 PLOS ONE | www.plosone.org 2 July 2013 | Volume 8 | Issue 7 | e69958 Quality of Crowdsourced Data projected onto Google Earth (where pixels in this context refer to with students, and through social media. Background information the smallest area for which information is collected) and were then on the competitors was collected through the registration asked to determine the percentage of human impact and the land procedure. The competition ran for just under 2 months in the cover type at each location from the following list: (1) Tree cover, autumn of 2011 [22]. The top ten volunteers were offered co- (2) Shrub cover, (3) Herbaceous vegetation/Grassland, (4) authorship on a paper resulting from the competition [20] as well Cultivated and managed, (5) Mosaic of cultivated and managed/ as Amazon vouchers as an incentive. Other incentives included natural vegetation, (6) Flooded/wetland, (7) Urban, (8) Snow and inviting friends, which resulted in extra points, a leader board so ice, (9) Barren and (10) Open Water. The concept of ‘human that competitors could gauge the competition, and appealing to impact’ was defined as the amount of evidence of human activity the environmental motivation of individuals through the biofuel visible in the Google Earth images. A spectrum of these intensities theme. is shown in Table 1, which is loosely based on the ideas of A set of 299 ‘control’ points was used to determine quality Theobald [21]. Volunteers were also asked to indicate their where three experts with backgrounds in physical geography, confidence in the class type and the impact score, whether they geospatial sciences, remote sensing and image classification agreed had used high resolution imagery and the date of the image. 
A set of 299 'control' points was used to determine quality, where three experts with backgrounds in physical geography, geospatial sciences, remote sensing and image classification agreed upon the land cover at each location. The first 99 control points were provided to the volunteers at the start of the competition, the next 100 were provided three-quarters of the way through, and the final 100 were provided at the end, where the latter were drawn from higher resolution imagery. The volunteers were then ranked by an index that combined quality and quantity through equal weighting, and the top ten were declared the winners. Interestingly, there were some minor changes in the top ten once quality was considered.

Figure 2. Global distribution of pixels collected by the volunteers. The distribution is shown by (a) human impact and (b) land cover type. doi:10.1371/journal.pone.0069958.g002

Figure 3. Median response time of the volunteers. The response time is in seconds measured from the start of the competition until the end at just over 50 days. doi:10.1371/journal.pone.0069958.g003

A total of ~53,000 locations were validated by more than 60 individuals, and Figure 1 shows the rapid increase in contributions in the last 20 days of the competition, with a particularly large spike at the end. Figure 2 illustrates the spatial distribution of the ~53,000 points collected, expressed as measures of human impact and land cover. Note that the crowdsourced data can be freely downloaded from http://www.geo-wiki.org.

Of these ~53,000 validations, 7657 were at the control locations, which were then used to assess quality. The data were then filtered for 'unknown' expertise, resulting in 4020 control data points scored by 29 Expert volunteers and 3548 control data points scored by 33 Non-expert volunteers. Experts were considered to be individuals with a background in remote sensing/spatial sciences, versus non-experts who were new to this discipline or had some self-declared limited background. The control data, whose analysis forms the basis of the paper, have the following characteristics. Experts evaluated an average of 64.8 control data points each (s.d. 108.1) and non-experts 57.2 (s.d. 95.1). Although there is the potential for a few individuals to have a disproportionately large impact on data quality and composition, in this case, of the 29 experts, 18 contributed more than 50 evaluations, and of the 33 non-experts, 19 evaluated more than 50 data points. The volunteers' demographics (age, gender, socio-economic status etc.) were not captured as part of the contributor registration. This is unfortunate, because although a proxy for previous experience is evaluated in this paper, it is well recognised that such factors can influence contributor responses. Such data will be collected in future campaigns.

Analysis of Human Impact

To determine how well the answers provided by the volunteers matched the control data in terms of the degree of human impact, a linear regression was fit as follows:

Y_i = a + b X_i + e_i    (1)

where Y_i is the degree of human impact from the control data, X_i is the degree of human impact from the volunteers, a and b are coefficients of the linear regression equation, and e_i is a normally distributed random error term for each observation i.
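As a minimal illustration of how a regression of this form could be fitted, the sketch below uses Python with pandas and statsmodels. The file name and the column names (control_hi for the control human impact score, volunteer_hi for the volunteer score) are hypothetical placeholders; the paper does not describe its actual implementation.

    # Sketch of fitting Equation (1) by ordinary least squares.
    # File and column names are hypothetical placeholders, not the authors' code.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("control_point_scores.csv")  # one row per volunteer evaluation of a control point

    # Y = human impact from the control data, X = human impact reported by the volunteer.
    eq1 = smf.ols("control_hi ~ volunteer_hi", data=df).fit()
    print(eq1.summary())  # intercept a and slope b with standard errors and t values, cf. Table 3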
Each volunteer provided information on expertise during registration. Equation 1 was extended to include an indicator of respondent expertise in the regression model:

Y_i = a + b_X X_i + b_E E_i + e_i    (2)

where, in addition to the previously defined variables, b_X is the regression coefficient for volunteer human impact, E_i is the expertise indicator variable for observation i (0 for Non-Expert, 1 for Expert), and b_E is the regression coefficient for this variable. Thus, this coefficient is a measure of the difference in human impact (on aggregate) between the Non-Expert and Expert contributions. This model implicitly assumes that human impact is equally predicted by experts and non-experts (i.e. the slope is uniform), and that the intercept term is uniform within each expertise group, if the intercept is considered to be a for the non-expert group and a + b_E for the expert one.

Table 3. Regression analysis for the model Y_i = a + b X_i + e_i, where Y_i is the degree of human impact from the control data and X_i is the degree of human impact from the participants.
     Estimate   Std. Error   t value   Pr(>|t|)
a    11.300     0.363        31.16     0.000
b    0.699      0.006        122.43    0.000
doi:10.1371/journal.pone.0069958.t003

Table 4. Extending the regression to include an indicator of expertise, where b_E is the regression coefficient for this indicator and b_X is the regression coefficient for participant human impact scores.
       Estimate   Std. Error   t value   Pr(>|t|)
a      9.009      0.432        20.85     0.000
b_X    0.705      0.006        123.49    0.000
b_E    4.251      0.442        9.62      0.000
doi:10.1371/journal.pone.0069958.t004
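The extension in Equation (2), and the variant that gives each expertise group its own intercept and slope (reported later in Table 5), can be expressed as regression formulas in the same way. Again, this is only a sketch under the assumption of hypothetical column names (expert coded 0/1), not the authors' code.

    # Sketch of Equation (2) and of the split model with group-specific terms.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("control_point_scores.csv")  # hypothetical, as in the previous sketch

    # Equation (2): common slope, with an additive shift b_E for experts (cf. Table 4).
    eq2 = smf.ols("control_hi ~ volunteer_hi + expert", data=df).fit()

    # Split model: one intercept and one slope per expertise group, fitted together (cf. Table 5).
    split = smf.ols("control_hi ~ C(expert) + volunteer_hi:C(expert) - 1", data=df).fit()
    print(eq2.params)
    print(split.params)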
The data provided by the volunteers were then analysed for consistency, which is a known issue in ground truthing [23]. After every 50 points, the volunteers were provided with a point they had previously validated. The average, median and standard deviation of the maximum difference between the volunteers and the controls were calculated for all control points, by expertise, by volunteer consistency in the land cover they recorded, and by confidence.

Finally, the response times of the volunteers were calculated between each successive data point they scored. The median response time was 55 secs, with first and third quartiles of 32 and 100 secs respectively. The average response time was 5,226 secs, indicating a highly skewed distribution, which reflects large pauses in contributions, e.g. at the end of a validation session. Figure 3 shows the median response time per day over the course of the competition. There is a general trend towards shorter response times as the competition unfolded, with the shortest response times between successive validations occurring at the end of the competition. Thus, we were interested in understanding the relationship between response time and the quality of the human impact responses overall, and whether there was any difference in quality towards the end of the competition.

The response time data were first pre-processed in two ways. First, all response times greater than 5 minutes were removed as these were deemed unrepresentative of typical behavior. This was based on visual inspection of the distribution. However, 5 minutes also represents the 92.5th percentile and therefore includes the majority of the data. Second, response times were log transformed due to the skewness of the distribution. A linear regression equation of the form given in (1) was fit to the entire dataset, where the dependent variable, Y_i, was the absolute difference in the answers for human impact between the control data and the volunteers' scores, and the independent variable, X_i, was the log of the response times, with a and b representing coefficients of the linear regression and e_i the error term for each observation i.

The last 100 control points provided to the volunteers at the end of the competition were locations of cropland or agricultural land covers (the classes of Cultivated and managed and Mosaic of cultivated and managed/natural vegetation) and where high resolution images existed. In order to evaluate how volunteer performance changed with experience, only control points with agricultural land cover and where high resolution images were available were selected from the first 199 control points. The average accuracy in human impact across the first two control sets was then compared to the average accuracy of the third set using a t-test to determine whether there were any significant differences.

Table 5. The regression analysis of predicting the degree of human impact by expert and non-expert groups, when the regression is split into two simultaneous models.
                 Estimate   Std. Error   t value   Pr(>|t|)
a (Expert)       7.960      0.527        15.12     0.000
a (Non-expert)   14.200     0.494        28.74     0.000
b (Expert)       0.725      0.008        91.06     0.000
b (Non-expert)   0.685      0.008        83.61     0.000
doi:10.1371/journal.pone.0069958.t005

Figure 4. The distribution of human impact by land cover class. The distribution is shown for (a) the control pixels and (b) the volunteers, where the latter show a much wider range of answers. doi:10.1371/journal.pone.0069958.g004

Analysis of Land Cover

As in the analysis of human impact scores above, control points were used to evaluate volunteer accuracy in terms of the land cover they indicated. An error or confusion matrix was populated for all contributors (Table 2) and the overall accuracy was calculated as follows:

Accuracy = \frac{\sum_{i=1}^{n} x_{i,i}}{\sum_{i=1}^{n} \sum_{j=1}^{n} x_{i,j}} \times 100    (3)

where i is the volunteer class, j is the control class and n is the total number of classes.

Table 2. A confusion matrix for the comparison of controls with responses from the crowd.
                         Class 1 (control j)   Class 2 (control j)   …   Class n (control j)
Class 1 (volunteer i)    x_{1,1}               x_{1,2}               …   x_{1,n}
Class 2 (volunteer i)    x_{2,1}               x_{2,2}               …   x_{2,n}
…                        …                     …                     …   …
Class n (volunteer i)    x_{n,1}               x_{n,2}               …   x_{n,n}
doi:10.1371/journal.pone.0069958.t002
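A confusion matrix of this kind makes the overall accuracy in Equation (3) a one-line computation: the sum of the diagonal divided by the sum of all entries. The sketch below is illustrative only, and the example counts are invented.

    # Sketch of Equation (3): overall accuracy from a confusion matrix whose rows are
    # volunteer classes (i) and whose columns are control classes (j).
    import numpy as np

    def overall_accuracy(conf):
        # Percentage of control evaluations whose volunteer label matches the control label.
        return 100.0 * np.trace(conf) / conf.sum()

    # Invented 3-class example for illustration.
    conf = np.array([[50,  5,  2],
                     [ 4, 30,  6],
                     [ 1,  3, 20]])
    print(round(overall_accuracy(conf), 1))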
In addition, two other measures of accuracy were calculated, specific to each land cover class: user's and producer's accuracies. User's accuracy describes errors of commission or Type I errors. For example, the user's accuracy for the forest class indicates the likelihood that what was labeled as forest by the volunteers really is forest. Producer's accuracy reflects errors of omission or Type II errors. Using the forest example again, this measure reflects how well the forest cover control pixels were classified by the volunteers. These two measures are calculated as follows:

User's Accuracy (by class i) = \frac{x_{i,i}}{\sum_{j=1}^{n} x_{i,j}} \times 100    (4)

Producer's Accuracy (by class j) = \frac{x_{j,j}}{\sum_{i=1}^{n} x_{i,j}} \times 100    (5)

where i is the volunteer class, j is the control class and n is the total number of classes.

Separate accuracy measures were calculated for the three sets of control pixels (to determine whether accuracies change over time), for locations where the volunteers were the most confident, and to compare experts and non-experts.

Contributor consistency in land cover labeling was then analysed by determining the proportion of times when the same land cover type was chosen when presented with the same data point. This was calculated for all points, by expertise, and by various degrees of confidence.

Finally, the impact of response time on the quality of land cover validations was analysed using logistic regression of the following form:

Logit(P_i) = a + b X_i    (6)

where the probability (P_i) that the land cover is correctly identified is expressed as a function of response time, X_i.

The effect of response time on accuracy in the final set of controls was compared with the first and second sets to determine whether contributors were more interested in scoring a greater number of points and spent less time on each data point towards the end of the competition. A two-tailed binomial test was used to test whether the number of correct classifications at the end of the competition was greater than expected based on the total number of classifications performed and the probability of correct classification in the earlier part of the competition.
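The per-class measures in Equations (4) and (5) are row- and column-normalised versions of the same confusion matrix, and the logistic model in Equation (6) can be fitted with standard routines. The sketch below assumes a data frame with hypothetical columns correct (0/1 agreement with the control) and log_rt (log-transformed response time); it is an illustration, not the analysis code used in the paper.

    # Sketch of Equations (4)-(6); file and column names are hypothetical.
    import pandas as pd
    import statsmodels.formula.api as smf

    def users_accuracy(conf, i):
        # Of everything the volunteers labelled as class i, the share that really was class i.
        return 100.0 * conf[i, i] / conf[i, :].sum()

    def producers_accuracy(conf, j):
        # Of all control pixels of class j, the share the volunteers labelled as class j.
        return 100.0 * conf[j, j] / conf[:, j].sum()

    # Equation (6): probability of a correct land cover label as a function of response time.
    df = pd.read_csv("control_point_scores.csv")
    logit_fit = smf.logit("correct ~ log_rt", data=df).fit()
    print(logit_fit.params)  # a and b, cf. Table 12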
Results and Discussion

Human Impact

The result of the regression described in Equation 1, to determine how well the degree of human impact can be predicted by the contributors based on the control points, is provided in Table 3. This shows that b differs significantly from zero and is positive but less than 1, suggesting that there is evidence that the users underestimated the degree of human impact by roughly 30 percent.

The results of including an indicator variable describing respondent expertise (Equation 2) are shown in Table 4. The slopes are still positive and suggest that allowing for expertise, even in a simple way, changes the results relating to the slope term. To investigate this further, Equation 1 was extended to include variables describing expertise. Although computed together, this effectively splits the regression into two models - one for each of the expertise groups - and the results are shown in Table 5. These results indicate that there is little variation in the degree to which the expert and non-expert groups underestimated the degree of human impact.

Figure 4 shows the distribution of human impact scores for the control pixels and the contributor data by land cover class. It shows a general trend for contributors to underestimate the degree of human impact across the different land cover types, with the exception of (5) Mosaic of cultivated and managed/natural vegetation.

A further analysis explored how human impact scores varied with land cover class. The standard regression described in Equation 1 was extended to include indicators for the land cover classes. Since there was only a small number of data points classified as Open water, Barren or Urban, these classes were excluded from the regression analysis. The results for the remaining five land cover types are shown in Table 6, and Figure 5 plots the contributed against the control human impact scores with the regression coefficients for the different land cover classes.

Figure 5. The relationship between the volunteer responses and the controls for human impact by land cover type. The lines show the coefficient slopes when each control land cover class is evaluated in turn. Note that the data points have had a small random noise component added to allow their density to be visualised. doi:10.1371/journal.pone.0069958.g005

Table 6. Regression analysis for the degree of human impact.
                       Estimate   Std. Error   t value   Pr(>|t|)
a (Tree cover)         7.264      0.343        21.16     0.000
a (Shrub cover)        4.284      0.520        8.24      0.000
a (Herb./Grass)        6.567      0.504        13.03     0.000
a (Cultivated)         73.669     0.857        86.01     0.000
a (Cult./nat mosaic)   36.046     0.485        74.32     0.000
b (Tree cover)         0.220      0.012        18.52     0.000
b (Shrub cover)        0.089      0.021        4.34      0.000
b (Herb./Grass)        0.366      0.015        24.62     0.000
b (Cultivated)         0.098      0.010        10.06     0.000
b (Cult./nat mosaic)   0.273      0.008        33.58     0.000
doi:10.1371/journal.pone.0069958.t006

The results show that the prediction of the degree of human impact varies with land cover class. The coefficients for the Herbaceous vegetation/Grassland class most strongly predict human impact, the coefficients for the Shrub cover class are the weakest predictors, and all classes underestimate human impact. This indicates that the conceptualizations of these classes may need to be more clearly defined, and perhaps more training examples used to illustrate the different degrees of human impact by land cover type.
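One way to reproduce the structure of Table 6 is to fit a single model in which both the intercept and the slope are allowed to vary by land cover class. The formula below is a sketch under the same hypothetical column names as before (land_cover holding the control class label), not the authors' code.

    # Sketch of Equation (1) extended with land cover indicators: one intercept and one
    # slope per class, with the three sparse classes excluded as described in the text.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("control_point_scores.csv")  # hypothetical
    keep = ["Tree cover", "Shrub cover", "Herbaceous vegetation/Grassland",
            "Cultivated and managed", "Mosaic of cultivated and managed/natural vegetation"]
    sub = df[df["land_cover"].isin(keep)]

    by_class = smf.ols("control_hi ~ C(land_cover) + volunteer_hi:C(land_cover) - 1",
                       data=sub).fit()
    print(by_class.params)  # class-specific a and b, cf. Table 6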
as confidence decreased resulting in an average deviation of as When taking only those answers where the volunteers indicated much as 25.9% for the least confident category. This analysis of high confidence (or ‘sure’ on the slider bar), there was around a consistency serves to highlight the need to examine those pixels 3% increase in the accuracy to 69%. Unlike with human impact, which were not consistently labeled and which are probably more experts were more accurate than non-experts, e.g. 62% for non- difficult to judge in terms of both human impact and land cover, experts and 69% for experts for C1 with even larger differences observed for C2 and C3. This suggests that extra training should which can then be used to help train the volunteers. The results of the regression analyzing the effect of response be provided to those individuals with a non-expert background. As training manuals are often unread or rarely consulted, a more times are shown in Table 8 and indicate that the agreement interactive approach could be introduced such that the volunteers between the volunteers and the control pixels increased signifi- are made aware of their errors as they progress through a cantly with a faster response time for human impact, although the competition. In addition, a forum could be set up to discuss pixels effects were small. For each increase in magnitude in response that present difficulties in identification, particularly for non- time, the agreement between the crowd and the control pixels experts. increased in accuracy by 1.4%. The average deviation in human Table 10 shows the user’s and producer’s accuracies for the five impact for pixels of (4) Cultivated and managed and (5) Mosaic of most common land cover types in the dataset. Overall the results cultivated and managed/natural vegetation and high resolution imagery show that there is generally an increase in the accuracy across from the first two control sets was 17.1%. This was compared to control sets although C3 should only really be considered for the third set of control data points (consisting of only these pixel cropland and mosaic classes. The lowest accuracies are in shrub types) and the average deviation in human impact was lower, cover, grassland/herbaceous and the mosaic cropland class, which decreasing to 14.7%. A t-test confirmed that the means are significantly different from one another (p,0.0001; t =24.8533; degrees of freedom = 3326.222) and showed that accuracy in human impact actually increased at the end of the competition. Table 9. Accuracy of land cover (in %) based on comparison of volunteer response with three sets of controls. Thus, these analyses indicate that there are no particular concerns over quality in relation to response time. No allowance for confusion between classes Dataset used Table 8. Regression analysis for the model Y =a+bX +e i i i C1 C2 C3 where X is response time and Y is human impact. i i Full dataset 66.4 66.5 76.2 Confidence rating 69.4 69.3 78.9 Estimate Std. Error t value Pr(.|t|) of sure a 12.9915 1.0706 12.135 0.000 Experts 69.2 72.3 84.6 b 1.4110 0.6157 2.291 0.022 Non-experts 62.4 61.9 65.9 doi:10.1371/journal.pone.0069958.t008 doi:10.1371/journal.pone.0069958.t009 PLOS ONE | www.plosone.org 8 July 2013 | Volume 8 | Issue 7 | e69958 Quality of Crowdsourced Data Table 10. User’s and producer’s accuracies for the five main Table 11. Consistency of response in choosing the land land cover types and for different subsets of the data cover type. 
Land Cover

The overall accuracies for the three sets of control points, labeled C1, C2 and C3, are presented in Table 9 for the full dataset, considering only those contributions where confidence was high (i.e. 'sure' on the slider bar), and then disaggregated by expertise (i.e. experts or non-experts).

Table 9. Accuracy of land cover (in %) based on comparison of volunteer response with three sets of controls. No allowance for confusion between classes.
Dataset used                 C1     C2     C3
Full dataset                 66.4   66.5   76.2
Confidence rating of sure    69.4   69.3   78.9
Experts                      69.2   72.3   84.6
Non-experts                  62.4   61.9   65.9
doi:10.1371/journal.pone.0069958.t009

Considering all three sets of control data, accuracy varies between 66 and 76%. There is little difference between the first and second sets of controls, but there is a marked increase in accuracy for the final set (C3), at 76%. This is unsurprising since the final control sample was drawn from high resolution imagery. When taking only those answers where the volunteers indicated high confidence (or 'sure' on the slider bar), there was around a 3% increase in the accuracy, to 69%. Unlike with human impact, experts were more accurate than non-experts, e.g. 62% for non-experts and 69% for experts for C1, with even larger differences observed for C2 and C3. This suggests that extra training should be provided to those individuals with a non-expert background. As training manuals are often unread or rarely consulted, a more interactive approach could be introduced such that the volunteers are made aware of their errors as they progress through a competition. In addition, a forum could be set up to discuss pixels that present difficulties in identification, particularly for non-experts.

Table 10 shows the user's and producer's accuracies for the five most common land cover types in the dataset. Overall the results show that there is generally an increase in the accuracy across control sets, although C3 should only really be considered for the cropland and mosaic classes. The lowest accuracies are in shrub cover, grassland/herbaceous and the mosaic cropland class, which indicates the need to provide more examples of how these classes appear on Google Earth within the training materials, as the volunteers are confusing these classes more often than others. When considering points where the volunteer had a high confidence, the patterns are similar and there is generally an increase in accuracy, although the mosaic cropland class continues to be more problematic, with a decrease in the user's accuracy across control sets. Finally, the effect of expertise on land cover classification accuracy produced variable results depending upon the land cover type and the control set considered. For the forest class, the non-experts improved in their ability to correctly identify forest by the second set of controls, while the experts actually showed a decrease in the producer's accuracy. Similarly, for the shrub class, the non-experts showed a greater level of improvement in the second set of controls compared to the experts and outperformed them in terms of both user's and producer's accuracy in C2. The experts were better than non-experts at identifying herbaceous, cropland and mosaic, but once again there were differences in the user's and producer's accuracies. By building up a picture of where experts and non-experts have differing performance by land cover class, we can tailor the kinds of training materials provided to the volunteers, focusing on areas where greater problems in identification lie.

Table 10. User's and producer's accuracies for the five main land cover types and for different subsets of the data including confidence and expertise (no allowance for confusion between classes).
                                  User's accuracy          Producer's accuracy
Data set      Land cover type     C1     C2     C3         C1     C2     C3
Full          1                   75.9   77.4   43.6       67.1   69.6   100.0
              2                   52.1   46.5   0.0        61.7   67.2   N/A
              3                   45.1   56.3   6.0        51.3   56.3   30.0
              4                   78.9   88.8   95.2       74.2   72.8   76.0
              5                   71.5   68.8   64.6       62.2   60.7   76.4
Sure          1                   78.7   82.4   53.1       68.0   70.2   100.0
              2                   50.8   48.6   0.0        64.4   71.2   N/A
              3                   43.9   52.4   10.7       47.7   53.7   50.0
              4                   81.0   89.6   95.2       76.5   75.0   78.7
              5                   72.4   68.2   63.7       66.8   65.8   78.8
Expert        1                   78.4   83.5   52.6       73.0   68.8   100.0
              2                   54.8   45.7   0.0        63.8   65.1   N/A
              3                   50.9   65.6   7.1        52.4   65.2   33.3
              4                   77.1   90.5   95.5       82.6   80.5   86.5
              5                   76.5   75.7   78.1       59.3   71.8   80.2
Non-expert    1                   71.9   73.6   35.0       58.6   70.2   100.0
              2                   48.5   47.2   0.0        58.9   68.8   N/A
              3                   38.0   48.7   5.6        49.5   48.9   28.6
              4                   82.8   87.0   94.6       61.2   66.3   63.0
              5                   66.1   62.4   52.5       66.3   51.6   71.8
1 = Tree cover; 2 = Shrub cover; 3 = Herbaceous vegetation/Grassland; 4 = Cultivated and managed; 5 = Mosaic of cultivated and managed/natural vegetation.
doi:10.1371/journal.pone.0069958.t010

Similar to human impact, a further analysis was then undertaken on a subset of the data where the volunteers were provided with the same pixels at different times in the competition (Table 11). The results show that the volunteers were consistent in their response just over 76% of the time (76.1%), where this was slightly lower for experts (75.7%) and slightly higher for non-experts (76.7%). A very minor increase, to 77.6%, was observed when considering only those pixels where the volunteer was sure, but when the volunteers were less sure or unsure about their responses, their consistency in response decreased to 66.7%.

Table 11. Consistency of response in choosing the land cover type.
Disaggregation   Category                       Consistent   Percentage
None             Full dataset                   Y            76.1
                                                N            23.9
Expertise        Expert                         Y            75.7
                                                N            24.3
                 Non-Expert                     Y            76.7
                                                N            23.3
Confidence       Sure                           Y            77.6
                                                N            22.4
                 Quite sure+Less sure+Unsure    Y            76.4
                                                N            23.6
                 Less sure+Unsure               Y            66.7
                                                N            33.3
doi:10.1371/journal.pone.0069958.t011

The final analysis concerned the relationship between quality in land cover classification and response time. The results showed that the crowd was 40% more likely to disagree with the control for each order of magnitude increase in response time (p < 0.0001), as shown in Table 12 and indicated by the value of b. Considering the issue of whether quality in land cover validation (and therefore accuracy) decreased near the end of the competition, we compared the probability that the volunteers agreed with the control pixels for land cover types (4) Cultivated and managed and (5) Mosaic of cultivated and managed/natural vegetation at the end of the competition (75.9%) with that from the early to middle part of the competition (70.6%). This difference was determined to be highly significant (p < 0.0001; number of trials = 1500; number of successes = 1139) using a binomial test, and therefore the accuracy in estimating land cover actually increased in the final stages of the competition. Thus, for both human impact and land cover, there are no concerns about the quality decreasing near the end of the competition with a faster response time.

Table 12. Logistic regression analysis for the model Logit(P_i) = a + b X_i, where X_i is the log of the response time and P_i is the probability that the land cover is correctly identified.
     Estimate   Std. Error   t value   Pr(>|t|)
a    1.46573    0.13955      10.504    0.000
b    -0.40005   0.07957      -5.027    0.000
doi:10.1371/journal.pone.0069958.t012
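The binomial comparison described above can be reproduced with a standard test of the number of successes against the earlier success rate. The counts below are taken directly from the text; scipy 1.7 or later is assumed for binomtest (older releases expose a binom_test function instead). This is a sketch, not the authors' code.

    # Sketch of the two-tailed binomial test: were end-of-competition classifications of
    # classes (4) and (5) more accurate than expected from the earlier success rate?
    from scipy import stats

    n_trials = 1500      # end-of-competition classifications of the two cropland classes
    n_successes = 1139   # of which this many agreed with the controls (75.9%)
    p_earlier = 0.706    # probability of agreement in the early-to-middle part of the competition

    result = stats.binomtest(n_successes, n_trials, p_earlier, alternative="two-sided")
    print(result.pvalue)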
Conclusions

This paper assessed the quality of crowdsourced data collected through a Geo-Wiki competition. Volunteers identified the degree of human impact and classified land cover at random locations using Google Earth images. Quality was assessed by comparing volunteer results with results agreed by experts at a number of control points. Control points were provided to volunteers at the beginning, middle and end of the competition. The results showed that there is little difference between experts and non-experts in identifying human impact, while experts were better than non-experts in identifying land cover. However, the results for both varied by land cover type and through the competition. For example, experts were better than non-experts at identifying shrub land cover at the start of the competition, but non-experts improved more than experts and then outperformed them in shrub cover identification by the middle of the competition, indicating that volunteers were learning over time. The volunteers were shown to be reasonably consistent in their characterizations of human impact and land cover, with non-experts outperforming the experts in terms of human impact and vice versa for land cover. Moreover, when contributors were confident in their choice of human impact, they were also more consistent, and unsurprisingly, consistency decreased as confidence decreased. Finally, the faster response times observed towards the end of the competition did not have a negative impact on quality, and volunteers were therefore not sacrificing quality for the desire to complete more locations and thereby win the Geo-Wiki competition.

Thus overall, the non-experts were as reliable in what they identified as the experts were for certain, identifiable situations, and the reliability of the information provided by non-experts improved faster and to a greater degree than that of the experts. Thus, better, targeted training materials and a continual learning process built into the competition might help address these issues. Also, allowing volunteers to reflect on the information they contribute, for example by regularly feeding back evaluations of their data through the use of control points or by making additional material available to them, would also potentially decrease differences between experts and non-experts, particularly in the classification of land cover.
The findings of this research relating to the differences between expert and non-expert citizens are also relevant to other areas of research that seek to benefit from the advantages of citizen science. For example, recent activity such as the umbrella Zooniverse project (http://www.zooniverse.org) promotes collaborative projects in many areas of social and physical science research. Currently, registration to its projects captures no information about the contributor, their training or their socio-economic context. Approaches that include information about participant background, control points, reflection, repetition, etc. have broad potential for other citizen science projects that involve classification or identification, e.g. [24,5], where experts can be used to build a database of controls for monitoring and learning purposes.

The next step in this research is to develop robust measures of quality for each location in the crowdsourced database, based on rules that take into account the number of times that contributors have provided information at a given location, along with the consensus in the answers, their expertise and the confidence in the answers provided. However, the results from this study suggest the need for more nuanced approaches than a simple Linus Law or mass of evidence approach (which have been previously suggested in this domain) for determining when to believe the crowd and therefore when the information they provide can be used with confidence. Formal methods for combining evidence such as Bayesian probability, Dempster-Shafer theory of evidence, Possibility Theory and Endorsement theory provide different ways of combining or partitioning evidence. They allow measures of certainty and uncertainty to be generated and provide different measures of confidence in aggregated information, and for determining when the weight of evidence indicates that crowdsourced data or VGI are 'believable'. Since the relationship between reliability and confidence was found to be strong in this research, this also suggests that future activities seeking to incorporate crowdsourced data should capture measures of contributor confidence in the information they provide. Ongoing research by the authors will investigate these areas in more detail.

Author Contributions

Conceived and designed the experiments: LS SF MV CP C. Schill IM. Performed the experiments: LS SF MV CP C. Schill IM. Analyzed the data: LS AC C. Salk SF MV. Wrote the paper: LS AC C. Salk SF MV CP C. Schill IM FK MO.

References

1. Goodchild MF, Li L (2012) Assuring the quality of volunteered geographic information. Spatial Statistics 1: 110–120. doi:10.1016/j.spasta.2012.03.002.
2. Schuurman N (2009) The new Brave New World: geography, GIS, and the emergence of ubiquitous mapping and data. Environment and Planning D: Society and Space 27: 571–580.
3. Miller-Rushing A, Primack R, Bonney R (2012) The history of public participation in ecological research. Frontiers in Ecology and the Environment 10: 285–290. doi:10.1890/110278.
4. Khatib F, DiMaio F, Group FC, Group FVC, Cooper S, et al. (2011) Crystal structure of a monomeric retroviral protease solved by protein folding game players. Nature Structural & Molecular Biology 18: 1175–1177. doi:10.1038/nsmb.2119.
5. Clery D (2011) Galaxy Zoo Volunteers Share Pain and Glory of Research. Science 333: 173–175. doi:10.1126/science.333.6039.173.
6. Nayar A (2009) Model predicts future deforestation. Nature News. Available: http://www.nature.com/news/2009/091120/full/news.2009.1100.html. Accessed 11 February 2013.
7. Milčinski G (2011) The rise of crowd-sourcing: how valuable data can we get out of VGI. Amsterdam, Netherlands.
8. Preis T, Moat HS, Stanley HE, Bishop SR (2012) Quantifying the advantage of looking forward. Scientific Reports 2. Available: http://www.nature.com/doifinder/10.1038/srep00350. Accessed 18 May 2013.
9. Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS, et al. (2009) Detecting influenza epidemics using search engine query data. Nature 457: 1012–1014. doi:10.1038/nature07634.
10. Van der Velde M, See L, Fritz S, Verheijen FGA, Khabarov N, et al. (2012) Generating crop calendars with Web search data. Environmental Research Letters 7: 024022. doi:10.1088/1748-9326/7/2/024022.
11. Haklay M, Basiouka S, Antoniou V, Ather A (2010) How Many Volunteers Does it Take to Map an Area Well? The Validity of Linus' Law to Volunteered Geographic Information. The Cartographic Journal 47: 315–322.
12. Comber A, See L, Fritz S, Van der Velde M, Perger C, et al. (2013) Using control data to determine the reliability of volunteered geographic information about land cover. International Journal of Applied Earth Observation and Geoinformation 23: 37–48. doi:10.1016/j.jag.2012.11.002.
13. Foody GM, Boyd D (2012) Using volunteered data in land cover map validation: Mapping tropical forests across West Africa. 2368–2371.
14. Fritz S, McCallum I, Schill C, Perger C, Grillmayer R, et al. (2009) Geo-Wiki.Org: The Use of Crowdsourcing to Improve Global Land Cover. Remote Sensing 1: 345–354. doi:10.3390/rs1030345.
15. Fritz S, McCallum I, Schill C, Perger C, See L, et al. (2012) Geo-Wiki: An online platform for improving global land cover. Environmental Modelling & Software 31: 110–123. doi:10.1016/j.envsoft.2011.11.015.
16. Bonney R, Ballard H, Jordan R, McCallie E, Phillips T, et al. (2009) Public Participation in Scientific Research: Defining the Field and Assessing its Potential for Informal Science Education. A CAISE Inquiry Group Report. Washington DC: Center for Advancement of Informal Science Education (CAISE). Available: http://caise.insci.org/uploads/docs/PPSR%20report%20FINAL.pdf.
17. Fritz S, See L (2005) Comparison of land cover maps using fuzzy agreement. International Journal of Geographical Information Science 19: 787–807. doi:10.1080/13658810500072020.
18. See LM, Fritz S (2006) A method to compare and improve land cover datasets: application to the GLC-2000 and MODIS land cover products. IEEE Transactions on Geoscience and Remote Sensing 44: 1740–1746. doi:10.1109/TGRS.2006.874750.
19. Fritz S, See L, McCallum I, Schill C, Obersteiner M, et al. (2011) Highlighting continued uncertainty in global land cover maps for the user community. Environmental Research Letters 6: 044005. doi:10.1088/1748-9326/6/4/044005.
20. Fritz S, See L, Van der Velde M, Nalepa RA, Perger C, et al. (2013) Downgrading recent estimates of land available for biofuel production. Environmental Science & Technology 47: 1688–1694. doi:10.1021/es303141h.
21. Theobald DM (2004) Placing exurban land-use change in a human modification framework. Frontiers in Ecology and the Environment 2: 139–144. doi:10.1890/1540-9295(2004)002[0139:PELCIA]2.0.CO;2.
22. Perger C, Fritz S, See L, Schill C, Van der Velde M, et al. (2012) A campaign to collect volunteered geographic information on land cover and human impact. In: Jekel T, Car A, Strobl J, Griesebner G, editors. GI_Forum 2012: Geovisualisation, Society and Learning. Berlin/Offenbach: Herbert Wichmann Verlag. 83–91.
23. Lopresti D, Nagy G (2002) Issues in ground-truthing graphic documents. In: Blostein D, Kwon Y-B, editors. GREC 2001. LNCS Vol. 2390. Heidelberg: Springer. 46–66.
24. Bonter DN, Cooper CB (2012) Data validation in citizen science: a case study from Project FeederWatch. Frontiers in Ecology and the Environment 10: 305–307. doi:10.1890/110273.
