A quantitative global review of species population monitoring

Species monitoring, defined here as the repeated, systematic collection of data to detect long‐term changes in the populations of wild species, is a vital component of conservation practice and policy. We created a database of nearly 1200 schemes, ranging in start date from 1800 to 2018, to review spatial, temporal, taxonomic, and methodological patterns in global species monitoring. We identified monitoring schemes through standardized web searches, an online survey of stakeholders, in‐depth national searches in a sample of countries, and a review of global biodiversity databases. We estimated the total global number of monitoring schemes operating at 3300–15,000. Since 2000, there has been a sharp increase in the number of new schemes being initiated in lower‐ and middle‐income countries and in megadiverse countries, but a decrease in high‐income countries. The total number of monitoring schemes in a country and its per capita gross domestic product were strongly, positively correlated. Schemes that were active in 2018 had been running for an average of 21 years in high‐income countries, compared with 13 years in middle‐income countries and 10 years in low‐income countries. In high‐income countries, over one‐half of monitoring schemes received government funding, but this was less than one‐quarter in low‐income countries. Data collection was undertaken partly or wholly by volunteers in 37% of schemes, and such schemes covered significantly more sites and species than those undertaken by professionals alone. Birds were by far the most widely monitored taxonomic group, accounting for around half of all schemes, but this bias declined over time. Monitoring in most taxonomic groups remains sparse and uncoordinated, and most of the data generated are elusive and unlikely to feed into wider biodiversity conservation processes. 
These shortcomings could be addressed by, for example, creating an open global meta‐database of biodiversity monitoring schemes and enhancing capacity for species monitoring in countries with high biodiversity.

KEYWORDS biodiversity surveillance, citizen science, megadiverse countries, population trends, taxonomic bias

INTRODUCTION
Data on long-term trends in species abundance and distribution underpin efforts to track and understand the global biodiversity crisis, to target scarce conservation resources to priority species and sites, and to quantify the impact of those investments (Borges et al., 2018; Butchart et al., 2010; Díaz et al., 2019). The process of monitoring can in itself bring about and accelerate positive conservation outcomes (Danielsen et al., 2005). For these reasons and others, it has been suggested that biodiversity monitoring should be recognized as a measure of a nation's development, analogous and complementary to more widely used economic and human health indicators (Scheele et al., 2019). However, biodiversity monitoring is poorly coordinated and often haphazard in its occurrence (Pereira & Cooper, 2006) and has a number of taxonomic, regional, and methodological biases (McRae et al., 2017; Schmeller et al., 2017), even within the most widely monitored groups, such as birds (Garnett & Geyle, 2018). Furthermore, schemes may not meet fundamental standards in terms of objectivity, standardization, replication, and duration (Buckland & Johnston, 2017; Lindenmayer et al., 2012; White, 2018). These caveats apply particularly to largely unstructured recording schemes that collect occurrence-only data with relatively little standardization and uneven global coverage, for which there is currently mixed evidence of their ability to track trends in abundance accurately (e.g., Boersch-Supan et al., 2019; Kamp et al., 2016). Although the sheer scale of data collection, growing emphasis on encouraging users to submit complete species lists, and increasingly sophisticated statistical analyses are overcoming these challenges (e.g., Kelling et al., 2015), the arrival of big data does not yet obviate the need for targeted and systematic species monitoring (Bayraktarov et al., 2019).
Finally, it is unclear how accessible monitoring data are to end users (Schmeller et al., 2017; Stephenson et al., 2017a), and the effectiveness with which such data have been used to inform species conservation has been questioned (Lindenmayer et al., 2013; Nichols & Williams, 2006; Robinson et al., 2018). It is apparent, therefore, that current monitoring efforts will be inadequate to assess, for most species and regions of the world, progress toward major global policy initiatives, such as the Convention on Biological Diversity (CBD) Strategic Plan for Biodiversity 2011-2020 (Butchart et al., 2019; Tittensor et al., 2014), and that new approaches to biological monitoring are needed. A number of initiatives have been set up to address these problems, including the Group on Earth Observations Biodiversity Observation Network (GEO BON) (Pereira et al., 2010), the International Union for Conservation of Nature (IUCN) Species Survival Commission (SSC) Species Monitoring Specialist Group (Stephenson, 2018), and the Marine Biodiversity Observation Network (Duffy et al., 2013). However, there is currently no global overview of the monitoring of species populations to guide and inform these efforts. Thus, the degree and temporal direction of regional and taxonomic biases in coverage (and the interaction between them), the principal objectives of monitoring schemes, the main methods used, and the primary actors involved all remain little understood. We aimed to shed light on the current situation by undertaking the first global assessment of the state of biodiversity monitoring. We focused on the monitoring of species population trends. We used a number of search methods to identify schemes meeting our definition of monitoring (see Methods) and extracted and analyzed the characteristics of these schemes. This study forms part of a broader global effort by the IUCN to improve and support species monitoring (Stephenson, 2018; Stephenson & Stengel, 2020).

Definition of monitoring and monitoring schemes
The term monitoring in relation to its use in conservation lacks a clear definition and is interpreted differently in different contexts and languages. For the purposes of this study, we defined long-term species monitoring as the repeated, systematic collection of data with the intention to detect changes over time in the abundance or distribution of 1 or more predefined taxa or taxonomic groups. Our definition aims to capture schemes and data sets that are likely to inform long-term environmental agreements, such as the CBD, and deliberately excludes short-term initiatives, such as biodiversity baseline assessments, one-off inventories, environmental impact assessments, and short-term scientific studies. It also excludes initiatives that do not have or plan to have regularly repeated data collection, for instance, species distribution atlases that are repeated at long (> 5 years) and unpredictable intervals (although atlases for which data are collected on relative population sizes at frequent and regular intervals were included). We excluded monitoring of the harvesting of commercially important species, such as fisheries catches or hunting bags, because it is generally unclear whether increased catches reflect an increase in the population or an increase in effort (and hence potentially a driver of population decline) and such activities do not meet the part of our definition of monitoring that relates to intentionality. We also excluded platforms by which unstructured point data are collected, such as eBird (https://ebird.org/home) and Global Biodiversity Information Facility (https://www.gbif.org/), for the reasons given above. Although it may be possible to infer population trends from some of the excluded data sources, our aim was to explore schemes established with the intention of monitoring population trends directly.
We defined a monitoring scheme as a recognizable protocol whose aim is to collect field data on long-term population trends in 1 or more species with a predefined method. Schemes could collect data on multiple taxonomic groups simultaneously. Sometimes schemes were grouped within what we defined as a program, which might collect data on the same or different species with different methods. For example, in some cases, national biodiversity monitoring programs comprise multiple taxon-specific schemes, each of which applies a different method.

Data collection
We used 4 methods to identify schemes that appeared to meet our definition of monitoring, recognizing that these would identify suites of monitoring schemes that would differ systematically in several respects. First, we designed a questionnaire to collect metadata related to different aspects of monitoring schemes. This was disseminated to a wide and varied audience, including monitoring scheme coordinators (e.g., governments, nongovernmental organizations [NGOs], and academics) and scheme participants (e.g., conservation practitioners and members of IUCN species specialist groups). Because of the global scope of our study, and recognizing that English is not the first language of a large proportion of our target audience, we simplified the information collected and the wording used to define it. The questionnaire, the means we used to disseminate it to our target audiences, and its global uptake are detailed in Appendix S1.
Second, project partners in 7 countries conducted in-depth searches for monitoring schemes within their respective countries through their professional networks, both to increase sample sizes and to calibrate the comprehensiveness of the results obtained by the broadly disseminated questionnaire. The countries were selected to be spread widely across the world, to encompass high biodiversity, and to vary in their socioeconomic status. Partners included conservation NGOs and academic institutions in Indonesia (Burung Indonesia: www.burung.org), China (Nanjing Institute of Environmental Sciences: www.nies.org/ywz), Kazakhstan (Association for the Conservation of Biodiversity of Kazakhstan: www.acbk.kz), Ghana (Ghana Wildlife Society: www.ghanawildlifesociety.org), South Africa (BirdLife South Africa: www.birdlife.org.za), Colombia (Humboldt Biological Resources Research Institute: http://humboldt.org.co), and Argentina (Aves Argentinas: http://www.avesargentinas.org.ar/). Partners used the questionnaire described above, and some translated it into their national language to encourage wider participation.
Third, we searched online for monitoring schemes in 41 countries with a standardized web search (Appendix S2). We used the search strings "species monitoring [country]," "biodiversity monitoring [country]," and "wildlife monitoring [country]" in that order in Google and examined the first 100 results for each, clearing all search history and data from the browser after each search. The method was refined by first testing it on schemes in the United Kingdom. Searches were undertaken in English from the United Kingdom. The result pages did not necessarily link directly to a page detailing monitoring schemes, so complete websites were investigated to avoid missing monitoring schemes. We tried to fill in as many fields from the questionnaire with the information found, but mostly focused on geographical and taxonomic metadata. The 41 countries selected for the search included the 7 countries targeted for in-country searches by partners, as described above, to obtain a measure of detectability, all 17 megadiverse countries (Mittermeier et al., 1997), all of the world's 10 largest countries, and a number of randomly selected countries in regions underrepresented by other methods (Appendix S2).
Fourth, we trawled the database of the EuMon project, which conducted a review of biodiversity monitoring in the European Union (EU) from 2005 to 2007 (http://eumon.ckff.si/) (Schmeller et al., 2006), and the Living Planet Database, a global repository for vertebrate population trends (www.livingplanetindex.org) (McRae et al., 2017). We identified schemes that fitted our definition of monitoring and extracted the relevant metadata from them, contacting scheme organizers for further information where necessary. The EuMon database is incomplete, even for well-known groups such as birds (Voříšek et al., 2018), but comparison with independent searches of the literature suggests that it may reflect a reasonably unbiased sample of the schemes available, at least for some taxonomic groups (Schmeller et al., 2009). The Living Planet Database is likely to contain taxonomic and geographic biases in the monitoring schemes it records (McRae et al., 2017); however, being global in scope, this is likely to be an indication of the type of monitoring schemes currently available for vertebrate species. Because there were few schemes to include from this database additional to those recorded by the questionnaire, they were added to those from the questionnaire.
Data on per capita gross domestic product (GDP) for each country, corrected for purchasing power parity (PPP), were downloaded from the CIA World Factbook (https://www.cia.gov/library/publications/the-world-factbook/). These data were used in the modeling processes described below.

Data processing and cleaning
Using the methods described above, we identified and compiled data on a large number of monitoring schemes or surveys. We asked respondents to complete 1 questionnaire per scheme. In practice, respondents interpreted the instructions in different ways and adapted them to how they practiced monitoring activities. For instance, some respondents pooled all monitoring activities into 1 questionnaire, whereas others completed the questionnaire for each species they monitored; some described a general monitoring program applied to multiple sites (e.g., national parks), whereas others entered each monitored site as a different scheme or survey. The raw data set was, therefore, likely to over- or underrepresent some countries, regions, and taxonomic groups. To reduce such effects, we identified nested schemes and surveys and either pooled them or, less commonly, separated them. For instance, where a respondent reported each national-scale single species monitoring scheme separately, we pooled into a single scheme all those recording the same taxonomic group at the same frequency and reporting the same type of results.
We classified each scheme for which we received data into 3 categories: those that clearly did not fit our definition of monitoring, those that clearly did, and those for which information was insufficient to decide unequivocally (Appendix S3). We also assessed the national-level reporting rate of 5 international monitoring schemes and initiatives: the International Waterbird Census (https://www.wetlands.org/our-approach/healthy-wetland-nature/international-waterbird-census/), the Pan-European Common Bird Monitoring Scheme (https://www.ebcc.info/pecbms/), the European Butterfly Indicator (https://www.eea.europa.eu/data-and-maps/figures/european-grassland-butterfly-indicator), the Tropical Ecological and Monitoring Network (https://www.conservation.org/projects/team-network), and the Global Observation Research Initiative in Alpine Environments (Grabherr et al., 2000).
In the few cases where multiple years were given for the start of a program, for instance to give information on when extra sites or taxonomic groups were added to an existing scheme, we applied the initial start date to all schemes in the program. In the small number of cases where the precise start year was unknown and reported at the nearest decade (e.g., 1990s), we allocated the start year to the midpoint of that decade.
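The two start-year rules above can be illustrated with a minimal Python sketch. The function names are hypothetical, and taking year 5 of a decade (e.g., 1995 for "1990s") as its midpoint is an assumption, because the text does not state the exact year used.

```python
import re

def parse_start_year(raw):
    """Return a start year; decade strings such as '1990s' map to the decade midpoint."""
    text = raw.strip()
    match = re.fullmatch(r"(\d{3})0s", text)
    if match:
        # Assumption: the midpoint of the 1990s is taken to be 1995.
        return int(match.group(1)) * 10 + 5
    return int(text)

def program_start_year(reported_years):
    """Initial (earliest) reported start year, applied to every scheme in a program."""
    return min(parse_start_year(year) for year in reported_years)

# Hypothetical program that added sites in 2004 and 2010 to a scheme begun in 1998.
example = program_start_year(["2004", "1998", "2010"])
```

Here `example` is 1998, and `parse_start_year("1990s")` gives 1995 under the stated midpoint assumption.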

Assessing data detectability and representativeness
We assessed the ability of our search methods to detect monitoring schemes by looking at the degree of duplication between the different data collection protocols; extent to which the national components of known international schemes were identified; and accumulation rate of monitoring schemes during the standardized web searches. This was done to assess the degree to which our data set was comprehensive and representative and to assess the likely accessibility and visibility of monitoring data to those wishing to use them for conservation purposes. We expected the temporal, spatial, taxonomic, and methodological characteristics of the schemes identified by each of our 4 data collection protocols to differ systematically for a number of reasons, not least because 3 of the 4 data-collection protocols were restricted to subsets of countries that differed greatly in terms of their geography, history, and socioeconomic status. To quantify the likely biases in our combined sample that arose from the different search methods used, we compared a number of characteristics of the schemes identified by our different data collection methods with hierarchical partitioning with the hier.part package (Walsh & Mac Nally, 2013) in R (R Core Team, 2019). This allowed us to quantify the unique contribution of search method per se in explaining variation in scheme start date, taxonomic scope (which we simplified for this analysis to a binary birds or not birds variable because around 50% of all schemes monitor birds), and scale of the survey (number of species monitored). Each of these factors was modeled as a function of the others, of region (Table 1), and of the 2018 per capita GDP of the country (corrected for PPP). 
The last 2 predictors were included because the main differences in focus of the search methods were geographical (e.g., EuMon includes only schemes in the EU, whereas all our in-country assessments and most of our standardized web searches covered countries outside the EU) and because GDP explains significant variation in conservation effort (e.g., Baynham-Herd et al., 2018). This analysis thus assessed whether search method per se explained unique variance in monitoring scheme characteristics or whether the systematic differences between schemes identified by different search methods could be better explained by other factors. To assess whether there has been increased government investment in monitoring in response to the first (2002-2010) or second (2011-2020) CBD Strategic Plans, we compared the proportion of schemes receiving government funding before 2002, in 2002-2010, and after 2010.
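The idea behind hierarchical partitioning can be sketched in plain Python. This is not the hier.part implementation and not the models fitted in this study. It is a minimal illustration that assumes a continuous response, ordinary least squares, and R² as the goodness-of-fit measure, and it runs on simulated data. Each predictor's independent contribution is its average gain in R² over all sub-models, averaged across hierarchy levels.

```python
import random
from itertools import combinations

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def r_squared(columns, y, subset):
    """R^2 of a least-squares fit of y on the chosen predictor columns (with intercept)."""
    if not subset:
        return 0.0
    n = len(y)
    y_mean = sum(y) / n
    y_c = [v - y_mean for v in y]
    x_c = []
    for j in subset:
        col_mean = sum(columns[j]) / n
        x_c.append([v - col_mean for v in columns[j]])
    gram = [[sum(p * q for p, q in zip(xi, xj)) for xj in x_c] for xi in x_c]
    moment = [sum(p * q for p, q in zip(xi, y_c)) for xi in x_c]
    beta = solve(gram, moment)
    explained = sum(b * m for b, m in zip(beta, moment))
    return explained / sum(v * v for v in y_c)

def independent_contributions(columns, y):
    """Average R^2 gain from adding each predictor, averaged over hierarchy levels."""
    p = len(columns)
    fit = {s: r_squared(columns, y, s)
           for k in range(p + 1) for s in combinations(range(p), k)}
    shares = []
    for j in range(p):
        others = [c for c in range(p) if c != j]
        total = 0.0
        for k in range(p):
            gains = [fit[tuple(sorted(s + (j,)))] - fit[s]
                     for s in combinations(others, k)]
            total += sum(gains) / len(gains)
        shares.append(total / p)
    return shares

# Simulated data (assumption): predictor 0 is strong, predictor 1 weak, predictor 2 is noise.
rng = random.Random(0)
x = [[rng.gauss(0, 1) for _ in range(200)] for _ in range(3)]
y = [2.0 * a + 0.5 * b + rng.gauss(0, 1) for a, b in zip(x[0], x[1])]
shares = independent_contributions(x, y)
```

By construction the independent contributions sum to the R² of the full model, which is what allows each predictor's share to be reported as a percentage of explained variance.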

Estimating the number of schemes globally
We attempted to assess the total global number of species monitoring schemes in 2 ways. First, we assumed the range of national-level reporting rates of the national components of 5 international monitoring schemes reflected our detection rate of monitoring schemes generally, yielding a bootstrapped estimate. We then derived a second and independent estimate based on the highly significant relationship between the number of monitoring schemes located by the systematic online searches of 41 countries and their national per capita GDP. We used a regression of the log of the number of schemes found by such searches on GDP to estimate the number of schemes that may have been found if all the world's countries had been searched in the same way, based on their GDP. For each country, we derived a minimum estimate (intercept minus its standard error and the lower 95% estimate of the slope) and a maximum estimate (intercept plus its standard error and the upper 95% estimate of the slope). We then randomly selected a value between the minimum and maximum estimate for each country and multiplied this by 9.35. This is the reciprocal of the mean proportion (0.107) of all schemes that were detected through the standardized search method across the 7 countries in which in-country assessments were undertaken (Appendix S4), which are thus likely to have had a high proportion of their schemes detected. Values were then summed across countries. This randomization was repeated 999 times and the 25th and 975th ranked values were taken as the bootstrapped 95% confidence intervals of the median value.
Table 1 Number of schemes monitoring different taxonomic groups by region. a Total (1552) is larger than the total number of schemes included in the database (1168) because some schemes monitor more than 1 taxonomic group. b Cell counts significantly lower than expected. c Cell counts significantly higher than expected.
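The regression-based procedure described in this section can be sketched as follows. This is an illustration under several assumptions that the text does not confirm: natural logarithms, back-transformation before the random draw, a normal approximation (z = 1.96) for the 95% limits of the slope, and GDP expressed in thousands of dollars. All input values are hypothetical, and countries with zero detected schemes would need separate handling because log(0) is undefined.

```python
import math
import random

def fit_line(x, y):
    """Ordinary least squares y = a + b*x; returns a, b and their standard errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx
    a = my - b * mx
    s2 = sum((v - (a + b * u)) ** 2 for u, v in zip(x, y)) / (n - 2)
    se_a = math.sqrt(s2 * (1 / n + mx ** 2 / sxx))
    se_b = math.sqrt(s2 / sxx)
    return a, b, se_a, se_b

def global_estimate(searched_gdp, searched_counts, all_gdp,
                    multiplier=9.35, reps=999, z=1.96, seed=1):
    """Median and ranked 95% limits of the summed, detection-corrected estimates."""
    log_counts = [math.log(c) for c in searched_counts]
    a, b, se_a, se_b = fit_line(searched_gdp, log_counts)
    rng = random.Random(seed)
    totals = []
    for _ in range(reps):
        total = 0.0
        for g in all_gdp:
            # Minimum: intercept minus its SE with the lower 95% slope estimate.
            low = math.exp((a - se_a) + (b - z * se_b) * g)
            # Maximum: intercept plus its SE with the upper 95% slope estimate.
            high = math.exp((a + se_a) + (b + z * se_b) * g)
            total += rng.uniform(low, high) * multiplier
        totals.append(total)
    totals.sort()
    lo_rank = round(0.025 * (reps + 1)) - 1   # 25th ranked value when reps = 999
    hi_rank = round(0.975 * (reps + 1)) - 1   # 975th ranked value when reps = 999
    return totals[len(totals) // 2], totals[lo_rank], totals[hi_rank]

# Hypothetical searched countries: per capita GDP (thousand USD) and schemes found.
searched_gdp = [2.0, 5.0, 10.0, 20.0, 40.0]
searched_counts = [2, 3, 5, 9, 30]
# Hypothetical GDP values for the full set of countries to extrapolate to.
all_gdp = [1.0, 3.0, 8.0, 15.0, 30.0, 45.0]
median, lower, upper = global_estimate(searched_gdp, searched_counts, all_gdp)
```

The multiplier of 9.35 is the reciprocal of the mean detection proportion (0.107) reported in the text. Everything else in the example is a placeholder.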

RESULTS
After removing duplicates, pooling or separating surveys according to the protocol described above, and removing schemes we considered not to meet our strict definition, we obtained a data set of 1168 monitoring schemes, of which 958 unambiguously met our definition of monitoring and the remaining 210 did so with a degree of uncertainty, in most cases because it was unclear whether the schemes were intended to become long-term monitoring or were shorter-term baseline assessments. The significantly later median start date of such schemes compared with those unambiguously meeting our criteria (2006 vs. 2000; Mann-Whitney U test, p < 0.001) reflected this uncertainty. Because such schemes comprised only 18% of all returns, including them with the 958 schemes that unambiguously met our criteria did not systematically change the taxonomic or spatial composition of the combined data (χ² tests, p > 0.5 in all cases), and excluding them might bias temporal analyses, we merged the 2 sets of schemes in analyses unless otherwise stated. Of the 1168 schemes identified, 760 were independent of other schemes and 408 were grouped in 91 broader monitoring programs (median 3 schemes per program [see definitions above]). Because schemes within programs by definition used different methods, generally covered different taxonomic groups, and generally had different start years (of the 54 programs with 3 or more schemes, in only 16 did all schemes cover the same broad taxonomic group and in only 16 did all schemes start in the same year), we used scheme as the basic unit of analysis, accepting that there would be a small degree of nonindependence.

Data detectability and representativeness
Of the 1168 schemes identified, 417 (35.7%) were detected through the online questionnaire, 196 (16.8%) were extracted from EuMon, 226 (19.3%) were reported by the 7 in-country partners, and 329 (28.2%) were found by the standardized online searches of 41 countries. The complete data set included 58 schemes that were independently detected by 2 different search methods, 11 that were detected by 3, and 2 that were detected by all 4. Thus, only 71 (6.1%) schemes were detected by more than 1 search method, reflecting the fact that 3 of the 4 search methods had mutually exclusive, or only partly overlapping, geographical foci. In the 7 countries in which we undertook targeted in-country searches as well as standardized web searches and general questionnaire outreach, there was much variation in the contribution of the different data collection methods to the total number of schemes obtained. However, the degree of overlap was generally higher than the overall rate of 6.1% (mean replicated schemes across methods 10.2%, range 0.0-26.3%) (Appendix S4). On average, 28.2% of the national-level components of international monitoring schemes were reported through 1 or more of our search methods (range 7.7-48.0%) (Appendix S4). Schemes in European countries had higher reporting rates (range 35.9-48.0%) than schemes in the rest of the world (range 7.7-28.6%). The low degree of overlap of schemes between search methods, which largely reflected the different geographical extent of the searches, does not necessarily indicate a low detection rate in search methods. For example, for 2 countries (United Kingdom and South Africa), we extended the standardized online search beyond the first 100 web hits to a maximum of 500, but this increased the number of returns by <10% (Appendix S2).
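The shares and the overlap rate reported above follow directly from the counts, as the short check below shows. The variable names are illustrative only.

```python
# Number of schemes attributed to each search method, as reported in the text.
detected = {
    "online questionnaire": 417,
    "EuMon": 196,
    "in-country partners": 226,
    "standardized web search": 329,
}
total = sum(detected.values())

# Percentage of all schemes attributed to each method, to 1 decimal place.
shares = {method: round(100 * n / total, 1) for method, n in detected.items()}

# Schemes found independently by 2, 3, or all 4 search methods.
multi_method = 58 + 11 + 2
overlap_pct = round(100 * multi_method / total, 1)
```

The four counts sum to 1168, the shares are 35.7%, 16.8%, 19.3%, and 28.2%, and the 71 schemes found by more than 1 method make up 6.1% of the total.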
As expected, the 4 search methods identified samples of schemes that differed systematically. Year of initiation and the number of species monitored, for example, both differed systematically between schemes identified by each of the 4 methods (Kruskal-Wallis test, p < 0.0001; post hoc Dunn's test indicated significant differences among all 4 search methods at p < 0.05), as did the proportion of schemes that monitored birds (χ²₃ = 12.3, p < 0.01). However, the results of hierarchical partitioning analysis of variance across all schemes suggested that the unique contribution of search method per se in explaining these systematic differences was small or insignificant (3.0-17.2% of explained variance) compared with the contribution of the other correlates (Appendix S5). The systematic differences between schemes identified by different search methods could, therefore, be largely explained by patterns that were consistent across the pooled sample. Thus, although the different search methods sampled different parts of the spectrum of monitoring schemes globally, they were complementary and selected representative and comparable samples across that spectrum. However, sampling intensity is likely to have differed between search methods and where this might influence interpretation of the results, we present data for each search method separately.

Number of monitoring schemes globally
We assumed that the range of national-level reporting rates of the national components of 5 international monitoring schemes (7.7-48.0%) reflected our detection rate of monitoring schemes generally and thus that our total of 1168 schemes underestimated the true number by a factor of 2.08 to 12.99. This yielded a bootstrapped estimate of 9240 schemes globally (bootstrapped 95% CL 3305-14,997). We then derived a second and independent estimate using the highly significant relationship between the number of monitoring schemes located by the systematic online searches of 41 countries and their national per capita GDP. A few very small territories with exceptionally high per capita GDP (Qatar, Macao, Singapore, Hong Kong, Brunei, Kuwait, and United Arab Emirates) were removed because these yielded unrealistically high estimates, whereas very few if any schemes were detected in these territories. This method estimated a global total that fell within the top end of the range estimated by the previous method; the median bootstrapped estimate was 12,249 schemes (95% CL 11,127-13,314). We assessed the fit of this approach by comparing the total number of schemes recorded in the 7 countries in which in-country assessments were undertaken with the modeled estimates for those countries and found that the 2 aligned well (Appendix S6).
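The underestimation factors in the first approach are the reciprocals of the lowest and highest reporting rates. The sketch below reproduces only that arithmetic. The bootstrapped median and confidence limits depend on a resampling procedure that is not detailed in this section, so they are not reproduced here.

```python
found = 1168                          # schemes in the database
rate_low, rate_high = 0.077, 0.480    # range of national-level reporting rates

# A detection rate r implies that the true total is found / r.
factor_min = round(1 / rate_high, 2)  # smallest correction factor
factor_max = round(1 / rate_low, 2)   # largest correction factor

# Totals implied by the two extreme detection rates.
lower_total = found / rate_high
upper_total = found / rate_low
```

The factors come out at 2.08 and 12.99, matching the text, and the extreme detection rates imply totals of roughly 2400 and 15,200 schemes.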

Taxonomic, temporal, and spatial patterns
All 4 search methods indicated a general increase in the rate of initiation of new monitoring schemes over time (Appendix S7). Across the combined sample, there was evidence of an increase in the establishment of new monitoring schemes in low- and middle-income countries and in megadiverse countries, but a fall in the rate of increase of new schemes in high-income countries (Figure 1a). As a result of these different temporal trends and the larger number of schemes in higher income countries, schemes in high-income countries that were still active in 2018 generated longer runs of data on average than schemes in lower income countries (Kruskal-Wallis test, χ²₃ = 90.1, p < 0.0001) (Figure 1b).
For monitoring schemes identified by the standardized online search, there was a highly significant positive correlation between the number of schemes detected in a country and its per capita GDP (r₃₆ = 0.57, p < 0.001; Figure 2). There was no systematic trend over time in the number of species being monitored by newly established monitoring schemes (r₈₄₇ = 0.06, p > 0.05). There was a highly significant difference between broad taxonomic groups in the median number of species monitored per scheme (Kruskal-Wallis test, χ²₈ = 192.1, p < 0.0001); schemes monitoring reptiles, mammals, and amphibians monitored fewer species on average than those monitoring birds, insects, and plants (Appendix S7).
Birds and mammals were the most frequently monitored taxonomic groups in our database across all search methods (Appendix S7) and within each (Figure 3a), although there was evidence of an increase over time in the proportion of newly established schemes that monitor nonavian taxa (Figure 3b). Nevertheless, highly speciose groups, such as plants and many invertebrate taxa, remained very poorly covered by monitoring in most regions of the world. There was highly significant regional variation in the taxonomic focus of monitoring schemes (χ²₇₀ = 277.6, p < 0.0001) (Table 1). Analysis of residuals in a contingency table indicated that this was driven largely by a higher than expected number of insect monitoring schemes and a lower than expected number of reptile monitoring schemes in Europe, a higher than expected number of amphibian monitoring schemes in North America, a higher than expected number of mammal and reptile monitoring schemes in SE Asia, a higher than expected number of bird monitoring schemes in Antarctica and E Asia, and a higher than expected number of multiregion schemes that monitor plants and other taxonomic groups (Table 1).
Figure 1 (a) Temporal trend in initiation of new species monitoring schemes by country income category following World Bank definitions and for megadiverse countries (all income classes combined) and (b) number of years schemes still active in 2018 had been running by that year (black diamond, mean; thick horizontal bar, median; box, interquartile range; dots, outliers). For display purposes, a small number of schemes that had been running for over 50 years are omitted.

Funding, operation, and aims of monitoring schemes
There was a highly significant association between a country's income bracket (following World Bank categories) and the funding and organizational models of the monitoring schemes operating there. In high-income countries, over 50% of monitoring schemes were partly or wholly funded by government, falling to around 35% in upper-middle income countries and 20% in lower-middle and low-income countries (the last 2 were combined due to small sample size for low-income countries), where funding by NGOs was more prevalent (χ²₁₂ = 79.2, p < 0.0001) (Figure 4a). In high-income countries, there was evidence of a marginal increase in the proportion of schemes funded by government after 2002 (year of the first CBD Strategic Plan), but in other country income brackets and across all countries combined, there was a decline over time in the proportion of new schemes that received government funding (Figure 4b).
Differences also existed between country income brackets in the proportion of schemes in which data collection was undertaken by nonprofessionals (variously described as "amateurs," "volunteers," and "local communities"). Collaborations between professionals and nonprofessionals were more prevalent in low and lower-middle income countries (χ²₄ = 73.1, p < 0.0001) (Appendix S7). Across all schemes, 37.0% involved nonprofessionals in data collection, either exclusively or working with professionals. Schemes in which data collection was undertaken partly or wholly by nonprofessionals covered significantly more sites (Kruskal-Wallis test, χ²₁ = 20.7, p < 0.0001) and more species (χ²₁ = 4.0, p < 0.05) than those undertaken by professionals alone, but there was no significant difference in the number of taxonomic groups covered (χ²₁ = 0.17, p > 0.5). Of the 794 monitoring schemes for which data were available, the stated aims of over half were site and species management, research, and tracking wider environmental changes (Appendix S7).
Figure 2 Relationship (SE) between the number of species monitoring schemes recorded in a standardized online search protocol in each of 41 countries and the country's per capita gross domestic product (GDP) corrected for purchasing power parity.

Monitoring methods
Considering the 956 schemes for which information was available on the type of results produced, 434 (45.3%) produced data on population trends only, 380 (39.7%) produced data on population sizes and trends, 93 (9.7%) collected data only on trends in distribution, and 49 (5.1%) collected data on trends in species diversity. Across the whole sample, most monitoring schemes (80.2%) collected data on at least an annual basis; 4.5% of schemes collected data once every 2 years, 11.5% had 3-5 years between repeat data collection, and 3.8% had over 5 years between repeat data collection. There were significant differences in monitoring periodicity between taxonomic groups (χ² = 120.5, df = 24, p < 0.0001), due largely to the lower than expected proportion of plant monitoring schemes and the higher than expected proportion of amphibian and bird monitoring schemes that sampled on an annual basis. There was no evidence of an increase over time in the temporal resolution of monitoring schemes (measured as the proportion of all schemes that undertake assessments on an annual or more frequent basis).
For the 733 schemes for which information on sample site selection was available, 535 (73.0%) used preselected sample plots, often prescribed by the scope of the project (e.g., in schemes focused on a single site), whereas in 149 (20.3%), participants had a degree of choice in sample site selection, following scheme guidance, and in 49 (6.7%), observers had free choice of where to sample. For the 858 schemes for which information on repeat sampling was available, in 736 (85.5%), all or most sample points were revisited on consecutive surveys, in 104 (12.1%), at least some sample points were revisited, and in 18 (2.1%), repeat visits to the same sites were rare. Of the 641 schemes for which information was provided or gathered on what data were collected in addition to monitoring data, 352 (54.9%) collected additional data on habitat, 314 (49.0%) collected data on threats, 171 (26.7%) recorded survey effort, 126 (19.7%) recorded conservation action, and 100 (15.6%) collected data on demography (e.g., productivity).
[Figure: Taxonomic focus of biodiversity monitoring schemes: (a) total number of species monitoring schemes for taxonomic groups across all schemes (individual schemes can cover more than 1 taxonomic group, so combined total is greater than the sample size of 1168 schemes) and (b) change in proportion of bird monitoring schemes over time (black, birds only; gray, birds and other groups; white, nonavian taxa only; numbers above bars, sample size, which sum to <1168 because a scheme's start year was not always known)]
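The site-selection proportions reported in this section follow directly from the stated counts. The short Python sketch below re-derives them as a simple consistency check; the category labels and the helper name `shares` are shorthand introduced here for illustration and are not terms used in the study.

```python
# Counts of schemes by sample-site selection method, as reported in the text
# (733 schemes with information available). The labels are shorthand.
site_selection = {
    "preselected plots": 535,
    "guided choice": 149,
    "free choice": 49,
}


def shares(counts):
    """Return each category's percentage of the total, rounded to 1 decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


site_shares = shares(site_selection)
for name, pct in site_shares.items():
    print(f"{name}: {pct}%")
```

The same helper could be applied to any of the other count breakdowns in the text, provided the categories are mutually exclusive.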

DISCUSSION
Our review shows that the number of species monitoring schemes has increased over time, across all country income brackets, and inside as well as outside megadiverse countries. In the most recent decades, the growth rate of new schemes has been greater in low- and middle-income countries than in high-income countries, where there is evidence of a decrease in the rate of initiating new schemes that could not be explained by our search method. Because newly started schemes may be less detectable through our search methods than more established and hence perhaps better known schemes, this increase may be underestimated, although schemes that started early and have since ceased may also have low detectability. The apparent decline in initiating new schemes in high-income countries may represent a process of saturation there, at least in terms of bird monitoring, which accounts for a high proportion of all schemes. There was no indication that the recorded increase in monitoring since 2000 is related to growing financial support from governments in response to the first and second CBD Strategic Plans because the proportion of schemes funded by governments declined in all but high-income countries.
Our 2 estimates of the total number of monitoring schemes globally (3305-14,997 and 11,127-13,314, with median estimates of 9240 and 12,249 schemes, respectively) are both based on several assumptions that are difficult or impossible to test, and both must be treated as tentative. In particular, the different detectability of schemes between countries was impossible to assess. However, the fact that the 2 independent methods converge well, with the range of one falling entirely within that of the other, and that validation indicated a reasonable performance of the models (Appendix S6), suggests that an estimate in the order of 3300-15,000 may be a reasonable assessment of the total number of species monitoring schemes operating globally.
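To illustrate what this range implies for the completeness of our sample, the sketch below divides the sample size by each end of the estimated range. It is a minimal illustration that assumes the rounded figures used in the text (a sample of nearly 1200 schemes and a global total of 3300-15,000); the variable and function names are illustrative only.

```python
# Rounded figures taken from the text (assumed for this illustration).
SAMPLE_SCHEMES = 1200                   # "nearly 1200" schemes in the database
GLOBAL_LOW, GLOBAL_HIGH = 3300, 15000   # estimated schemes operating globally


def coverage_percent(sample, total):
    """Percentage of an assumed global total represented by the sample."""
    return 100 * sample / total


# Coverage is highest if the true total is at the low end of the range,
# and lowest if the true total is at the high end.
best_case = coverage_percent(SAMPLE_SCHEMES, GLOBAL_LOW)
worst_case = coverage_percent(SAMPLE_SCHEMES, GLOBAL_HIGH)
print(f"Sample covers roughly {worst_case:.0f}-{best_case:.0f}% of schemes")
```

Under these rounded inputs, the sample represents only a minority of the schemes thought to be operating, whichever end of the range is closer to the truth.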
Even if the number of schemes operating globally is considerably in excess of our estimate, it is clear that many taxonomic groups are very poorly monitored. The bias of monitoring data toward certain vertebrate taxa, particularly charismatic species, and toward developed countries reflects a similar bias across biodiversity information more generally (Amano et al., 2016; Beck et al., 2014; McRae et al., 2017; Scheele et al., 2019; Troudet et al., 2017). Even in Australia, a megadiverse country with a high GDP, only a small proportion of the most threatened species are adequately monitored (Scheele et al., 2019). Some highly speciose groups, such as insects, with an estimated 5.5 million species (Stork, 2018), and plants, with an estimated 400,000 species (Willis, 2017), were particularly underrepresented in monitoring. Assuming that the schemes in our database present an accurate reflection of the taxonomic distribution of global monitoring, but underestimate the total by a factor of 10 (see above), then plants would require around 200,000 monitoring schemes and insects 2.8 million monitoring schemes to reach the same (highly incomplete) degree of monitoring coverage that birds currently receive, in terms of number of schemes per species in the group. This comparison does not take into account the fact that some bird species are monitored by multiple schemes, whereas many are monitored by none. Viewed in this light, trends in the populations of the planet's estimated 10 million eukaryotic species are, to a first approximation, unknown.
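The scale of this comparison can be made explicit by back-calculating the schemes-per-species ratio implied by the figures quoted above. The sketch below does only that: the ratio it prints is derived from the quoted species estimates and required scheme numbers, already incorporates the assumed tenfold underestimate, and is not a value reported directly in the study. The variable names are illustrative.

```python
# Figures quoted in the text: estimated species richness, and the number of
# schemes each group would need to match the per-species coverage of birds.
species = {"plants": 400_000, "insects": 5_500_000}
schemes_needed = {"plants": 200_000, "insects": 2_800_000}

# Back-calculate the schemes-per-species ratio implied by each pair of figures.
implied_ratio = {group: schemes_needed[group] / species[group] for group in species}

for group, ratio in implied_ratio.items():
    print(f"{group}: about {ratio:.2f} schemes per species")
```

Both groups imply a ratio of roughly one scheme for every two species, which is the benchmark of bird coverage against which the shortfall is measured.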
Although it is unrealistic to expect that anything more than a tiny proportion of the species in such groups will ever be monitored systematically, there is clearly a need to expand understanding of the trends in their populations by increasing the taxonomic scope of monitoring in a way that captures a greater variety of species, life histories, environments, and ecologies in a representative fashion (Borges et al., 2018). This could be achieved by mobilizing the resources of citizen science, the mass collection of data by amateurs and volunteers (Amano et al., 2016; Schmeller et al., 2009; Stephenson et al., 2017b).
Our results indicate that schemes that use nonprofessionals to collect data cover more sites and species than those in which data are collected entirely by professionals. Involvement of local people in environmental monitoring speeds decision making and enhances management responses (Danielsen et al., 2010) and produces results that do not differ substantively from those collected by professionals (Danielsen et al., 2014). There are many examples around the world of citizen science projects involving small numbers of dedicated volunteers that have developed into long-term monitoring schemes. Monitoring could be further enhanced by designing protocols that are more efficient, for example, by closer consideration of the attributes of, and threats to, the target species and more efficient selection of model or indicator taxa and sampling sites (Bal et al., 2018; Borges et al., 2018; Weiser et al., 2020).
Structures will need to be developed to support a global expansion of monitoring in terms of funding, the development of common standards and protocols, the establishment of links between scientists and nonprofessional data collectors, and the flow of information from monitoring to policy (Schmeller et al., 2015). Barriers to data sharing by scientists will also need to be broken down (Tenopir et al., 2011). Although 76% of schemes in our study that answered the question about data availability reported that their data were available for external use, only around half of schemes answered this question. It is possible that the half not responding were less likely to make their data externally available. Remote sensing, which can be used to develop proxy indices of abundance by tracking environmental changes in species' ranges, offers many opportunities for improving understanding of the health of the planet's species (e.g., Leidner & Buchanan, 2018; Luque et al., 2018; Stephenson, 2019), but its resolution may be low for groups such as plants and insects, which are likely to respond to local and relatively small-scale environmental changes. Other technological advances that could facilitate the expansion of monitoring include ground-based and aerial-based sensors, such as camera traps, acoustic recording devices, and drones (Deichmann et al., 2017; Rovero & Zimmermann, 2016; Wich & Koh, 2018), and environmental DNA in aquatic systems and soils (Valentini et al., 2016). Such methods often have considerable benefits in terms of standardization (e.g., they are not biased by the ability of observers to identify sightings to species), efficiency of data collection (e.g., sensors can be left in the field to collect data for days or weeks), and accessibility (e.g., drones can access areas inaccessible to people) (Stephenson, 2020). Carefully assessed surrogates of animal abundance, such as counts of droppings, may also reduce fieldwork effort (Sato et al., 2019).
Although each approach has its own limitations and taxonomic biases, if used correctly as part of standardized protocols with clear goals, they can help improve the efficiency and effectiveness of biodiversity monitoring schemes (Stephenson, 2019). However, many of the taxonomic gaps we found are for small animal species and plants, for which monitoring still requires the presence of people on the ground (Stephenson et al., 2015). Our findings, therefore, underline the need to develop capacity for monitoring where it is most needed, generally in high-biodiversity countries (Schmeller et al., 2017; Stephenson et al., 2017a; Stephenson et al., 2017b).
Our review paints a picture of a global monitoring landscape that is expanding rapidly in scale and, less rapidly, in taxonomic scope, particularly in low- and middle-income countries. It is encouraging that most schemes have a high temporal resolution (with annual data collection the norm), that most use a monitoring protocol that systematically resamples predetermined sites, and that a high proportion also collect data on threats, habitats, and conservation action. However, we also found that global species monitoring generates data that are usually not centralized or coordinated, continues to rely in large part on professional data collection, and still shows a strong bias toward a small number of terrestrial vertebrate classes in high-income countries. Despite spending several months trying to collect data on as large a sample of monitoring schemes as possible, using a variety of search methods, we estimate that our sample of nearly 1200 schemes comprises only 8-36% of the schemes globally. Our in-country assessments, undertaken by established experts, failed to locate a number of schemes that were reported to us by questionnaire respondents. Monitoring schemes are, therefore, often difficult to detect, even in the countries in which they operate, and the data they generate are likely to be considerably more elusive. This may be particularly true in lower-income countries, which are often those holding the highest biodiversity, where there may be fewer resources available to raise the profile of schemes and where the lower level of funding by governments may place fewer requirements on scheme organizers to make their results visible and available. This means that the collation of data from disparate monitoring schemes into a centralized database is unlikely to be a realistic proposition in the near future.
However, we suggest that the establishment of a centralized and freely accessible database of meta-information on global monitoring schemes, building on our database and on regional repositories of monitoring information such as the Pan-European Common Bird Monitoring Scheme (https://pecbms.info/) and the EuMon integrated Biodiversity Monitoring & Assessment Tool (http://eumon.ckff.si/biomat/1.2.php), would be an important step in bringing schemes that currently operate in isolation into a more strategic and collaborative framework. We envision a system that can be queried spatially, temporally, and taxonomically and that provides sufficient motivation to scheme organizers to participate. Motivation may be in the form of grant-linked support or as archiving requirements linked to data publishing, analogous to established procedures in other fields. Incentives might include increased visibility, improved exchange with other monitoring schemes, or the potential for new research. Such a meta-database might promote greater collaboration among currently rather disparate operators, encourage the improved sharing of data and monitoring methods, aid the development and adoption of best practice, and bring schemes to the attention of potential funders. This in turn may stimulate the expansion of monitoring to more countries and more taxonomic groups and the development of monitoring capacity where it is most needed to fill data gaps and provide information of use to decision makers. Ultimately, such an effort would support national contributions to global processes, such as the CBD, the Sustainable Development Goals, and other multilateral environmental agreements. As a first step, the results of this study will be available on the website of IUCN SSC Species Monitoring Specialist Group (www.speciesmonitoring.org), alongside other databases of relevance to biodiversity monitoring (Stephenson & Stengel, 2020). The project partners will then explore options to develop, promote, and maintain an up-to-date georeferenced database of monitoring projects in the longer term.

ACKNOWLEDGMENTS
This work was funded by the CCI Collaborative Fund: Arcadia (a charitable fund of Lisbet Rausing and Peter Baldwin), the Rothschild Foundation, the A.G. Leventis Foundation, the Isaac Newton Trust, and the Prince Albert II of Monaco Foundation. We thank F. Danielsen, W. Foden, N. Kingston, K. Lee-Brooks, P. McGowan, T. Mundkur, L. Navarro, R. Smith, and V. Wilkins for contributing to a project start-up meeting that helped define the parameters of the study and the definition of monitoring adopted. We are very grateful to the many people who contributed information to this study. We thank 6 anonymous reviewers and the handling editor for comments that helped to improve the paper.