Effects of site‐selection bias on estimates of biodiversity change

Estimates of biodiversity change are essential for the management and conservation of ecosystems. Accurate estimates rely on selecting representative sites, but monitoring often focuses on sites of special interest. How such site-selection biases influence estimates of biodiversity change is largely unknown. Site-selection bias potentially occurs across 4 major sources of biodiversity data, decreasing in likelihood from citizen science, through museum collections and national park monitoring, to academic research. We defined site-selection bias as a preference for sites that are either densely populated (i.e., abundance bias) or species rich (i.e., richness bias). We simulated biodiversity change in a virtual landscape and tracked the observed biodiversity at a sampled site, which was selected either randomly or with a site-selection bias. We used a simple spatially resolved, individual-based model to capture the movement or dispersal of individuals in and out of the chosen sampling site. Site-selection bias exaggerated estimates of biodiversity loss by on average 300–400% compared with randomly selected sites. Based on our simulations, site-selection bias resulted in positive trends being estimated as negative trends: richness increase was estimated as 0.1 in randomly selected sites, whereas sites selected with a bias showed a richness change of −0.1 to −0.2 on average. Thus, site-selection bias may falsely indicate decreases in biodiversity. We varied sampling design and characteristics of the species and found that site-selection biases were strongest for short time series, small grains, organisms with low dispersal ability, large regional species pools, and strong spatial aggregation.
Based on these findings, to minimize site‐selection bias, we recommend use of systematic site‐selection schemes; maximizing sampling area; calculating biodiversity measures cumulatively across plots; and use of biodiversity measures that are less sensitive to rare species, such as the effective number of species. Awareness of the potential impact of site‐selection bias is needed for biodiversity monitoring, the design of new studies on biodiversity change, and the interpretation of existing data.


Introduction
On the global scale, species extinctions have increased as a result of human impacts (Pimm et al. 2014; Ceballos et al. 2015), and populations of many groups appear to be in rapid decline (Dirzo et al. 2014; Díaz et al. 2019). Less clear, however, is how local-scale biodiversity is changing. Although species richness is declining in many locations (Murphy & Romanuk 2014; Newbold et al. 2015), this is by no means universal, and several syntheses show considerable variation, with richness gains and losses being relatively equal (Vellend et al. 2013; Dornelas et al. 2014; Elahi et al. 2015; Blowes et al. 2019).
However, these estimates can be confounded by sampling biases that influence data availability and the analyses that are possible (Gonzalez et al. 2016), including over- or underrepresentation of geographic regions, land-use types, and taxonomic groups (Martin et al. 2012; McRae et al. 2017).
In the context of population changes, an important bias to consider is site-selection bias (Pechmann et al. 1991; Palmer 1993; Skelly et al. 2003), whereby sampling occurs where a focal species is present, abundant, or both. As a result, regression to the mean makes detecting declines more likely (Palmer 1993). Although site-selection bias and potential solutions are well known at the population level, less is known about how biases translate to community-level trends (Fournier et al. 2019).
Biodiversity trend estimates differ fundamentally from population trend estimates for a number of reasons. For example, changes in population abundance need not correlate with changes in the summed abundance of all species within a community. Instead, species whose abundances are declining are often compensated for by species whose abundances are increasing (e.g., Dornelas et al. 2019). Even when there are dramatic changes in the abundances of an assemblage of species, this need not translate into changes in its diversity, and vice versa (Chase et al. 2018). Within a single assemblage of species, biodiversity trends often depend on the grain and extent at which biodiversity is estimated (Chase et al. 2019). Therefore, we asked whether site-selection bias can confound biodiversity estimates and which factors determine bias strength.
There are at least 2 classes of bias that can influence biodiversity time series. An abundance bias reflects a preference for monitoring sites that initially have a high total number of individuals. A richness bias reflects a preference for monitoring sites that initially have a high number of species (Fig. 1). There are 3 main reasons to choose rich or densely populated sites within a focal habitat to survey biodiversity. First, such sites allow for time- and cost-efficient sampling of a large variety of species (Battersby & Greenwood 2004). Second, survey data are often collected to answer multiple questions, many of which require the presence of specific species (e.g., surveillance of breeding success). Third, scientists are not always objective when it comes to nature (e.g., beauty bias [Kovacs et al. 2006] can lead to more sampling of high-biodiversity sites).
We assessed the likelihood of site-selection bias across sources of biodiversity data, how site-selection bias influences biodiversity change estimates, and how the strength of bias depends on sampling design and species characteristics. We first assessed the likelihood of site-selection bias in 4 major sources of biodiversity data (citizen science, museum data, national parks, and academic data) through a literature review. We sought statements indicating how likely systematic site-selection schemes were to have been applied because such schemes preclude a preference for sites based on desirable qualities (Olsen et al. 1999).
We assumed site-selection bias was more likely in data sources that rely on subjective site selection. We illustrated the potential impact of site-selection bias by simulating biodiversity change in a virtual landscape. Simulations are uniquely suited for studying the impact of sampling design and systematic biases (Rhodes & Jonzén 2011; White 2019) because they allow for comparisons with a known trend. We compared the influence of 3 site-selection strategies (random, abundance biased, and richness biased) on estimates of biodiversity change. We also assessed whether bias strength depends on study design (grain size, sampling duration) and other characteristics of the focal species (dispersal ability, species pool size, aggregation).

Literature Review
To evaluate the likelihood of site-selection biases, we examined 4 sources of biodiversity data: citizen science, museums, national parks, and academic research. We ranked the likelihood of bias based on how likely systematic site-selection schemes were to be applied in these 4 data sources. We determined this based on statements from published studies (i.e., a nonsystematic, qualitative literature search; for references, see Results). We supplemented this qualitative ranking with a quantitative assessment of monitoring protocols from citizen science programs and national parks.
To assess the likelihood of site-selection bias in citizen science data quantitatively, we examined the instruction manuals of 44 citizen science programs (Supporting Information). Programs were found using a Google search with the terms "citizen science and biodiversity or monitoring," Wikipedia's list of citizen science projects (en.wikipedia.org/wiki/List_of_citizen_science_projects), and the list of citizen science projects of the State Wide Integrated Flora and Fauna Teams (www.swifft.net.au/cb_pages/citizen_science.php). We sought representative (rather than comprehensive) data of this type. About half of the examined programs have been cited in scientific publications, according to the Global Biodiversity Information Facility (GBIF 2019). We identified 3 sampling strategies in the manuals: opportunistic sampling (i.e., casual reporting of species sightings), free site selection (i.e., subjective selection by the recorder), and designation of sites by the program organizer. We assumed that opportunistic sampling produces biased data; participants often aim to build a large collection of encountered species (e.g., eBird [Sullivan et al. 2009], iNaturalist [iNaturalist 2019]), meaning common species are likely not reported as often as they are encountered and reporting focuses on rare species. We assumed that free site selection schemes produce biased data, unless training is required before participation or an explicit statement of the need for representative sampling is included in the instruction manuals. We assumed that site designation by the program organizer is more likely to follow a systematic scheme and thus less biased.
To assess the likelihood of site-selection bias in protected areas, we reviewed 16 U.S. national park monitoring protocols (Supporting Information) because they commonly monitor biodiversity, are well documented, and readily accessible. We considered data unbiased if sites were exclusively selected according to a systematic scheme and did not include legacy sites (sites established in the past to monitor pristine, unique, or threatened areas). There are good reasons to monitor biodiversity at legacy sites; nonetheless, it potentially introduces a site-selection bias. Protocols were also classified as potentially biased if the site-selection scheme was designed to maximize sampling efficiency (e.g., sites selected to compile a complete species inventory).

Simulation Approach
We simulated sampling of functionally similar species in a virtual landscape. A fixed number of individuals (∼17,000) moved or dispersed randomly across a grid of 100 × 100 cells, which was subjected to 1 of 3 richness change regimes: no change, richness decrease, or richness increase. To illustrate the potential effects of site-selection bias, we sampled the modeled landscape either randomly or with a site-selection bias (Fig. 1). The model was not designed to accurately represent the highly complex natural world, but simply to study the potential influence of site selection on estimates of biodiversity change. Our simulations are most applicable to guilds of functionally similar species that do not strongly interact. We chose a spatially resolved, individual-based model, instead of a population model (Rhodes & Jonzén 2011; White 2019), to allow for the movement or dispersal of individuals in and out of the selected sampling site. Therefore, in contrast to the work by Fournier et al. (2019) on site-selection bias in a population context, our simulation included spatial autocorrelation, where changes in the sampling site depended on the surrounding landscape. The simulations were performed using MATLAB R2017b (MathWorks 2017). All results were based on 10,000 independent simulations per scenario.

The regional (landscape-scale) species abundance distribution followed a Poisson log-normal distribution, with 100 species, mean log abundance of 4, and an SD σ = 1.5 (Supporting Information). The resulting community had on average ∼16,800 individuals (SD 4,800), and a typical sample (0.25% of total area) had on average 42 individuals (SD 21) (Supporting Information).
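The published simulations were implemented in MATLAB; as an illustrative sketch only, the regional community described above could be drawn in Python as follows (function and variable names are ours, not from the original code):

```python
import numpy as np

rng = np.random.default_rng(42)

S_TOT = 100       # species in the regional pool
MU_LOG = 4.0      # mean of the log abundances
SIGMA_LOG = 1.5   # SD of the log abundances

def sample_community(rng):
    """Draw a Poisson log-normal species abundance distribution:
    each species' abundance is a Poisson draw whose mean is
    log-normally distributed across species."""
    log_means = rng.normal(MU_LOG, SIGMA_LOG, size=S_TOT)
    return rng.poisson(np.exp(log_means))

abundances = sample_community(rng)
n_individuals = int(abundances.sum())
```

With these parameters the expected community size is S_TOT · exp(μ + σ²/2) ≈ 16,800 individuals, consistent with the figure reported above, although any single draw varies considerably.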
The landscape consisted of a 100 × 100 grid of cells, representing discrete positions of individuals. At the start of the simulation, individuals were placed randomly in the virtual landscape. Individual movement was simulated via a random walk. At each time step, individuals had an 8 in 9 chance to move to 1 of the 8 neighboring cells and a 1 in 9 chance to stay where they were. This movement can also be thought of as dispersal; the presence of an individual in the same or a neighboring cell after 1 generation time can be interpreted as the death of the mother and growth of an offspring nearby. The landscape had a periodic boundary condition to prevent the density of individuals in cells near the borders from systematically differing from those in the center of the landscape.
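As a minimal Python sketch of this movement rule (our naming; the original model was written in MATLAB), each individual picks uniformly among its 8 neighbors and its current cell, with positions wrapped around the periodic boundary:

```python
import numpy as np

GRID = 100  # the landscape is a 100 x 100 torus

# the 8 neighboring offsets plus (0, 0) for staying put; choosing
# uniformly among all 9 gives the 8/9 move, 1/9 stay rule
MOVES = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]

def step(x, y, rng):
    """One random-walk step with periodic boundary conditions."""
    dx, dy = MOVES[rng.integers(len(MOVES))]
    return (x + dx) % GRID, (y + dy) % GRID
```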
The landscape was subdivided into 400 potential sampling units of 5 × 5 cells each. Depending on the site-selection strategy, we selected 1 of these sampling units as the sampling site (Fig. 1). For random site selection, we chose among all potential sampling units with equal probability. For abundance-biased site selection, we chose the site with the highest number of individuals summed across all species. For richness-biased site selection, we chose the site with the highest local richness. Any bias in real data collection is unlikely to be this extreme because no observer can survey the entire landscape; nevertheless, weaker biases should produce qualitatively similar, if less pronounced, patterns.
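The 3 strategies can be sketched as follows (a simplified Python illustration assuming, for brevity, at most one individual per cell; the actual model allows several individuals to share a cell):

```python
import numpy as np

def select_site(species_grid, strategy, rng, unit=5):
    """Choose one of the 400 5 x 5 sampling units from a 100 x 100
    grid of species IDs (-1 marks an empty cell). 'random' picks a
    unit uniformly; 'abundance' picks the unit with the most
    individuals; 'richness' picks the unit with the most species.
    Returns the (row, col) index of the chosen unit."""
    n = species_grid.shape[0] // unit
    abundance = np.zeros((n, n), dtype=int)
    richness = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            block = species_grid[i*unit:(i+1)*unit, j*unit:(j+1)*unit]
            present = block[block >= 0]
            abundance[i, j] = present.size
            richness[i, j] = np.unique(present).size
    if strategy == "random":
        return tuple(rng.integers(n, size=2))
    score = abundance if strategy == "abundance" else richness
    return tuple(np.unravel_index(score.argmax(), score.shape))
```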
We tracked species richness over time because it is the most broadly used measure of biodiversity. Additionally, we tested the effect of site-selection bias on the total number of individuals and on an abundance-based diversity measure, and results were qualitatively similar (Supporting Information). Change in richness was estimated as the slope of a standard least-squares linear regression of species richness over time. We ran the simulations for 20 time steps. Interpreting 1 time step as 1 year, this reflects roughly the average length of time series typically used in biodiversity change studies (Dornelas et al. 2014, 2018). To compare the 3 sampling strategies, we visualized the slope estimates with box plots and inferred differences when CIs around the medians did not overlap. We imposed 1 of 3 richness change regimes onto the landscape: no change, richness decrease, or richness increase. We also tested change in other community metrics (increase in evenness or total number of individuals), and results were qualitatively consistent (Supporting Information). In the no-change regime, any changes in richness were exclusively due to the movement of individuals. The richness-increase regime converted a fixed number of individuals to new species in each time step: to generate 1 new species, 1 individual of the most common species was assigned to a new species. In each time step, 0.25% of individuals changed their identities (according to the change rate r_change), corresponding to an average of 42 new species per time step. Species richness started at the default level (S_tot = 100) and increased linearly (to on average 940 species; all added species were singletons). We implemented the richness increase via a change in identity, instead of adding new individuals, to uncouple the increase in richness from an increase in individuals.
Therefore, the change in richness implied a change in evenness, but this change was small because only 0.25% of individuals changed their species identities per time step. For the richness-decrease regime, we reversed the richness-increase process: starting from the end point of the richness-increase regime, a fixed number of species was lost at each time step until the default richness was reached.
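In Python, one step of the richness-increase regime might look like this (a sketch under our naming, not the authors' MATLAB code):

```python
import numpy as np

def richness_increase_step(species_ids, rng, r_change=0.0025):
    """Convert a fraction r_change of all individuals into new
    singleton species; each converted individual is drawn from the
    currently most common species, so richness rises while the
    number of individuals stays constant."""
    ids = species_ids.copy()
    n_new = max(1, round(r_change * ids.size))
    next_id = ids.max() + 1
    for _ in range(n_new):
        counts = np.bincount(ids)
        pick = rng.choice(np.flatnonzero(ids == counts.argmax()))
        ids[pick] = next_id  # this individual founds a new species
        next_id += 1
    return ids
```

With ∼16,800 individuals, r_change = 0.0025 yields about 42 new singleton species per time step, matching the rate described above; the richness-decrease regime simply reverses this conversion.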
To assess the influence of sampling scale on the site-selection bias, we varied the size of the sampling grain from 0.1% to 1.0% of the total area. We derived the effect of bias by subtracting the estimated richness change of the biased site selection from the random site selection estimate. A high value indicated that biased site selection strongly affected estimated richness change, whereas zero indicated no influence of the bias. Additionally, we tested the impact of bias across a range of sampling durations (i.e., number of sampling time points), dispersal abilities (i.e., maximum number of cells crossed by individuals between sampling events [Supporting Information]), and total numbers of species in the landscape (i.e., regional species pool). Each parameter was varied individually, whereas the others were kept at their default values. We considered the impact of bias only for the richness-increase regime because the no-change regime had similar results (Supporting Information).
In our standard model, we assumed a random spatial distribution of individuals. To determine how aggregation affects the strength of site-selection bias, we introduced high initial spatial aggregation of individuals in a model variant (Supporting Information). For the aggregation, we used a Poisson point cluster process (Robledo-Arnuncio & Austerlitz 2006). Each species i (i = 1, …, S_tot) was assigned a number of parents p_i proportional to its abundance (p_i = N_i/10), and its individuals were randomly assigned to a parent. Parents were randomly distributed across the landscape, and individuals were placed randomly around their parents within a radius d (d = c/20, where c is the width of the landscape). This procedure clustered only individuals, not species. The aggregation fades over time because the subsequent movement of individuals is still random.
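A Python sketch of this clustering step (our implementation of a Poisson point cluster process; the names and the disc-uniform offset are our choices, not taken from the original code):

```python
import numpy as np

def cluster_positions(abundances, width=100.0, rng=None):
    """Place the individuals of each species around randomly
    located 'parents' (one parent per ~10 individuals), uniformly
    within a disc of radius d = width/20; positions wrap around the
    periodic landscape. Returns flat x and y coordinate arrays."""
    if rng is None:
        rng = np.random.default_rng()
    d = width / 20
    xs, ys = [], []
    for n_i in abundances:
        n_parents = max(1, n_i // 10)
        px = rng.uniform(0, width, n_parents)
        py = rng.uniform(0, width, n_parents)
        parent = rng.integers(n_parents, size=n_i)
        r = d * np.sqrt(rng.uniform(size=n_i))  # uniform over the disc
        theta = rng.uniform(0, 2 * np.pi, n_i)
        xs.append((px[parent] + r * np.cos(theta)) % width)
        ys.append((py[parent] + r * np.sin(theta)) % width)
    return np.concatenate(xs), np.concatenate(ys)
```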

Literature Review
Based on our literature review, citizen science data were more affected by site-selection bias than the other data sources. Volunteers' site selection is biased toward high diversity, threatened species, large numbers of individuals, and hotspots (Tulloch et al. 2013; Boakes et al. 2016; Videvall et al. 2016). Common motivations for participating in citizen science programs are improving species identification skills and discovering new species (Richter et al. 2018), which could lead to a site-selection bias.
Museum data are used to address a variety of ecological questions, including biodiversity change (Pyke & Ehrlich 2010; Bartomeus et al. 2019). However, data collected for museums are not aimed toward an unbiased, representative view of species occurrences, but toward building extensive and interesting collections (i.e., collector bias). Thus, museum data are subject to site-selection bias, and their relative abundance estimates sometimes differ strongly from ecological data (Nekola et al. 2019). Museum data typically include information on presences only and are therefore better suited for detecting losses than for detecting gains (Skelly et al. 2003; Bartomeus et al. 2019). Collection intensity also tends to be higher in biodiversity hotspots (Nelson et al. 1990).

Figure 2. Site-selection bias among 4 sources of biodiversity data: (a) conceptualization of the likelihood of encountering site-selection bias, in decreasing order from citizen science, to museum collections, to national parks, to academic research; (b) proportion of citizen science programs (n = 44) that produce unbiased data (opportunistic, casual reporting of sightings; free, subjective site selection by the recorder; designated, a priori site selection by the program organizer); and (c) proportion of national park monitoring programs (n = 16) that produce unbiased data.
National park monitoring was usually designed with a high awareness of sampling biases (10 of 16 monitoring protocols discussed sampling biases [Supporting Information]). However, national park biodiversity data remained potentially biased because monitoring was often motivated by concerns about anthropogenic pressures on protected areas. This was exemplified by the common practice of monitoring sites established nonrandomly in the past, typically because they are either pristine or threatened (Supporting Information). Therefore, established monitoring sites could be in conflict with the need for representative sites for inference (Théau et al. 2018). Additionally, results based on data from within a national park may not be representative of the surrounding landscape.
In academic research, systematic site-selection schemes have been increasingly applied and refined over time (Michalcová et al. 2011; Swacha et al. 2017), which minimizes site-selection biases. However, site-selection bias occurred when systematic schemes were not applied. Subjectively chosen vegetation plots show higher species richness and more rare species than systematically chosen units (Diekmann et al. 2007; Michalcová et al. 2011), even when the aim is explicitly to choose representative sites (Swacha et al. 2017). Site-selection bias was especially likely in short, historic time series, such as resurveys. Resurveys are typically conducted over only a few years, take place at sites with historic presences (while lacking information on absences), and mostly report declines (Skelly et al. 2003).
Site-selection bias was most likely in citizen science data, followed by museum data, whereas monitoring data from national parks and academic research were less likely to be biased (Fig. 2a). Our quantitative assessment classified the majority of citizen science programs as potentially biased (82%) (Fig. 2b); more than half of the programs reviewed (57%) applied opportunistic sampling schemes. Of the citizen science programs that allowed free site selection by the volunteers (36%), only half required training before participation or mentioned representative sites in their instruction manuals. Among the surveyed U.S. national parks, systematic site-selection schemes were more common than among the citizen science programs (81% vs. 7%). Still, the majority of the national park monitoring data were potentially biased (56%) (Fig. 2c) because they included legacy sites (44%), selected sites subjectively (19%), or followed a monitoring strategy designed not to produce a representative sample but to efficiently assess total diversity (6%).

Simulation Approach
Randomly selected sites always reflected the true biodiversity trend in the landscape (Fig. 3). For biased site selection, simulated richness estimates were consistently lower than in randomly chosen sites (and CIs around the median did not overlap the true trend). For example, for a true richness decrease, biased site selection exaggerated richness loss by >300% (mean of abundance-biased and richness-biased estimates −0.34 and −0.43, respectively, compared with −0.11 for randomly selected sites). Biased site selection consistently resulted in negative trends, even when richness was increasing (Fig. 3c). The increase in richness was estimated as 0.11 in randomly selected sites, but estimates from sites selected with an abundance and richness bias were −0.09 and −0.22, respectively (mean of 10,000 independent simulations).
Rates of change were strongly underestimated when sampling grains were small, even for random site selection (e.g., 0.5 compared with the true change of ±42 species/sampling interval). The larger the sampling grain, the closer the observed changes for biased site selection approached those derived by random sampling. For small grains (below 0.25% of total landscape area), the true local increase was lower than the apparent decrease introduced by the bias (Fig. 4c). Thus, at small spatial scales, the bias outweighed the actual richness change, leading to a sign error in the estimated change.
Other factors also influenced the strength of site-selection biases. Site-selection bias was weaker for longer sampling durations (Fig. 5b), high dispersal ability (Fig. 5c), and small regional species pools (Fig. 5d). Site-selection bias also depended on the spatial distribution of individuals and was stronger when individuals were aggregated in the landscape: for a true richness change of zero, the bias steepened the observed richness decline to −0.46 (compared with −0.20 in the standard model [Supporting Information]). Randomly selected sites showed a slight increase in richness, even when richness was not changing in the landscape (Supporting Information). This was caused by the overrepresentation of low-abundance sites in the landscape: because most potential sampling sites initially had below-average abundance, subsequent random movement of individuals across the landscape resulted in an increase of abundance (and thus richness) on average.

Likelihood of Site-Selection Bias in Data Sources
Site-selection bias potentially occurred in each of the major sources of biodiversity data we examined. Thus, we emphasize the critical importance of being aware of, and minimizing the influence of, site-selection bias when estimating biodiversity trends through time. This is particularly true when using data from opportunistic or semistructured citizen science programs with no targeted training (e.g., eBird [Kelling et al. 2019]). The risk of site-selection bias was also high in data from museums, legacy sites, and resurveys. Even data collected by ecologists for the purpose of monitoring biodiversity cannot be regarded as free of site-selection bias (Diekmann et al. 2007;Swacha et al. 2017).
We emphasize that all these sources yield valuable data for answering scientific questions. Our classification as potentially biased is restricted to the context of this study (i.e., inference of local biodiversity trends) and is not a critique of the data as a whole. Our focus was on differences in site-selection bias among the data sources, and we ignored differences in, for example, observer error and detection efficiency. There is a growing awareness of sampling biases, both in citizen science (Videvall et al. 2016) and academic research (Michalcová et al. 2011; Swacha et al. 2017). However, the importance of site-selection bias for estimates of biodiversity trends has been largely ignored (Fournier et al. 2019).

Figure 5. Impact of site-selection bias over a gradient in (a) grain size (i.e., size of sampling unit as percentage of total landscape area, based on the same data as for Fig. 4c), (b) sampling duration, (c) dispersal ability, and (d) total number of species (i.e., size of regional species pool) (lines, median of 10,000 independent simulations; black triangle, default value).

Effects of Site-Selection Bias on Estimates of Biodiversity Change
Our simulations showed that site-selection bias can amplify estimates of biodiversity loss, even reversing the direction of a trend from positive to negative (Figs. 3 & S3). Fournier et al. (2019) hypothesized that site-selection bias could be less severe in community data because declines in one population could be buffered by the independent fluctuations of other species. However, our results show that biodiversity trends can also be strongly affected by site-selection biases. There is clearly often true biodiversity loss, for example, due to land-use change, anthropogenic degradation of habitats, and climate change (Murphy & Romanuk 2014; Newbold et al. 2015). However, declines could be overestimated if site-selection biases exist, which could even mask increases. The risk of such sign errors is especially high when biodiversity change is slow (Supporting Information). If site-selection bias is present, more declines are detected than truly occur, and negative impacts are inferred where there are none (Mapstone 1995). There is also a reverse site-selection bias (Palmer 1993; Fournier et al. 2019) that can arise when restoration efforts are monitored at sites with initially below-average abundance or species richness. In this case, overly positive estimates of species recovery can arise due to the same regression-to-the-mean effect. A misinterpretation of data due to any kind of site-selection bias can have serious implications for conservation informed by these sources.
It is important to distinguish the site-selection bias we focused on here from habitat-selection bias. Site-selection bias describes the preference for rich or densely populated sites within a region of otherwise relatively homogeneous environmental conditions, where differences in richness and abundances are random (e.g., neutral dynamics [Hubbell 2001]). In contrast, habitat-selection bias occurs when choosing between different habitats, for example, the well-known preference for high-quality over degraded habitats (Boakes et al. 2010; Martin et al. 2012), and can be accounted for by stratification approaches to sampling (van Swaay et al. 2002). A potential limitation of our simple simulation is that we intentionally assumed a homogeneous landscape (i.e., no differences in habitat quality) because our aim was to demonstrate the potential influence of site-selection bias on biodiversity trends, rather than the influence of habitat bias. Still, to test the robustness of our conclusions, we constructed an alternative model setup that included habitat heterogeneity and found that our qualitative results remained (Supporting Information).

Strength of Site-Selection Bias
The strength of site-selection biases was driven mainly by 3 interrelated factors: scale, dispersal ability, and spatial heterogeneity. First, site-selection biases were greatest at small spatial and temporal grains. This is in line with other studies showing that site-selection bias decreases as plot size increases (Diekmann et al. 2007) and time series become longer (Fournier et al. 2019). Second, strong site-selection bias can be expected for organisms with low dispersal ability: if offspring disperse slowly throughout the surrounding landscape, the exceptionally high abundance or richness in the selected site takes longer to fade. Third, site-selection biases were stronger when species were distributed more heterogeneously within a given habitat type because this determined how different selected sites were from the rest of the landscape (i.e., how strong the initial bias was). Such heterogeneity was greatest when there were many species, meaning that richness biases were stronger for large regional species pools. Spatial heterogeneity also increased when individuals were aggregated in the landscape. Aggregation of individuals made gradients in total abundance and richness between potential sites steeper (Supporting Information), increasing the likelihood of observing a strong decline and amplifying the impact of site-selection bias. Thus, studies of assemblages with high degrees of spatial clustering (such as snails and mussels, which often occur in museum collections) could be at greater risk of a strong impact of site-selection bias on biodiversity trends.

Recommendations to Reduce Site-Selection Bias
Systematic site-selection schemes minimize site-selection bias. This is not a new insight and has been suggested many times (e.g., Olsen et al. 1999; Videvall et al. 2016). Nevertheless, systematic sampling schemes are often not applied because they may lead to access problems. This can be circumvented by using stratification approaches that exclude nonaccessible sites (Danz et al. 2005). In citizen science, the use of systematic sampling schemes may be less attractive to participants because more time is spent where fewer organisms are (Nekola et al. 2019). Several sampling schemes have been suggested to account for sampling bias while keeping citizen scientists motivated, but not for the influence of the site-selection bias on which we focused here.
When deciding on new monitoring programs, it is necessary to decide whether to allocate resources toward greater spatial coverage or to maximize temporal resolution, for example, in the context of citizen science (Rhodes & Jonzén 2011). We recommend that sampling many plots should be preferred over sampling frequently to minimize potential site-selection biases on biodiversity trends. This recommendation is based on our finding that the sampling grain size was a crucial parameter determining the impact of site-selection bias. Although increasing temporal resolution cannot mitigate site-selection bias, increasing sampling duration is another effective method to minimize this kind of bias.
Previous studies have called for analytical methods to correct for site-selection bias retrospectively, for example, by spatial weighting of samples (van Swaay et al. 2002; Fournier et al. 2019). Other types of sampling biases can be eliminated by weighting samples to reflect the true spatial distribution of the variable affected by the bias. For example, habitat-selection bias can be corrected by spatially weighting samples to reflect the true overall distribution of habitat types (van Swaay et al. 2002). Similarly, roadside bias (i.e., preferential sampling at easily accessible sites) can be rectified based on knowledge of the distribution of roads (Kadmon et al. 2004). In contrast, site-selection bias can only be eliminated in this way if the true spatial distribution of local biodiversity in the landscape is known.
Removing the first time points of a time series (i.e., left censoring) has been recommended as an effective method to reduce site-selection biases in population data (Fournier et al. 2019) because the impact of the site-selection bias decreases as time since initial establishment of the site increases. Such an approach can also be useful for eliminating the influence of site-selection bias in biodiversity data (removing 20% of the data points led to an ∼50% reduction of the impact of site-selection bias [Supporting Information]). Left censoring has also been used as a tentative test for a potential site-selection bias in existing data (Fournier et al. 2019). However, this procedure is not a rigorous test because a difference in the change estimate for the initial time points compared with the entire series can result from site-selection bias as well as from true biodiversity change.
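The effect of left censoring on a trend estimate is easy to see in a toy Python example (our sketch with invented numbers, not data from the study):

```python
import numpy as np

def slope(y):
    """Least-squares slope of a time series sampled at t = 0, 1, ..."""
    return np.polyfit(np.arange(len(y)), y, 1)[0]

def left_censored_slope(y, frac=0.2):
    """Drop the earliest fraction of time points before fitting the
    trend, discarding the period most affected by regression to the
    mean after a biased initial site selection."""
    return slope(y[int(round(frac * len(y))):])

# a site picked for its (transiently) high richness: the inflated
# first observations drag the fitted trend downward
richness = np.array([30.0, 22, 21, 20, 20, 20, 20, 20, 20, 20])
```

Here the raw slope is roughly −0.66 species per time step, whereas dropping the first 20% of points gives roughly −0.08, much closer to the true stable trend of zero.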
For data analyses, we suggest that site-selection bias can be reduced in a few ways. Instead of averaging trends across multiple plots, we recommend that biodiversity measures be calculated cumulatively across all plots. This ensures that the largest available fraction of the landscape is sampled. In addition, measures that include the relative abundances of species and that are less sensitive to rare species, such as the effective number of species, may be more useful for identifying change (Supporting Information). This is especially useful when analyzing data from legacy sites, which yield valuable long-term data but have not been established according to rigorous site-selection criteria.
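The effective number of species mentioned above is commonly computed as a Hill number; a minimal sketch (our parameterization, not code from the study):

```python
import numpy as np

def effective_number_of_species(abundances, q=2):
    """Hill number of order q. q = 1 gives the exponential of
    Shannon entropy; q = 2 gives the inverse Simpson index. Larger
    q down-weights rare species more strongly."""
    p = np.asarray(abundances, dtype=float)
    p = p[p > 0] / p.sum()
    if q == 1:
        return float(np.exp(-np.sum(p * np.log(p))))
    return float(np.sum(p ** q) ** (1 / (1 - q)))
```

A perfectly even community of 4 species has an effective number of 4 for any q, whereas a community of 100 individuals dominated by one species (96, 1, 1, 1, 1) has a richness of 5 but an inverse Simpson diversity of only about 1.08, illustrating why such measures are less sensitive to the gain or loss of rare singletons.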
Foundation; DFG FZT 118). H.H. and D.H. were supported by HIFMB, a collaboration between the Alfred-Wegener-Institute, Helmholtz-Center for Polar and Marine Research, and the Carl-von-Ossietzky University Oldenburg, initially funded by the Ministry for Science and Culture of Lower Saxony and the Volkswagen Foundation through the Niedersächsisches Vorab grant program (grant no. ZN3285). The scientific results were, in part, computed at the High-Performance Computing (HPC) Cluster EVE, a joint effort of both the Helmholtz Centre for Environmental Research -UFZ and the German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig. We thank the administration and support staff of EVE.
Open access funding enabled and organized by Projekt DEAL.

Supporting Information
Lists of the reviewed instruction manuals (Appendix S1) and a description of model sensitivity and alternative model variants (Appendix S2) are available online. The authors are solely responsible for the content and functionality of these materials. Queries (other than absence of the material) should be directed to the corresponding author. Computer code is available from doi.org/10.5281/zenodo.3948054.