Volume 35, Issue 1 p. 249-262
Contributed Paper
Open Access

The challenge of biased evidence in conservation

Alec P. Christie

Conservation Science Group, Department of Zoology, University of Cambridge, Cambridge, CB3 3QZ U.K.

Address correspondence to A. P. Christie, email [email protected]

Tatsuya Amano

Conservation Science Group, Department of Zoology, University of Cambridge, Cambridge, CB3 3QZ U.K.

School of Biological Sciences, University of Queensland, Brisbane, Queensland, 4072 Australia

Centre for Biodiversity and Conservation Science, University of Queensland, Brisbane, Queensland, 4072 Australia

Philip A. Martin

Conservation Science Group, Department of Zoology, University of Cambridge, Cambridge, CB3 3QZ U.K.

BioRISC (Biosecurity Research Initiative at St Catharine's), St. Catharine's College, Cambridge, CB2 1RL U.K.

Silviu O. Petrovan

Conservation Science Group, Department of Zoology, University of Cambridge, Cambridge, CB3 3QZ U.K.

Gorm E. Shackelford

Conservation Science Group, Department of Zoology, University of Cambridge, Cambridge, CB3 3QZ U.K.

BioRISC (Biosecurity Research Initiative at St Catharine's), St. Catharine's College, Cambridge, CB2 1RL U.K.

Benno I. Simmons

Conservation Science Group, Department of Zoology, University of Cambridge, Cambridge, CB3 3QZ U.K.

Department of Animal and Plant Sciences, University of Sheffield, Sheffield, S10 2TN U.K.

Centre for Ecology and Conservation, College of Life and Environmental Sciences, University of Exeter, Penryn Campus, Penryn, TR10 9FE U.K.

Rebecca K. Smith

Conservation Science Group, Department of Zoology, University of Cambridge, Cambridge, CB3 3QZ U.K.

David R. Williams

Sustainability Research Institute, School of Earth and Environment, University of Leeds, Leeds, LS2 9JT U.K.

Claire F. R. Wordley

Conservation Science Group, Department of Zoology, University of Cambridge, Cambridge, CB3 3QZ U.K.

William J. Sutherland

Conservation Science Group, Department of Zoology, University of Cambridge, Cambridge, CB3 3QZ U.K.

BioRISC (Biosecurity Research Initiative at St Catharine's), St. Catharine's College, Cambridge, CB2 1RL U.K.

First published: 24 June 2020

Article impact statement: Severe taxonomic and geographic biases threaten evidence-based conservation efforts.

Abstract

Efforts to tackle the current biodiversity crisis need to be as efficient and effective as possible given chronic underfunding. To inform decision-makers of the most effective conservation actions, it is important to identify biases and gaps in the conservation literature to prioritize future evidence generation. We used the Conservation Evidence database to assess the state of the global literature that tests conservation actions for amphibians and birds. For the studies in the database, we investigated their spatial and taxonomic extent and distribution across biomes, effectiveness metrics, and study designs. Studies were heavily concentrated in Western Europe and North America for birds and particularly for amphibians, and temperate forest and grassland biomes were highly represented relative to their percentage of land coverage. Studies that used the most reliable study designs—before-after control-impact and randomized controlled trials—were the most geographically restricted and scarce in the evidence base. There were negative spatial relationships between the numbers of studies and the numbers of threatened and data-deficient species worldwide. Taxonomic biases and gaps were apparent for amphibians and birds—some entire orders were absent from the evidence base—whereas others were poorly represented relative to the proportion of threatened species they contained. Metrics used to evaluate effectiveness of conservation actions were often inconsistent between studies, potentially making them less directly comparable and evidence synthesis more difficult. Testing conservation actions on threatened species outside Western Europe, North America, and Australasia should be prioritized. Standardizing metrics and improving the rigor of study designs used to test conservation actions would also improve the quality of the evidence base for synthesis and decision-making.

El Desafío de la Evidencia Sesgada en la Conservación

Resumen

Los esfuerzos para lidiar con la actual crisis de la biodiversidad necesitan ser tan eficientes y efectivos como sea posible dado el crónico subfinanciamiento. Para informar a los órganos de decisión sobre las acciones de conservación más efectivas, es importante identificar los sesgos y las brechas en la literatura de la conservación para priorizar generación de evidencias en el futuro. Usamos la base de datos Conservation Evidence para evaluar el estado de la literatura mundial que analiza las acciones para la conservación de anfibios y aves. Para los estudios dentro de la base de datos, investigamos su extensión espacial y taxonómica y su distribución a lo largo de biomas, medidas de efectividad y diseños de estudio. Los estudios se concentraron principalmente en Europa Occidental y en América del Norte en el caso de las aves y particularmente para los anfibios. Los biomas con mayor representación en relación con su porcentaje de cobertura de suelo fueron el bosque templado y los pastizales. Los estudios que utilizaron el diseño más confiable - impacto del control antes- después y ensayos controlados al azar - fueron los que presentaron mayor restricción geográfica y menor presencia dentro de la base de evidencias. También encontramos relaciones espaciales negativas entre el número de estudios y el número de especies amenazadas o con pocos datos a nivel mundial. Los sesgos y las brechas taxonómicas fueron evidentes para los anfibios y las aves - hubo órdenes enteros ausentes en la base de evidencias - mientras que otros taxones estuvieron representados pobremente en relación con la proporción de especies amenazadas que albergan. Las medidas utilizadas para evaluar la efectividad de las acciones de conservación con frecuencia fueron incompatibles entre los estudios, lo que las hace potencialmente menos comparables directamente y también dificulta la síntesis de las evidencias. 
Se debe priorizar el análisis de las acciones para la conservación de las especies que se encuentran fuera de Europa Occidental, América del Norte y Australasia. La estandarización de las medidas y el mejoramiento del rigor de los diseños de estudio que se usan para evaluar las acciones de conservación también mejoraría la calidad de la base de evidencias para la síntesis y la toma de decisiones.

摘要

鉴于生物多样性保护长期面临着资金不足问题, 应对当下生物多样性危机的努力必须尽可能高效和有用。为了让决策者了解最有效的保护行动, 分辨保护文献中的偏倚和空缺来优先进行未来收集保护证据的工作十分重要。我们用保护证据数据库评估了全球两栖动物和鸟类保护行动的研究文献的情况, 并分析了数据库中研究的生物群系在空间和分类学上的范围和分布、有效性指标和研究设计。鸟类的研究主要集中在西欧和北美, 两栖动物也尤其如此, 另外, 对温带森林和草原的生物群系的研究相对于它们在陆地上的覆盖率有很高的代表性。采用最可靠的前后对照影响分析和随机对照试验设计的研究最容易受到地域限制且证据基础最少。全世界范围内, 研究数量与受威胁和缺乏数据的物种数量在空间上存在负相关关系。两栖动物和鸟类明显存在分类学偏倚和研究空缺, 有些目完全没有证据基础, 还有一些目相对于它们所包含的濒危物种的比例来说没有得到足够代表。此外, 用于评估保护行动有效性的指标在研究之间常常不一致, 可能导致研究不能直接比较, 而且证据的综合更加困难。我们提出, 应优先对西欧、北美和澳大拉西亚以外的濒危物种的保护行动进行调查。指标标准化、提高用于检验保护行动有效性研究设计的严谨性, 也将提高用于综合分析和决策的证据基础的质量。翻译: 胡怡思; 审校: 聂永刚

Introduction

The insufficient funding of biodiversity conservation (Dirzo et al. 2014) means researchers and funders must prioritize effort to maximize its potential to inform conservation. While evidence-based conservation is likely to lead to more efficient outcomes, this approach requires a reliable evidence base. Summaries of evidence relating to the effectiveness of different conservation interventions (Sutherland et al. 2004) have produced a substantial evidence base (Sutherland et al. 2019), yet little is known about the biases, gaps, and clusters of this evidence. Knowing the current state of the evidence base for conservation is crucial to prioritizing future research efforts (Aranda et al. 2011). We focused on studies that tested conservation interventions (e.g., creating ponds or restoring grasslands).

The lack of resources in conservation research is likely to lead to several forms of bias in the evidence base. Such biases may limit the ability to provide relevant evidence-based recommendations to decision-makers or make the process of evidence synthesis more challenging. For example, geographical and taxonomic biases toward regions or groups may lead to little locally relevant evidence (Christie et al. 2020). Alternatively, bias could be useful if research effort is prioritized to where it is needed most (e.g., if most studies focused on threatened species). Wealthier countries perform the majority of conservation research, so one may expect patterns of evidence to reflect physical proximity to these countries (Reddy & Dávalos 2003) and socioeconomic variables (e.g., gross domestic product per capita, affluence, language, security, conflict, and infrastructure) (Martin et al. 2012; Amano & Sutherland 2013; Meyer et al. 2016; Hickisch et al. 2019). These factors are likely to cause publication bias (i.e., underrepresentation of studies from non-English speaking countries [Amano et al. 2016; Nuñez et al. 2019]) and affect the representation of habitats in the evidence base (Fazey et al. 2005). Publication bias also varies with taxonomic group (Clark & May 2002; Murray et al. 2015; Donaldson et al. 2016) and is affected by range size, diet, and body size of species (Brooke et al. 2014), favoring relatively large, more detectable species (e.g., Brodie 2009; Cardoso et al. 2011). These forms of bias affect the external validity of studies in the evidence base, so understanding them is important for identifying the locations and taxa for which little or no evidence exists.

Other forms of bias may also complicate the synthesis of evidence. Differences in the quality of study designs may make it difficult to decide which studies to trust, particularly if results are conflicting. The different study designs used to assess impacts of threats and conservation interventions (De Palma et al. 2018; Christie et al. 2019) are all affected by differing sources and levels of bias and noise. Designs range from the relatively reliable (e.g., experimental randomized controlled trials [RCTs] and quasi-experimental before-after control-impact designs [BACI]) to the less reliable (e.g., control-impact [CI], before-after [BA], and after) (Table 1). Evidence may also come in the form of systematic reviews and meta-analyses, generally considered reliable depending on the methods used and reliability of included studies. Typically, the conservation literature is thought to have relatively few studies with reliable study designs due to logistical, funding, and time constraints (De Palma et al. 2018; Christie et al. 2019). How this broad pattern varies geographically (i.e., Are reliable study designs used more often in certain regions?) and the prevalence in the literature of studies using these designs to test conservation interventions are unknown, except in the tropics for evidence on the effectiveness of tropical forest conservation (Burivalova et al. 2019). Insufficient reliable evidence in certain regions would mandate greater efforts to improve the types of study design implemented in those locations.

Table 1. Definitions for each study design examined based on criteria used to define them and the keywords used in the Conservation Evidence database (Sutherland et al. 2019)

Design* | Design acronym | Control? | Sampling before intervention? | Randomized? | Matching or pairing? | Experimental? | Example
After | (none) | no | no | no | no | nonexperimental (no comparison) | Monitoring the number of songbirds feeding in field margins after sowing wildflower seeds
Before-after | BA | no | yes | no | no | nonexperimental (no comparison) | Quantifying amphibian mortality on roads before and after creating road tunnels
Control-impact | CI | yes | no | no | yes or no | quasi-experimental | Investigating invertebrate diversity in grazed and ungrazed grassland plots
Before-after control-impact | BACI | yes | yes | no | yes or no | quasi-experimental | Comparison of biodiversity before and after the addition of dead wood in streams with an upstream control and downstream treatment
Randomized controlled trial | RCT | yes | yes or no | yes | - | experimental | Monitoring effect of crop rotation between randomly assigned treatment and control fields

* Experimental designs use randomized allocation of independent experimental units to treatment and control groups (RCT). Definitions: after and before-after, nonexperimental designs lacking a control group; control-impact and before-after control-impact, quasi-experimental designs not randomized but with a control group; randomized controlled trial, allocation of independent experimental units to treatment and control groups.

Variation in the use of different metrics to assess the effectiveness of a conservation intervention may also make approaches, such as meta-analyses, difficult to use. Results are less directly comparable when different metrics are used to assess effectiveness, which reduces the number of studies that can be combined in a meta-analysis. For example, it would be difficult to combine a set of studies measuring reproductive success, reductions in adult mortality, numbers of individuals, and species richness of birds using nest boxes in a conventional meta-analysis on the effectiveness of nest boxes. Different metrics may be useful to assess different aspects of an intervention's effectiveness and to give greater confidence about the overall effectiveness of an intervention. However, wide variation in metrics used to test the same intervention could cause confusion for decision-makers, especially when different metrics yield different results (Capmourteres & Anand 2016).

We sought to improve empirical and quantitative understanding of the biases and gaps in the evidence base for conservation. We analyzed the Conservation Evidence database (Sutherland et al. 2019), a comprehensive collection of 5,816 publications (as of March 2020) that have quantitatively tested the effectiveness of conservation interventions, for evidence of bias. We set out to answer the following questions for amphibians and birds: what is the geographic distribution of studies; how does this distribution vary for studies with different designs; what is the taxonomic distribution of studies; and, for studies on a given conservation intervention, how much variation is there in study design and metrics? Identifying patterns, biases, and knowledge gaps in the evidence base can help set priorities for future research. With a more reliable and complete evidence base, research can better support evidence-based decision-making in conservation and ultimately more effective conservation.

Methods

Conservation Evidence Database

The Conservation Evidence project summarizes studies that have quantitatively tested the effect of a conservation intervention (Sutherland et al. 2019). Conservation interventions are defined as “actions that have been or could be used to conserve biodiversity,” and the effect that is quantified can be “on any aspect of biodiversity (e.g., abundance of a focal species, survival rates of translocated individuals, use of nest boxes, extent of habitat) or human behavior related to biodiversity conservation (e.g., levels of hunting or sales of products detrimental to biodiversity)” (Sutherland et al. 2019:3). These studies are found using systematic manual searches of the conservation literature including over 290 English and 150 non-English language journals (Sutherland et al. 2019). The Conservation Evidence website, as of March 2020, is organized into 2,105 different interventions (e.g., control invasive mammals on islands) contained within 16 synopses (e.g., bird conservation) and displays a summary of each study included or multiple summaries if a study's results apply to several interventions (e.g., both pond creation and translocation of amphibians). A list of interventions is created for each synopsis by consulting initial literature scans (but before systematic manual searches) and an advisory board (a range of academics, practitioners, and policymakers with subject-specific expertise from different parts of the world) (Sutherland et al. 2019). Interventions are usually described at a fine scale (e.g., set longlines at the side of the boat to reduce seabird bycatch is a separate intervention from set lines underwater to reduce seabird bycatch).

To assess the number of studies per intervention for certain subsets of studies (e.g., by the metric or study design used), we grouped similar interventions that focused on single taxa or habitats (e.g., create ponds for frogs and create ponds for toads would be grouped into create ponds [see Supporting Information]). This ensured that the scope of interventions was appropriate for our analysis and did not act as a constraint on the numbers of studies per intervention.
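The grouping step can be pictured with a short Python sketch. It is not the authors' code (their analysis was run in R, and the actual groupings are in the Supporting Information); the mapping `GROUPS`, the record layout, and the choice to count a study once per grouped intervention are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical mapping from fine-scale intervention names to grouped names.
GROUPS = {
    "Create ponds for frogs": "Create ponds",
    "Create ponds for toads": "Create ponds",
}

def studies_per_grouped_intervention(records):
    """Count studies per grouped intervention.

    records: iterable of (study_id, intervention_name) pairs.
    Assumption: a study that tests two fine-scale interventions in the
    same group is counted once for that group.
    """
    seen = set()
    for study_id, intervention in records:
        grouped = GROUPS.get(intervention, intervention)
        seen.add((study_id, grouped))
    return Counter(grouped for _, grouped in seen)
```

Interventions absent from the mapping keep their original names, so only the explicitly grouped ones are merged.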

We extracted metadata from the database for every study within the amphibian (n = 410) (Smith & Sutherland 2014) and bird synopses (n = 1,239) (Williams et al. 2012) including the latitude and longitude coordinates (mean coordinates where a study had multiple sites). We considered only studies for amphibians and birds because these taxa had the most complete and comprehensive metadata in the database. The literature searches that retrieved these studies (Sutherland et al. 2019) were last conducted in 2012 for amphibians and 2011 for birds. While these searches are not as recent as we might wish, these data provide the only way to reasonably assess biases in a large number of studies that have tested the effectiveness of conservation interventions. For all analyses, we excluded interventions that were not tested by any studies: 31 interventions for amphibians and 56 for birds.

Patterns in Evidence for Different Metrics and Designs

A standardized set of keywords is used to describe study design in the Conservation Evidence database (Table 1). A single report or paper summarized in the database may report use of multiple study designs when several tests are described. Each study design described in a report or paper constitutes an individual study, each of which we counted separately. An individual study can also be assigned to multiple interventions and multiple synopses if it contains relevant information. We used the number of studies per intervention as the major variable of interest. To determine the accuracy of reported study designs, we manually checked the original papers of a random 5% of studies in the database (n = 21 for amphibians; n = 62 for birds). The correct design was reported for 95% of amphibian studies (1 study with an after design was misreported as a BA design [Supporting Information]) and 94% of bird studies (1 CI study misreported as after, 1 BACI study misreported as CI, and 2 RCT studies misreported as CI [Supporting Information]). Because we estimated the mean number of studies per intervention that used different study designs across many interventions and the global geographical distribution of many studies with different designs (see next section), these misclassifications would have little effect on our overall results.

To identify the metrics used in each study to measure the effectiveness of interventions, we first used web scraping to obtain summaries of studies from the Conservation Evidence website. To do so, we used the XML package (Lang & CRAN team 2018a) and RCurl package (Lang & CRAN team 2018b) in R statistical software version 3.5.1 (R Core Team 2018). We also used the doParallel package (Microsoft Corporation & Weston 2017) to increase computational performance. Once summaries were obtained, we created and tested a set of regular expression rules (e.g., matching keywords and patterns) to detect the following metric groups used by each study: abundance, density, and cover; mortality and survival; diversity and species richness; and reproductive success (Supporting Information). This was necessary because this information is currently not in the database, and it allowed us to quantify the number of studies in which each metric was used and the number of unique metrics used in each intervention.
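A rule-based detector of this kind might look like the Python sketch below. The patterns are invented for illustration and are much cruder than a tested rule set would be; the authors' actual regular expressions are in their Supporting Information and were written for R. Only the four metric group names come from the text.

```python
import re

# Illustrative patterns only (assumptions), keyed by the metric groups named in the text.
METRIC_PATTERNS = {
    "abundance, density, and cover": re.compile(
        r"\b(abundan\w*|densit\w*|cover|numbers? of (individuals|birds|frogs))\b", re.I),
    "mortality and survival": re.compile(
        r"\b(mortalit\w*|surviv\w*|deaths?|killed)\b", re.I),
    "diversity and species richness": re.compile(
        r"\b(diversity|species richness|number of species)\b", re.I),
    "reproductive success": re.compile(
        r"\b(reproductive success|breeding success|fledg\w*|hatch\w*|clutch\w*)\b", re.I),
}

def detect_metric_groups(summary):
    """Return the set of metric groups whose pattern matches a study summary."""
    return {group for group, pattern in METRIC_PATTERNS.items()
            if pattern.search(summary)}
```

Because a summary can match several patterns, a single study can be assigned to more than one metric group, which is what allows the number of unique metrics per intervention to be counted.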

For a random 5% of studies (n = 21 amphibians, n = 62 birds), regular expressions correctly identified the metric groups in 90% of amphibian studies and 95% of bird studies (Supporting Information). For amphibians, all misclassifications were false negatives (failure to detect abundance, density, and cover in 2 studies). For birds, there were false positives for 2 studies (3.2%, 1 erroneous detection of reproductive success and 1 of mortality and survival) and a false negative for 1 study (1.6%, failure to detect diversity and species richness). Because we were using this automated classification to gain an overall estimate of the mean number of studies per intervention across a large number of interventions for each metric group, these misclassifications would have little effect on overall estimates. Automating the extraction of effectiveness metrics also offered the most feasible and reproducible way to analyze the entire evidence base and controlled for some potential biases that would affect manual classification.
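The percentages reported for this check follow directly from the counts given above. The Python sketch below reproduces the arithmetic; the function name is hypothetical, and it assumes that each false positive and false negative occurred in a different study, as the counts in the text imply.

```python
def classification_check(n_checked, n_false_positive, n_false_negative):
    """Summarize a manual check of automated metric detection.

    Assumes each misclassified study contains exactly one error.
    """
    n_wrong = n_false_positive + n_false_negative
    return {
        "accuracy_pct": round(100 * (n_checked - n_wrong) / n_checked),
        "false_positive_pct": round(100 * n_false_positive / n_checked, 1),
        "false_negative_pct": round(100 * n_false_negative / n_checked, 1),
    }

# Counts taken from the text: 21 amphibian studies (2 false negatives),
# 62 bird studies (2 false positives, 1 false negative).
amphibians = classification_check(21, 0, 2)
birds = classification_check(62, 2, 1)
```

With these inputs, 19 of 21 amphibian studies and 59 of 62 bird studies were classified correctly, which rounds to the 90% and 95% reported.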

Patterns in Evidence Spatially and Taxonomically

We mapped the spatial distribution of studies in the database by creating a raster layer with the raster package (Hijmans 2019). We summed the number of studies in which different study designs were used for each 4 × 4 degree cell from studies’ longitude and latitude coordinates. We chose this resolution to aid data visualization for the maps we produced (Figs. 1 & 2). We excluded reviews from our analyses because they were often global or regional in scale. To estimate the geographical coverage of studies, we counted the number of countries and continents they were present in. We also compared the number of studies in each 2 × 2 degree cell with the number of species, threatened species, and data-deficient species for extant amphibian and bird species with data downloaded from the International Union for Conservation of Nature (IUCN) Red List (IUCN 2019). We chose a 2 × 2 degree grid cell resolution because this was the maximum appropriate resolution recommended by Hurlbert & Jetz (2007) for range-map data. We excluded grid cells containing zero studies and zero species and normalized the number of studies and species to between 0 and 1: $x_{\text{norm}} = (x - x_{\min}) / (x_{\max} - x_{\min})$, where $x$ is the count in a grid cell and $x_{\min}$ and $x_{\max}$ are the smallest and largest counts across grid cells. We then quantified the relationship between the normalized number of studies (as the response variable) and species (as the explanatory variable) in each grid cell with a generalized linear model with a binomial error distribution and log-link function. We repeated this normalization and modeling separately for the number of threatened species and the number of data-deficient species. A square-root transformation of the explanatory variable (number of species, threatened species, or data-deficient species) did not substantially improve model fit (Akaike's information criterion [AIC] values were not reduced by more than 2 units and R² values remained unchanged or marginally increased) (Supporting Information). We therefore chose untransformed models because these were more parsimonious. All modeling assumptions held in terms of no overdispersion, and there were no substantial patterns between residuals and the explanatory variable or fitted values.
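The gridding and normalization steps can be sketched in a few lines of Python. This is an illustration only: the authors worked with the raster package in R, the function names here are hypothetical, and min-max scaling is assumed because it matches the description of values running from 0 (fewest) to 1 (most). Model fitting is left out.

```python
import math
from collections import Counter

def cell_index(lon, lat, size=2.0):
    """Return (column, row) of the size x size degree cell containing a point."""
    return (math.floor((lon + 180.0) / size), math.floor((lat + 90.0) / size))

def count_per_cell(coords, size=2.0):
    """Count points (e.g., study locations) falling in each grid cell."""
    return Counter(cell_index(lon, lat, size) for lon, lat in coords)

def min_max_normalize(counts):
    """Rescale counts to the range 0 to 1 (assumed min-max scaling)."""
    lo, hi = min(counts.values()), max(counts.values())
    if hi == lo:
        # All cells hold the same count, so there is no spread to rescale.
        return {cell: 0.0 for cell in counts}
    return {cell: (n - lo) / (hi - lo) for cell, n in counts.items()}
```

Passing `size=4.0` gives the coarser cells used for the maps, and only cells that contain at least one point appear in the result.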

Figure 1. Spatial distribution of studies on amphibian conservation based on a Robinson projection and grid cells at a 4 × 4 degree resolution (BA, before-after; CI, control-impact; BACI, before-after control-impact; RCT, randomized controlled trial) (see Table 1 for details of designs).
Figure 2. Spatial distribution of studies on bird conservation based on a Robinson projection and grid cells at a 4 × 4 degree resolution (BA, before-after; CI, control-impact; BACI, before-after control-impact; RCT, randomized controlled trial) (see Table 1 for details of designs).

We assessed the relative under- or overrepresentation of different biomes in the database by calculating the difference between the percentage of studies conducted in each biome and the percentage of Earth's terrestrial area covered by each biome (Dinerstein et al. 2017). We assigned studies to each biome based on longitude and latitude coordinates for each study, a shapefile of 14 terrestrial biomes (see Dinerstein et al. 2017), and the sp package in R (Pebesma & Bivand 2005; Bivand et al. 2013). We excluded studies conducted outside terrestrial biomes (e.g., studies considering seabirds over oceans).

To investigate the distribution of evidence taxonomically, we calculated the percentage of studies that tested an intervention on each of the major bird orders based on a cladogram from Prum et al. (2015). For amphibians we did the same for the 3 major amphibian orders based on a trimmed cladogram from Pyron & Wiens (2011). To investigate the representation of taxonomic orders in the evidence base, we calculated the difference between the proportion of studies and the proportion of threatened species in each order (relative to the number of all threatened amphibian or bird species [Fig. 4]) and the proportion of amphibian and bird species in each order (relative to the number of all amphibian or bird species [Supporting Information]). We obtained data on the number of species and threatened species (vulnerable, endangered, or critically endangered status) in each order from the IUCN Red List (IUCN 2019).
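The biome comparison and the order comparison rest on the same calculation: the percentage of studies in a category minus the percentage of a reference quantity (land area, species, or threatened species) in that category. A minimal Python sketch follows; the function name is hypothetical and it is not the authors' R code.

```python
def percentage_difference(study_counts, reference_counts):
    """Percentage of studies minus percentage of the reference quantity, per category.

    Positive values indicate overrepresentation in the evidence base,
    negative values indicate underrepresentation.
    """
    total_studies = sum(study_counts.values())
    total_reference = sum(reference_counts.values())
    categories = set(study_counts) | set(reference_counts)
    return {
        c: 100 * study_counts.get(c, 0) / total_studies
           - 100 * reference_counts.get(c, 0) / total_reference
        for c in categories
    }
```

A category with no studies still receives a value (the negative of its reference percentage), so orders or biomes that are absent from the evidence base show up as underrepresented.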

Results

There was substantial bias in the spatial distribution of evidence on conservation interventions. Approximately 90% of amphibian studies and 84% of bird studies were conducted in North America, Europe, or Australasia. Sixty-four percent of amphibian studies and 63% of bird studies were conducted in 3 countries: the United Kingdom, United States, and Australia. There were large spatial gaps in evidence in South America, Africa, Asia, and Russia for both amphibians and birds. There were also few studies in the tropics or close to the poles (Figs. 1 & 2).

The geographical distribution of studies varied considerably by study design. Amphibian studies with the most reliable study designs, BACI and RCT, were concentrated in North America and Europe; these designs were almost absent from the tropics (Fig. 1). No BACI or RCT studies for amphibians were conducted in South America or Africa (nor in Asia for RCT studies or Australasia for BACI studies), and both types of study design were used in 10 countries or fewer (Fig. 1 & Supporting Information). BA studies for amphibians were conducted in 23 countries (none from South America), whereas CI studies were conducted in fewer countries (18) but were present in all continents where amphibians exist. Amphibian studies with after designs covered the greatest number of countries (31) across all possible continents (Supporting Information).

The evidence for birds had a greater geographical coverage than for amphibians, particularly in the tropics (Fig. 2). RCT and BACI studies were largely absent from most of South America, Africa, and Asia and present in considerably fewer countries than after, CI, and BA studies (Supporting Information).

There was no statistically significant spatial relationship (p > 0.05) between the number of studies and the number of amphibian species, and the positive spatial relationship with the number of bird species was marginal (p < 0.05) (Fig. 3 & Supporting Information). Conversely, the number of studies significantly decreased as the number of threatened species (birds, p < 0.01; amphibians, p < 0.05) and data-deficient species (birds, p < 0.05; amphibians, p < 0.05) increased; however, the magnitude of this decrease was small for birds (Fig. 3 & Supporting Information). For amphibians, the grid cell with the most studies (normalized value of 1 [Fig. 3]) covered central England, whereas for birds, the 2 grid cells with the most studies covered central and northern England (normalized values of 0.95 and 1, respectively [Fig. 3]).

Figure 3. Comparison of the normalized number of studies and the normalized number of species (all species, threatened species, and data-deficient species) in 2 × 2 degree grid cells for amphibians and birds (1, cells with the most studies or species; 0, cells with the fewest studies or species; lines, fitted based on binomial generalized linear models for which statistically significant increases or decreases were detected, p < 0.05 [details in Methods]; point size, proportional to the number of points at that position). Cells with zero studies and zero species excluded. Slopes of the regression lines are negative for threatened and data-deficient amphibian and bird species. Threatened species are those classified as vulnerable, endangered, or critically endangered by the International Union for Conservation of Nature (IUCN 2019).

There was also substantial variation in the representation of different amphibian and bird orders in the evidence base relative to the proportion of threatened species each order contained. For birds the most well-represented orders were, in rank order, shorebirds (Charadriiformes), waterfowl (Anseriformes), and falcons (Falconiformes) (i.e., high proportions of studies relative to proportions of threatened species) (Fig. 4). Songbirds (Passeriformes); parrots (Psittaciformes); pigeons (Columbiformes); and nightjars, hummingbirds, and swifts (Caprimulgiformes) were the least well-represented bird orders (i.e., low proportions of studies relative to threatened species). No studies were present for several bird orders such as hornbills and hoopoes (Bucerotiformes) (see names in red in Fig. 4). For amphibians, frogs (Anura) were the least well represented, whereas salamanders (Caudata) were the most well represented. There was only a single study for the entire order of Caecilians (Gymnophiona) (Fig. 4). Patterns were different when considering the proportion of studies relative to the proportion of species in each bird order. Most bird orders were relatively well represented apart from songbirds and orders for which there were no studies (Supporting Information). For amphibians patterns in representation were similar for both the proportion of species and proportion of threatened species (Supporting Information).

Figure 4. Percentage of studies minus percentage of threatened species in each order of amphibians and birds (percentages relative to the total number of amphibian or bird studies and species) (red, 0 studies for that order; black crosses, order contains 0 threatened species; dark blue, high proportions of studies relative to the proportion of threatened species; dark red, relatively lower proportions of studies).

Certain biomes were better represented (in terms of the total number of studies conducted in each biome) relative to the percentage of Earth's terrestrial area they covered, notably Temperate Broadleaf and Mixed Forests; Temperate Grasslands, Savannas, and Shrublands; Temperate Conifer Forests; and Mediterranean Forests, Woodlands, and Scrub for both amphibians and birds (Fig. 5). The 3 most underrepresented biomes for both amphibians and birds were Deserts and Xeric Shrublands; Tropical and Subtropical Grasslands, Savannas, and Shrublands; and Tropical and Subtropical Moist Broadleaf Forests (Fig. 5). For amphibians, there were no studies in Tropical and Subtropical Coniferous Forests, Tropical and Subtropical Dry Broadleaf Forests, or Tundra (red outlined circles in Fig. 5).

Percentage of amphibian and bird studies conducted in each biome minus the percentage of Earth's terrestrial area covered by each biome (red outline to circle, no studies were conducted in that biome).

The total number of interventions (containing at least 1 study) was 243 for birds and 74 for amphibians. On average, there were more studies per intervention for amphibians than for birds (although the total number of studies was greater for birds than for amphibians). A higher proportion of interventions contained only 1 study for birds (34%) than for amphibians (24%) (i.e., a more right-skewed distribution of studies per intervention for birds than for amphibians) (Supporting Information).

The most commonly used metrics in amphibian conservation were mortality and survival (3.9 studies per intervention) and reproductive success (3.8 studies per intervention), whereas the most commonly used metrics for birds were mortality and survival (3.9 studies per intervention) and abundance, density, and cover (3.8 studies per intervention) (Supporting Information). On average, the effectiveness of each intervention was measured using 2.1 different metrics for amphibians and 3.3 metrics for birds.

Few studies per intervention used reliable BACI or RCT designs (fewer than 0.3 studies per intervention for both amphibians and birds [Supporting Information]). Studies most commonly used the least reliable design (after), followed by CI and BA designs, for both amphibians and birds. The number of studies per intervention declined when studies with certain designs were excluded (Supporting Information).

Discussion

We found that the evidence base for amphibian and bird conservation is severely biased geographically and taxonomically. Such biases may hamper the ability to make locally relevant evidence-based recommendations to decision-makers. Geographically, studies were concentrated in North America, Europe, and Australasia, and there were negative spatial relationships between the number of studies and the number of threatened species and data-deficient species for both taxa. That the most well-represented biomes in the evidence base were Temperate Broadleaf and Mixed Forests and Temperate Grasslands, Savannas, and Shrublands also indicated strong geographic bias. Taxonomically, certain orders were better studied relative to the number of threatened species they contained (e.g., salamanders for amphibians and shorebirds, falcons, and waterfowl for birds), whereas some orders were not studied at all (e.g., hornbills and hoopoes).

These results show even more severe geographic biases than other studies of the wider conservation literature. The clear paucity of evidence from the polar regions (expected for amphibians but concerning for birds), Africa, Russia, the Middle East, and South America appears more severe than Wilson et al. (2016), Di Marco et al. (2017), and Hickisch et al. (2019) found. The United Kingdom rivalled the United States as a hotspot of evidence for these 2 taxonomic groups, which was not as apparent in Wilson et al. (2016) or Hickisch et al. (2019), but was in Di Marco et al. (2017). This hotspot contrasts, particularly for amphibians, with their low species richness in the United Kingdom (only 7 native amphibian species). In their review of the effectiveness of terrestrial protected areas, Geldmann et al. (2013) found different geographic biases, away from North America and Europe toward Latin America, Africa, and Asia. We believe this difference is because we considered a different subset of studies, focusing only on studies that had quantitatively tested a variety of conservation interventions, as opposed to the effectiveness of terrestrial protected areas.

That the number of studies testing conservation interventions had a negative relationship with the number of threatened species and data-deficient species is concerning. This pattern has not been found previously in studies of the wider conservation literature, which instead report positive relationships with the number of threatened species in the tropics (Reboredo Segovia et al. 2020). Such patterns clearly suggest that greater research effort needs to be targeted at testing conservation actions in regions with large numbers of threatened species that urgently require effective conservation (Junker et al. 2020; Christie et al. 2020).

However, we acknowledge that some of the geographic bias we found could be attributable to the low number of studies from non-English language journals that are currently included in the Conservation Evidence database. Publications from over 317 journals published in 10 languages are being added to the database through the Transcending Language Barriers to Environmental Sciences project (TRANSLATE). Language bias is a common but often ignored problem affecting most scientific evidence syntheses (Neimann Rasmussen & Montgomery 2018). As researchers conducting evidence synthesis, we must do more to seek out and collate evidence published in non-English languages and the gray literature. This is particularly important given that approximately 36% of the wider conservation literature is found in non-English language journals (Amano et al. 2016). However, where non-English literature was included in Conservation Evidence searches (e.g., relevant ecology and conservation journals in Portuguese and Spanish for the Bat Conservation synopsis), the percentage of studies testing conservation actions was very small (0.4%, 6 studies out of 1,492 studies systematically searched) (Berthinussen et al. 2019). More generally, for all non-English journals searched to date for Conservation Evidence (across all synopses), the verified rate of studies testing conservation actions is smaller at 0.18% or 643 studies out of 345,119 (unpublished data). This suggests that few studies testing conservation actions would be added from the non-English literature, possibly because a substantial proportion of non-English studies may describe conservation threats and ecology, rather than describing quantitative tests of conservation actions. Therefore, language bias is unlikely to have substantially affected the broad patterns in our results. Nevertheless, non-English studies that test conservation actions are potentially the only available studies for certain species and geographical areas (Berthinussen et al. 2020), so it is still very important to synthesize these studies to inform future conservation efforts.
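
The two rates quoted above follow from simple division. The short Python sketch below is illustrative only: the function name is hypothetical, the counts are those given in the text (taking the second denominator to be 345,119 studies), and it is not part of the original analysis.

```python
def percent_testing_actions(n_testing, n_searched):
    """Percentage of searched studies that tested a conservation action."""
    if n_searched <= 0:
        raise ValueError("n_searched must be positive")
    return 100 * n_testing / n_searched


# Counts quoted in the text; variable names are hypothetical.
bat_synopsis = percent_testing_actions(6, 1492)            # Portuguese and Spanish journals
all_non_english = percent_testing_actions(643, 345119)     # all non-English journals searched

print(f"Bat Conservation synopsis: {bat_synopsis:.2f}%")
print(f"All non-English journals:  {all_non_english:.2f}%")
```

Both values are well below 1%, consistent with the rates of roughly 0.4% and 0.18% reported in the text.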

Some taxonomic orders were well represented in the evidence on conservation effectiveness relative to the percentage of threatened species in each order, whereas other orders were very poorly represented (Fig. 4)—as found in analyses of the wider conservation literature (Clark & May 2002; Fazey et al. 2005; Murray et al. 2015; Donaldson et al. 2016). Most bird species and thus most threatened bird species were songbirds (Passeriformes, 46% of all threatened bird species), but this order was the worst represented (31% of studies), followed by parrots (Psittaciformes; 8% of all threatened bird species but only 2% of studies). Conversely, shorebirds (Charadriiformes) and waterfowl (Anseriformes) were the best represented (3% and 2% of all threatened bird species and 13% and 8% of studies, respectively). These differences in representation probably reflect the relative difficulty in studying threatened songbird species (e.g., small-bodied, forest species with small range sizes) and parrots (often found in less easily accessible tropical locations) relative to shorebirds and waterfowl (with generally larger range sizes that often overlap with hotspots of research effort in North America and Europe).

Among amphibians, salamanders were well represented because this group has only 14% of all threatened amphibian species, but appeared in 30% of studies. This is potentially because certain nonthreatened but protected species, such as Great Crested Newts (Triturus cristatus) (a European protected species with an IUCN [2019] Red-List status of least concern), are highly studied in relation to the effectiveness of mitigation interventions and because one-third of salamander species exist in North America, where research effort is concentrated. Frogs (Anura) were underrepresented (70% of studies versus ∼86% of threatened amphibian species), possibly because many threatened frog species exist in less easily accessible tropical locations. Caecilians (Gymnophiona) were only represented by a single study, but this was in proportion to the number of threatened species they represent (only ∼0.6% of all threatened amphibian species).
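
The representation measure plotted in Fig. 4 (percentage of studies minus percentage of threatened species in each order) can be illustrated with the percentages quoted in the two preceding paragraphs. The Python sketch below is a minimal illustration under stated assumptions: the function and variable names are hypothetical, only orders for which the text gives both percentages are included, and it is not the code used for the original analysis.

```python
# Percentages quoted in the Discussion, as (% of studies, % of threatened species).
# Only orders for which the text reports both values are listed.
reported = {
    "birds": {
        "Passeriformes (songbirds)": (31, 46),
        "Psittaciformes (parrots)": (2, 8),
        "Charadriiformes (shorebirds)": (13, 3),
        "Anseriformes (waterfowl)": (8, 2),
    },
    "amphibians": {
        "Caudata (salamanders)": (30, 14),
        "Anura (frogs)": (70, 86),  # the text gives approximately 86%
    },
}


def representation_gap(pct_studies, pct_threatened):
    """Positive values: more studies than expected from the share of threatened species."""
    return pct_studies - pct_threatened


def rank_orders(orders):
    """Return (order, gap) pairs sorted from best to worst represented."""
    gaps = {name: representation_gap(s, t) for name, (s, t) in orders.items()}
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


for taxon, orders in reported.items():
    print(taxon)
    for name, gap in rank_orders(orders):
        print(f"  {name}: {gap:+d} percentage points")
```

With these inputs, shorebirds and waterfowl come out positive and songbirds most negative among the listed bird orders, and salamanders positive and frogs negative among amphibians, matching the qualitative pattern described in the text. The same subtraction applies to the biome comparison in Fig. 5 if the percentage of terrestrial area replaces the percentage of threatened species.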

An underrepresentation of threatened species is concerning because information on the effectiveness of interventions targeting threatened species is urgently required—particularly given substantial declines of bird fauna (Rosenberg et al. 2019) and severe threats to amphibians (Grant et al. 2019). Although it can be challenging to design reliable studies on rare species, where feasible, conservation scientists should prioritize testing the effectiveness of conservation interventions for threatened species. Equally, the absence of some orders from the literature on testing conservation interventions is problematic because functional and ecological differences between taxonomic groups may make generalization of the effectiveness of interventions difficult or inappropriate. Investigating which interventions are likely to be effective in many local contexts is extremely important to prioritize the most important taxonomic gaps in evidence that need to be addressed (Junker et al. 2020).

Types of bias that may complicate the process of evidence synthesis were also present. For example, studies with more reliable designs (e.g., RCT or BACI) tended to be strongly concentrated in North America and Europe (particularly the United Kingdom) relative to less reliable designs (e.g., BA, CI, and after designs) (Figs. 1 & 2). Combined geographic and study design bias has not been found previously (e.g., Burivalova et al. [2019] did not find patterns across continents in the tropical forest conservation literature) and suggests that not only are studies lacking outside North America and Europe, but also that the few studies that do exist outside these regions are likely to be of low reliability (Christie et al. 2019). This may be because studies conducted outside North America and Europe face greater constraints (e.g., logistical, funding, and time constraints) on the types of study design they can use when assessing the effectiveness of conservation actions. Therefore, funders, journals, and researchers need to facilitate tests of conservation interventions using reliable study designs in these underrepresented regions and the publication of their results.

Amphibian and bird studies used a variety of metrics to quantify the effectiveness of the same intervention. Although using several metrics may improve understanding of the overall effectiveness of an intervention, too many could make evidence hard to synthesize in systematic reviews and meta-analyses (by reducing the number of directly comparable studies) and difficult to interpret for decision-makers (Christie et al. 2020). This highlights the need for greater standardization of the sets of metrics used to assess the effectiveness of certain interventions (Capmourteres & Anand 2016; McQuatters-Gollop et al. 2019) to help make studies more directly comparable (Christie et al. 2020).

The gaps and biases we found in the literature on the effectiveness of conservation interventions represent a serious problem for the field of conservation. Although we could only analyze the literature up until 2012 for amphibians and 2011 for birds, these gaps and biases are still likely to persist. However, with limited resources conservation science cannot afford to allocate research effort inefficiently. Our results are therefore extremely important for determining where future research effort on testing the effectiveness of conservation interventions should be invested. Future studies should not only focus on testing conservation interventions on the poorly represented threatened taxa, regions, and biomes we identified, using reliable study designs where possible, but also on other poorly represented taxa that Conservation Evidence is beginning to, or has yet to, summarize the evidence on (e.g., plants, insects, and reptiles). Future work could also identify whether there are system- or species-specific interventions that are not included in the Conservation Evidence database, particularly in relatively poorly studied regions. Interventions are defined by an advisory board before systematic literature searches occur, but are often updated and reframed when studies are found that mention or test additional interventions; listed interventions therefore reflect those described in the conservation literature. While possible bias in interventions does not affect the inclusion of studies in the database (because studies are included in the database if they quantitatively test any conservation intervention), identifying possible interventions that are not listed at www.conservationevidence.com would be useful to prioritize the testing of future interventions, particularly for underrepresented regions or taxa. This work would also benefit from a more systematic, hierarchical classification system for describing interventions.

Future work is needed to identify specific research priorities for testing conservation interventions for taxonomic groups other than amphibians and birds, although the broad biases we identified here are likely to apply to other taxa. We hope that by addressing geographic and taxonomic biases in the evidence base for conservation we can ensure more relevant evidence-based recommendations can be made to decision-makers. Similarly, addressing the geographic bias in the use of reliable study designs, and in the variability in the types of metrics used in studies, will help evidence synthesis become more efficient. A more complete, reliable, and standardized evidence base will enable conservation to become more evidence based and, ultimately, more effective.

Acknowledgments

We thank A.-C. Mupepele for useful comments on the manuscript and all past and present members of the Conservation Evidence project. T.A. was supported by the Grantham Foundation for the Protection of the Environment, the Kenneth Miller Trust, and the Australian Research Council Future Fellowship (FT180100354); W.J.S., P.A.M., R.K.S., C.F.R.W., S.O.P., and G.E.S. were supported by Arcadia and The David and Claudia Harding Foundation; B.I.S. and A.P.C. were supported by the Natural Environment Research Council as part of the Cambridge Earth System Science NERC DTP [NE/L002507/1]. B.I.S. was also supported by the Natural Environment Research Council [NE/S001395/1] and by a Royal Commission for the Exhibition of 1851 Research Fellowship.