What works in tropical forest conservation, and what does not: Effectiveness of four strategies in terms of environmental, social, and economic outcomes

Tropical forests and their biodiversity are disappearing, despite decades of conservation efforts. Are we now in a position to understand whether some conservation strategies work better while others consistently fail in protecting tropical forests? We searched the literature to evaluate four mainstream strategies (forest certification and reduced impact logging, payments for ecosystem services, protected areas, community forest management) in terms of 35 environmental, social, and economic metrics. We evaluated whether applying the strategy improved, left unchanged, or worsened the conservation metrics and we created an interactive platform to view the data. We concluded that (a) the scientific literature on the effectiveness of conservation strategies in tropical forests is still vastly inadequate, due to poor design, lack of scope, and too few examples; (b) the effects of conservation on biodiversity and the economic outcomes of conservation are particularly understudied; and (c) all strategies fail at least some of the time, but all of them also succeed at least some of the time. Our recommendation is that each new instance of implementing a given strategy should consider in detail, at the very least, the negative evidence on the given strategy, in order to avoid repeating the same mistakes. We introduce an interactive, dynamic platform to host various types of conservation effectiveness evidence.

primary or intact tropical forests have already disappeared completely (Potapov et al., 2017).
It is of planetary importance that we understand and predict the effectiveness of different conservation strategies in specific contexts to protect the remaining tropical forests. Financial resources for conservation are limited, pointing to the need for evidence-based conservation to maximize conservation success (Sutherland, Pullin, Dolman, & Knight, 2004). Simultaneously, it is important to clearly communicate existing scientific results to the relevant conservation decision makers and practitioners (Sutherland & Wordley, 2017).
The main roadblocks in understanding the effectiveness of different types of conservation strategies are rooted in the complexity of the socio-economic systems in which conservation is implemented, imperfect implementation of conservation measures, lack of funding for appropriate evaluation, and the near impossibility of doing true experiments (because of large scale and complex conditions), among others (Ferraro & Pattanayak, 2006; Kapos et al., 2008; Pullin & Knight, 2009; Romero et al., 2013; Salafsky, Margoluis, Redford, & Robinson, 2002; Sutherland & Wordley, 2017). Further, practitioners often have a hard time accessing existing results when they are hidden by paywalls, scattered across the conservation-effectiveness literature, or presented in highly technical language. Finally, it can be challenging to reconcile the oftentimes contradictory evidence to make a truly informed decision. These barriers between conservation practice and science mean that conservation organizations too frequently rely on anecdotal evidence, intuitive understanding, personal preferences, and strong personalities advocating for specific approaches and interventions (Redford, Padoch, & Sunderland, 2013).
In an effort to make the available scientific evidence on the effectiveness of tropical forest conservation more accessible to conservation practitioners and decision makers, as well as donors, industry, governments, and the public, we created an online, interactive, nontechnical visualization of evidence on the conservation effectiveness of four mainstream tropical forest conservation strategies: (a) forest certification and reduced impact logging (FSC-RIL), (b) payments for ecosystem services (PES), (c) protected areas (PAs), and (d) community forest management (CFM).
On the website, practitioners can (a) get a quick overview of the available evidence regarding a particular conservation strategy overall, on a country-by-country basis, or specifically for each of the 35 featured conservation metrics; (b) at a glance see whether and where there is evidence of a given conservation strategy being associated with improvements, no change, or harm in terms of specific conservation outcomes; (c) explore actual outcomes of individual studies via brief (1-2 sentences) nontechnical summaries that link to the full, peer-reviewed articles or technical reports; and (d) sort the data by filtering studies of different quality of evidence (e.g., quasi-experimental studies, systematic reviews, case reports).
Importantly, we did not seek to emulate a systematic review or a systematic evidence map (Pullin & Stewart, 2006). We believe that both systematic reviews and evidence maps, as defined for example by the Collaboration for Environmental Evidence (http://www.environmentalevidence.org/), have an important role to play in conservation and sustainable management. However, they do not fulfill our goals of presenting different types of evidence on a multitude of outcomes at the same time, and they do not allow users to immediately see the main direction (positive, neutral, negative) of each outcome. Evidence maps typically show only where evidence is available, but not what the evidence shows (McKinnon et al., 2016; McKinnon, Cheng, Garside, Masuda, & Miller, 2015; Puri, Nath, Bhatia, & Glew, 2016). It is also important to note that our evidence collections are not exhaustive, but that new studies can be added to our open-source platform by researchers, thus preventing the platform from becoming obsolete. We elaborate on the specific differences between our approach and systematic reviews and evidence maps, as well as on their advantages and disadvantages, in the methods section. Here, we summarize the main findings on the effectiveness of the four tropical forest conservation strategies.

| Literature sampling
To populate our database, we carried out a literature review on the effectiveness of four mainstream conservation strategies: (a) FSC-RIL; (b) PES; (c) strictly protected PAs (IUCN categories I, II, and III); and (d) CFM (including, but not restricted to, areas designated as IUCN categories V and VI). We define each strategy in full in the Supporting Information (SI Text 1). We systematically searched for peer-reviewed studies or technical reports (see inclusion criteria) that looked at the effectiveness of one or more of the four strategies either by comparing sites where a given strategy had been implemented to other sites where no intervention had occurred, or by comparing conditions in an area before and after implementation of a given strategy, with varying degrees of rigor in terms of selecting controls and accounting for confounding variables (see evidence types). We did not include studies that compared one intervention to another; such studies, while valuable, were too few in number to assess. We considered all outcomes of each intervention falling into three broad categories: environmental, social, and economic (details in data extraction).
For each conservation strategy, we used a separate search string on Google Scholar (https://scholar.google.com/) that combined the words: impact OR effect* AND (environmental OR social OR economic) AND tropical OR Asia OR Africa OR South America AND strategy, where the word strategy was replaced each time with the specific conservation strategy we researched, namely (a) forest certification OR sustainable forestry OR reduced impact logging; (b) payment for environmental services OR payment for ecosystem service; (c) CFM OR community based forest management OR community co-management OR community joined management OR participatory forest management; and (d) PA OR national park. An important caveat is that our search may have missed similar interventions described with different terminology.
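The way the four search strings share a common stem can be sketched in a few lines of Python. This is only an illustration of the template described above: the dictionary keys, the constant `BASE`, and the function `build_query` are hypothetical names, and the sketch reproduces the operators literally as written in the text, without assuming any additional quoting or bracketing that may have been used in the actual Google Scholar queries.

```python
# Strategy-specific terms as listed in the text. The dictionary keys are
# shorthand labels chosen for this sketch, not identifiers from the study.
STRATEGY_TERMS = {
    "FSC-RIL": ["forest certification", "sustainable forestry",
                "reduced impact logging"],
    "PES": ["payment for environmental services",
            "payment for ecosystem service"],
    "CFM": ["CFM", "community based forest management",
            "community co-management", "community joined management",
            "participatory forest management"],
    "PA": ["PA", "national park"],
}

# Common stem of every search string, copied from the text.
BASE = ("impact OR effect* AND (environmental OR social OR economic) "
        "AND tropical OR Asia OR Africa OR South America")


def build_query(strategy: str) -> str:
    """Replace the word 'strategy' in the template with the OR-joined terms."""
    terms = " OR ".join(STRATEGY_TERMS[strategy])
    return f"{BASE} AND {terms}"


if __name__ == "__main__":
    for name in STRATEGY_TERMS:
        print(f"{name}: {build_query(name)}")
```

Keeping the stem in one place makes it easy to see that only the final clause differs between the four searches.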
For each conservation strategy, we read the first 1,000 Google Scholar search results, taking into consideration the title of the study (i.e., our database is not exhaustive). For titles that appeared relevant, we then read the abstract, and proceeded to read the full publication if the abstract indicated the article was relevant to our research objectives. We included the publication in the database if it met the inclusion criteria listed below. A caveat of this approach is that Google Scholar does not fully disclose the criteria by which search results are ranked, and such ranking could vary over time as certain publications gain more citations. The ranking does not, however, vary between users (see also limitations below).

| Inclusion criteria
We included only peer-reviewed literature, and a small number of internally peer-reviewed technical reports by research institutes and think tanks, where the likelihood of bias was low. For example, we included technical reports on forest certification by CIFOR, but excluded technical reports on forest certification by WWF, as WWF was one of the founding organizations of the Forest Stewardship Council. We excluded opinion pieces and modeling studies, with the exception of studies that modeled a counterfactual scenario, based on empirically obtained parameters, in order to estimate the effect of the intervention. Although simulation model studies are valuable, it was beyond the scope of our study to include them, and future work is needed in this area. We included only studies where the methodology and sampling were fully explained. We included studies only from countries that fall in the tropics (with the exception of Australia, which we excluded as only a small part falls in the tropics). Finally, in order for a study to be included, it had to provide a minimum set of extractable information (see below).

| Data extraction
From each study, we extracted several pieces of information: first author, year of publication, title, conservation strategy, control, continent, country, method, evidence type (see below), broad category of the outcome measured (environmental, social, economic), narrow category of the outcome measured (one of the 35 identified variables, Supporting Information Table S1), conclusions (short, nontechnical description of the main outcome), and "valence" of the outcome (positive = implementing the conservation intervention was better than the control in terms of the studied variable; neutral = the conservation intervention did not significantly change the studied outcome, which could be due to low statistical power or a true no effect; negative = the conservation intervention was associated with a worse outcome than the control). All of these were prerequisites for the publication to be used in the database.
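The extracted fields can be pictured as one record per data point. The following is a minimal sketch, assuming one plain record per outcome; the class and field names (`Outcome`, `Valence`, `is_complete`) are hypothetical and do not come from the platform's actual schema, and the example values are placeholders rather than real database entries.

```python
from dataclasses import dataclass, fields, replace
from enum import Enum


class Valence(Enum):
    POSITIVE = "positive"  # intervention better than the control
    NEUTRAL = "neutral"    # no significant change (low power or true no effect)
    NEGATIVE = "negative"  # intervention associated with a worse outcome


BROAD_CATEGORIES = {"environmental", "social", "economic"}


@dataclass(frozen=True)
class Outcome:
    first_author: str
    year: int
    title: str
    strategy: str
    control: str
    continent: str
    country: str
    method: str
    evidence_type: str
    broad_category: str
    narrow_category: str  # one of the 35 variables in Table S1
    conclusion: str       # short, nontechnical summary
    valence: Valence


def is_complete(o: Outcome) -> bool:
    """All fields are prerequisites: reject empty text or unknown categories."""
    text_ok = all(
        str(getattr(o, f.name)).strip()
        for f in fields(o)
        if f.name not in ("year", "valence")
    )
    return (text_ok
            and o.broad_category in BROAD_CATEGORIES
            and isinstance(o.valence, Valence))


# Placeholder record for illustration only; not an entry from the database.
example = Outcome("Author", 2000, "Example title", "PES", "not enrolled land",
                  "Continent", "Country A", "example method",
                  "case report", "environmental", "example variable",
                  "Example summary.", Valence.NEUTRAL)
incomplete = replace(example, country="")

if __name__ == "__main__":
    print(is_complete(example), is_complete(incomplete))
```

A completeness check of this kind mirrors the rule that a publication lacking any of the listed items could not be used in the database.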
Data were extracted from each study by a primary researcher, and then 20% of the data points were checked by a secondary researcher. In the majority of cases, data extraction was straightforward, and it was trivial to assign outcomes as positive, neutral, or negative. In a minority of cases, this decision was nontrivial and those studies were revisited by the primary researcher and independently extracted by the secondary researcher. The outcomes were then compared and agreement reached. We specified any potential alternative interpretations in the verbal conclusions for these outcomes, all of which are available in the online database.

| Evidence types
Each study was assigned to a category of evidence type (Table 1). This classification was developed for this project, as there currently exists no widely accepted evidence-classification scheme in conservation science (Game et al., 2018). It is not hierarchical, and the types of evidence we identified are not entirely exclusive (for example, a study could be a meta-analysis of randomized controlled trials). The classification does not consider sample size, study duration, or geographic scope. It focuses on whether a particular study has the potential to suggest a mechanism, reveal a correlation, or suggest causation, and whether the study is generalizable (see Supporting Information S2).

| Controls
For the majority of outcomes, apart from those classified as case reports (Table S1, Supporting Information S2), the studies compare outcomes of a conservation strategy to a control or a counterfactual scenario. The control is different for each of the conservation strategies. PAs and CFM are typically compared to a similar baseline: a forest that is not formally protected or managed, and is under a so-called "open-access" regime, which often means that, although it is technically owned by the state, there is little to no formal management of the site. PES are typically compared to privately or publicly owned land that is not enrolled in the program. Forest certification and reduced impact logging are compared to industrial logging concessions that are not certified and are conventionally managed for timber extraction. Therefore, when comparing the effectiveness of different conservation strategies, we are comparing their effectiveness in terms of achieving the specific goals of each strategy, rather than to each other.

| Interactive platform
We constructed the visualizations, available at https://www.conservationeffectiveness.org/, in a way that lets the user filter the evidence by country, thematic group of outcomes, and evidence type, as well as explore the nontechnical summaries of each finding and navigate directly to the underlying literature. Users are also able to log on and add a new outcome, which, if approved by the editors, is then visible on the visualization; however, the results described here are based only on the initial literature search carried out in 2017.
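The filtering described above (by country, thematic group of outcomes, and evidence type) can be sketched as a simple conjunction of optional criteria. This is not the platform's code: the function `filter_outcomes`, the dictionary keys, and the records are hypothetical placeholders used only to show the logic.

```python
from typing import Optional


def filter_outcomes(outcomes: list[dict],
                    country: Optional[str] = None,
                    broad_category: Optional[str] = None,
                    evidence_type: Optional[str] = None) -> list[dict]:
    """Keep the outcomes that match every criterion the user has set."""
    criteria = {
        "country": country,
        "broad_category": broad_category,
        "evidence_type": evidence_type,
    }
    # Criteria left as None are treated as "no filter".
    active = {key: value for key, value in criteria.items() if value is not None}
    return [o for o in outcomes
            if all(o.get(key) == value for key, value in active.items())]


# Hypothetical placeholder records, not entries from the real database.
records = [
    {"country": "Country A", "broad_category": "environmental",
     "evidence_type": "case report"},
    {"country": "Country A", "broad_category": "social",
     "evidence_type": "quasi-experimental"},
    {"country": "Country B", "broad_category": "environmental",
     "evidence_type": "quasi-experimental"},
]

subset = filter_outcomes(records, country="Country A",
                         broad_category="environmental")

if __name__ == "__main__":
    print(subset)
```

Treating unset criteria as "no filter" lets the same function serve the overview, the per-country view, and the per-evidence-type view.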
| Limitations and comparison with systematic reviews and evidence maps

| Exhaustive approach versus sampling
Our database is a sample of the literature and therefore it is not exhaustive. This is a key difference from most systematic reviews or systematic evidence maps, which often aim to be as complete as possible. More evidence is generally better than less, and so being exhaustive is an important advantage of a systematic review. This, however, comes at a high cost per systematic review (Sutherland & Wordley, 2018). Our method involves systematic sampling of the literature, well beyond the point at which the number of new relevant studies among successive Google Scholar search results approaches an asymptote. The advantage of our approach is being able to cover more topics with the same resources, compared to a systematic review, while simultaneously not being restricted by an a priori selection of journals, such as in the Conservation Evidence project (Sutherland & Wordley, 2018).

| Static versus dynamic
Even the most exhaustive systematic review becomes incomplete as soon as a new publication appears. The advantage of our approach, over a systematic review and other initiatives relying on repeated efforts (Sutherland & Wordley, 2018), is that it is explicitly designed for studies to be added to the database and online visualization as they become published, by an unlimited number of volunteering researchers. An outstanding question, that must be addressed with future research, regards the new type of bias that will be introduced by individual researchers adding new studies to our database.

| Biases
Any sampling may introduce a bias. Our literature sample is biased by the way Google Scholar ranks studies in terms of relevance, a criterion that is not transparently explained by Google. It is important to note, however, that such bias, resulting from not including all studies, is only one of the many known and unknown biases in scientific research and publication, some of which can be addressed, whereas others cannot. Reviews, including systematic ones, can suffer from different types of biases, such as publication bias (certain types of studies are more likely to be accepted for publication than others) and geographic bias (certain regions are more likely to be studied in the first place), among others. Such biases can be quantified, but are unlikely to be truly eliminated, regardless of whether a review is exhaustive or not.
Note (Table 1). Case report is a description of a case where a strategy was implemented, without a proper control. Case-control I has a control, either in time or space. Case-control II has a control, and takes into account at least some confounding variables in the analysis. Quasi-experimental study has a rigorous study design and control selection, and considers a counterfactual scenario. Randomized controlled trial randomly assigns treatment and control. Meta-analysis performs an analysis of an overall effect size across multiple studies. Systematic review collects and summarizes all available studies in an objective, systematic way. For full description, see Supporting Information.
a Yes in a system where all important confounders are known and taken into account. b Depending on the evidence type of individual studies included.

| Variable types of evidence
Unfortunately, for many of the strategies commonly used in tropical forest conservation, there is as yet an insufficient rigorous evidence base. Systematic reviews on these subjects often exclude the majority of evidence available as it does not meet the predetermined standards, typically requiring study design that can demonstrate causation, or true impact (Bowler et al., 2010; McKinnon et al., 2016; Puri et al., 2016; Samii, Lisiecki, Kulkarni, & Chavis, 2014). Such excluded evidence is typically less sophisticated in terms of study design and statistical analysis, yet, we argue, can also be important: conservation practitioners and governments have to make urgent decisions, based on the evidence that is available, incorporating new studies as they appear (Diamond, 1986). A systematic review that adopts strict impact evaluation criteria has the potential to estimate the true overall impact of an intervention, should sufficient evidence be available. Our approach may provide an insight into potential mechanisms at play when evidence with rigorous design is not available.

| Drawing conclusions versus showing evidence
Systematic reviews are typically carried out to answer a very specific, narrowly defined question, often drafted together with the most important stakeholders (Pullin & Stewart, 2006). Our approach, similarly to the Conservation Evidence approach (Sutherland & Wordley, 2018), enables answering a variety of questions with the same dataset. We designed the interactive visualization to encourage the generation of new questions, making evidence-based decisions, and investigating alternative conservation strategy options.

| Choice of presenting results visually
Any visualization of a complex and heterogeneous dataset comes with a risk of being misinterpreted. An important difference between our approach and that of a systematic evidence map (McKinnon et al., 2015), is that we attempt to represent visually what different pieces of evidence show individually, in terms of the outcome being better, worse, or no different than a control. Systematic evidence maps typically only show where there is evidence, but not what it shows. The user of systematic evidence maps can then read the relevant studies, and this makes for a valuable resource for researchers and funding bodies when deciding where major evidence gaps lie. However, we argue that conservation practitioners and decision makers typically do not have the time, resources, or technical expertise to access and evaluate the technical studies themselves. Our approach attempts to bring the evidence closer to conservation practitioners by showing the evidence in an increasingly detailed view in the online visualization-(a) a map of all evidence, (b) positive, negative, and neutral evidence divided into three thematic groups, (c) evidence further divided into 35 narrower categories and/or evidence type, (d) nontechnical short summaries of each finding, (e) a link to the original technical publication.
The following caveats must be taken into account when interpreting the evidence. Data points do not have equal weight and cannot be treated as if they did. Overall effect sizes cannot be calculated and vote counting is not possible, as individual data points are not independent: one study could yield multiple pieces of evidence. The individual pieces of evidence are not of equal quality: some can show causation, but most can demonstrate only a correlation, or a potential causal mechanism. Future research should investigate how conservation practitioners use evidence visualizations.
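The non-independence caveat can be made concrete with a toy example. The sketch below uses hypothetical study identifiers and invented valences (not data from the database) to show how a raw tally of data points differs from a tally of distinct studies when one study contributes several outcomes.

```python
from collections import Counter, defaultdict

# Hypothetical (study_id, valence) pairs, invented for illustration only.
# "study_A" contributes three data points, "study_B" contributes one.
points = [
    ("study_A", "positive"),
    ("study_A", "positive"),
    ("study_A", "positive"),
    ("study_B", "negative"),
]


def tally_data_points(pairs):
    """Naive vote count: every data point counts once."""
    return Counter(valence for _, valence in pairs)


def tally_distinct_studies(pairs):
    """Count how many distinct studies report each valence at least once."""
    studies = defaultdict(set)
    for study_id, valence in pairs:
        studies[valence].add(study_id)
    return {valence: len(ids) for valence, ids in studies.items()}


by_point = tally_data_points(points)
by_study = tally_distinct_studies(points)

if __name__ == "__main__":
    print("per data point:", dict(by_point))
    print("per study:     ", by_study)
```

In this toy case the per-data-point tally suggests three positive outcomes against one negative, whereas only one study stands behind each valence. Neither tally is a valid effect size, which is why the squares in the visualization should not be summed or cancelled out.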
Additionally, our evidence classification does not reflect sample size, study duration, and scope. By having evaluated individual outcomes as positive, negative, or neutral, we could not avoid adding a degree of subjective judgment-in several cases, different readers of the papers might have rendered different judgments regarding outcomes. We tried to minimize this effect by having at least 20% of data points evaluated by a second researcher. However, the way in which we have classified and presented the data will allow re-evaluations, future updates, and inclusion of additional evidence.
To summarize, we do not see our approach as competing with or replacing systematic reviews or other approaches for evidence synthesis. Instead, our approach fills a specific niche in evidence-based conservation, namely bridging the gap between conservation science and practice.

| RESULTS AND DISCUSSION
From the 4,000 Google Scholar search results we systematically evaluated, 161 publications fit our inclusion criteria. From these, we extracted 570 data points (Figures 1 and 2, online visualizations). Each data point represents an outcome of one of the four conservation strategies (FSC-RIL: 187 outcomes; PES: 132 outcomes; PAs: 124 outcomes; CFM: 127 outcomes).
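As a quick arithmetic check of the figures above, the per-strategy counts can be summed and expressed as shares of the total. The variable names are chosen for this sketch only; the numbers are those reported in the text.

```python
# Counts reported in the text.
OUTCOMES_PER_STRATEGY = {"FSC-RIL": 187, "PES": 132, "PAs": 124, "CFM": 127}
PUBLICATIONS = 161
RESULTS_SCREENED_PER_STRATEGY = 1000  # first 1,000 search results per strategy

total_outcomes = sum(OUTCOMES_PER_STRATEGY.values())
total_screened = RESULTS_SCREENED_PER_STRATEGY * len(OUTCOMES_PER_STRATEGY)

# Share of all data points contributed by each strategy, in percent.
shares = {name: 100 * count / total_outcomes
          for name, count in OUTCOMES_PER_STRATEGY.items()}

# Average number of data points extracted per included publication.
points_per_publication = total_outcomes / PUBLICATIONS

if __name__ == "__main__":
    print("total data points:", total_outcomes)
    print("search results screened:", total_screened)
    for name, share in shares.items():
        print(f"{name}: {share:.1f}% of data points")
    print(f"data points per publication: {points_per_publication:.2f}")
```

The four counts add up to the 570 data points reported, and four searches of 1,000 results each give the 4,000 results evaluated.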

| Environmental outcomes
Deforestation and forest degradation was the single most frequently measured variable (120 out of 248 data points), which is understandable, as reducing deforestation is one of the main goals of all four conservation strategies. There was evidence of at least some success by all strategies; however, the ways in which conservation scientists measure deforestation and degradation vary considerably in terms of basic definitions (Ghazoul, Burivalova, Garcia-Ulloa, & King, 2015), methods, as well as choice of controls. As a result, it was impossible to calculate an overall effect size per strategy (Figure 2). We found comparatively less evidence on outcomes for biodiversity within our literature sample (30 data points, of which 22 related to animal diversity and 8 to plant diversity). This relative scarcity of studies reporting biodiversity outcomes may be due to two factors. First, many studies and conservation projects specifically assume that forest cover is a good proxy for biodiversity. However, this assumption is not always valid (Burivalova, Game, & Butler, 2019; Redford, 1992; Robinson, Redford, & Bennett, 1999; Wilkie, Bennett, Peres, & Cunningham, 2011): hunting is one of the most important forest uses, and in itself was one of the measured outcomes across conservation strategies (13 data points on illegal hunting, logging, and mining). Second, biodiversity is far more difficult and expensive to measure than deforestation, which can be relatively reliably estimated from satellite imagery. The remaining environmental variables, such as water regulation and erosion prevention (10 data points), carbon stock and emissions (14 data points), or canopy loss and gap size (16 data points) were measured even less frequently across the four strategies.
[Figure caption fragment: Scaling differs between strategies. Meta-analyses and systematic reviews that specify the studied countries count towards the size of individual bubbles.]

| Forest certification and reduced impact logging
The evidence we found on deforestation in certified concessions (Figure S1), while very limited, is rigorous, and shows that certification led to a reduction in deforestation in Indonesia (even though it did not reduce fire occurrence), but did not cause a change in rates of deforestation in Mexico (Blackman, Goff, & Rivera Planter, 2015; Miteva, Loucks, & Pattanayak, 2015). Some (legal) forest loss is inevitable even in certified concessions, because trees have to be cleared for the construction of roads, log landings, and logging camps. However, we found no indication of what an acceptable or permitted rate of forest loss might be for an FSC certified concession. This is an important gap in the literature.
We found abundant evidence that FSC-RIL decreases certain aspects of tropical forest degradation: it is associated with lower road and skid trail densities (Asner, Keller, Pereira, Zweede, & Silva, 2004; Medjibe, Putz, & Romero, 2013), smaller canopy loss and gap size (Asner, Keller, Pereira, & Zweede, 2002), and less collateral damage and ground disturbance (Pinard, Putz, & Tay, 2000; Putz & Pinard, 1993). This evidence does not consistently account for confounding variables, such as timber stock or logging intensity. Several of the studies that do account for confounding factors, primarily logging intensity, show that the mitigating effect of RIL may be low or even nonexistent at higher logging intensities (Griscom, Ellis, & Putz, 2014; Martin, Newton, Pfeifer, Khoo, & Bullock, 2015). The effect of RIL on biodiversity follows a similar pattern (Bicknell, Struebig, Edwards, & Davies, 2014; Burivalova, Lee, Giam, Wilcove, & Koh, 2015): the majority of data points show that animal species fare better in certified or RIL forests than in conventionally managed concessions. However, once logging intensity is taken into account, the improvement in terms of species richness and abundance becomes smaller. This suggests that some of the other restrictions associated with certification, such as enforcing legal logging intensity limits, and set-asides on steep slopes or along rivers (Imai et al., 2009) may be especially important for biodiversity.
FIGURE 2 Overview of the evidence on the outcomes of four different conservation strategies (columns) in terms of environmental, social, and economic outcomes (rows). Red squares, negative outcomes (i.e., intervention worse than control); yellow, neutral; green, positive outcomes. Individual squares do not have equal value and cannot be cancelled out or summarized into overall effect sizes; see limitations.
We encountered only one data point comparing the amount of hunting in certified forests to conventionally managed ones, and it showed little difference between the two treatments (Cerutti et al., 2017). We consider this a major research gap, as hunting often goes hand in hand with timber extraction (Robinson et al., 1999).

| Payments for ecosystem services
We found 17 data points evaluating the impact of PES on deforestation and forest degradation, all of which showed that PES was associated with either a decline or no significant change in deforestation (Figure S2). Even if we consider outcomes only from quasi-experimental studies (Study III evidence type, Table 1), we find more cases of positive change than of no change. The only randomized controlled trial included in our database dealt with a PES scheme and found a decline in forest loss (Jayachandran et al., 2017); however, at least one other has been published since (Pynegar, Jones, Gibbons, & Asquith, 2018).
While we were not able to calculate an overall effect size for reduced deforestation, several studies found that the PES projects achieved low additionality (Asquith, Vargas, & Wunder, 2008; Honey-Rosés, Baylis, & Ramirez, 2011; Robalino & Pfaff, 2013). This was not due to participants breaking the PES contracts, or because of abundant leakage (displacement of deforestation to areas that were not enrolled in PES). Instead, the authors found that targeting was difficult: it was hard or not socially desirable to enroll only those participants who would, with high certainty, deforest their patch of land in the absence of payments. In other words, some payments were given to people who would not have deforested their land anyway (Alix-Garcia, Shapiro, & Sims, 2012; Asquith et al., 2008; Robalino & Pfaff, 2013).
Water quality was the second most frequently studied environmental outcome, and in all cases but one, studies found an improvement in water quality, quantity, and erosion prevention (Grieg-Gran, Porras, & Wunder, 2005; Gutiérrez Rodríguez et al., 2016). The one exception found no change, and this was in an area where water quality had already been perceived to be good before PES implementation (Kosoy, Martinez-Tuna, Muradian, & Martinez-Alier, 2007).
Biodiversity was rarely the main focus of the projects we evaluated, and there were few data points (eight for animal and plant diversity combined), out of which four were positive, one neutral, and three negative. All the documented instances of PES resulting in declining animal and tree diversity were from China, a country with the world's largest PES program (Gutiérrez Rodríguez et al., 2016; Hua et al., 2016). Most of the reforestation in China has happened through monoculture plantations, which studies have found to be worse in some instances for bird and bee diversity than the replaced agricultural fields (Hua et al., 2016). PES is not well-studied from a biodiversity perspective in South and Central America, compared to other outcomes. We found some evidence that PES did not change the levels of illegal hunting and logging, both of which would have direct consequences for biodiversity (Asquith et al., 2008; Gross-Camp, Martin, McGuire, Kebede, & Munyarukaza, 2012; Hegde & Bull, 2011).

| Protected areas
We found 72 data points related to whether the establishment of PAs reduced deforestation or forest degradation within the PA's boundaries (Figure S3). Regardless of evidence types, most outcomes found that PAs were associated with reduced deforestation. One quasi-experimental study in a region of Mexico found a case of PAs leading to an increase in forest loss (Blackman, 2015), and several case reports found that deforestation within the PA either increased through time, or exceeded the deforestation rate in the buffer zone once the buffer zone was almost completely deforested (Curran, 2004; Heino et al., 2015; Htun, Mizoue, Kajisa, & Yoshida, 2010). Several rigorously designed studies found that PAs had no significant impact on deforestation inside the PA (Baylis et al., 2016; Blackman, 2015; Brandt, Nolte, & Agrawal, 2016), regardless of the IUCN category of the PA (Nagendra, 2008).
Whereas some of the negative outcomes document a true inefficacy of PAs, others suggest a low overall deforestation pressure as a reason for the currently negligible impact of the PAs (Barber, Cochrane, Souza, & Veríssimo, 2012; Pfeifer et al., 2012). The studies that found PAs to be effective in reducing deforestation reported positive effects across a wide range of magnitudes, including reductions in deforestation of over 70% (Blackman, Pfaff, & Robalino, 2015).
We found little data on biodiversity: two studies found that animal biodiversity is better off inside a PA than outside, but one found that 80% of reserves experienced a decline in biodiversity value over time, suggesting low effectiveness (Coetzee, Gaston, & Chown, 2014; Lee, Sodhi, & Prawiradilaga, 2007). The two studies looking at plant diversity found little impact (Coetzee et al., 2014; Paré, Tigabu, Savadogo, Odén, & Ouadba, 2010).
We found only one study that measured changes in the levels of illegal hunting and logging inside the PA, activities with direct consequences for biodiversity (Bruner, 2001). This meta-study found that whereas PAs did reduce illegal logging, less than two thirds of PAs were in a better condition in terms of hunting than their surroundings (Bruner, 2001).

| Community forest management
We found evidence of both positive effects (Bowler et al., 2012; Fortmann, Sohngen, & Southgate, 2017; Pelletier, Gélinas, & Skutsch, 2016) and no effects (Heltberg, 2001; Pelletier et al., 2016; Rasolofoson, Ferraro, Jenkins, & Jones, 2015) of CFM on deforestation within the delineated forests, but no evidence of worsening deforestation (Figure S4). If we only consider quasi-experimental studies (Study III evidence type, Table 1), the number of neutral outcomes exceeds the number of positive outcomes (Rasolofoson et al., 2015; Santika et al., 2017). The systematic reviews that we included in our database found more evidence of CFM reducing rates of forest degradation (defined differently by individual studies) than of CFM reducing deforestation rates, which stayed similar (Bowler et al., 2012; Pelletier et al., 2016).
Several studies measured "forest condition" of the community-managed forests by asking participants how they perceived change in forest condition (Schreckenberg & Luttrell, 2009). Whereas this is important (if the participants are not satisfied with the outcomes, they might be less supportive of any conservation projects; Kassa et al., 2009), it does not provide insight into the impact of CFM on deforestation.
A pan-tropical systematic review of plant diversity found no overall change in tree richness or other measures of plant diversity in community forests relative to no management (Bowler et al., 2012), and an additional case study from Ethiopia found a positive outcome for animal diversity (Kassa et al., 2009).
Illegal hunting, logging, and mining all threaten biodiversity, and levels of these activities may therefore be considered a proxy for biodiversity loss. Multiple data points are available for this variable. Some studies show improvements in controlling or eliminating illegal hunting and logging, frequently through the communities being better able to enforce boundaries and exclude outsiders (Beauchamp & Ingram, 2011; Blomley et al., 2008). However, one case showed that imposing strict rules on extraction of resources forced people to shift their extractive activities to a neighboring forest (Schreckenberg & Luttrell, 2009).

| Social and economic outcomes
Community well-being, which combined various measures of poverty, was the most frequently measured variable (n = 49), followed by measures of equity, equality, and marginalization (n = 37). Certification and RIL was the only relatively well-studied strategy in terms of its economic outcomes. Across strategies, profit was the most commonly measured outcome.

| Forest certification and reduced impact logging
Most of the social outcomes of tropical forest certification we found (Figure S1) showed improvements or no change in community well-being and livelihoods or in the living and working conditions of employees (Bacha & Rodriguez, 2007; Cerutti et al., 2017; Miteva et al., 2015). We emphasize that the comparisons were made with social conditions in and around conventional logging concessions, rather than in forests with no logging. The only quasi-experimental study we found showed that FSC certification in Indonesia led to decreased air pollution, lower malnourishment, and less dependency on firewood, but no change in health care centers or street lighting in villages (Miteva et al., 2015). The tropical timber industry suffers from high levels of corruption, but we found no studies on the impact of certification on corruption (Tacconi, Obidzinski, & Smith, 2004). However, we found evidence that FSC certification is associated with higher compliance with existing labor laws and other regulations (Bacha & Rodriguez, 2007; Tay, Healey, & Price, 2002).
Evidence on the economic outcomes of FSC-RIL was split almost equally between positive and negative outcomes. Whereas in the other three conservation strategies the economic outcomes pertain mostly to communities, in the case of FSC-RIL, the economic outcomes relate to the logging companies that get certified or use RIL. There were some positive outcomes in terms of lower skidding costs, higher timber stock, and improved forestry concession management, such as having a clear management plan (Araujo, Kant, & Couto, 2009; Holmes, 2015; van der Hout, 1999). Certification also brought a higher selling price for the certified timber (price premium) and improved market access for certain tree species; however, the difference was in general not large enough to make up for the higher prelogging costs (such as mapping all trees) and lower worker productivity (Holmes, 2015; Pinard et al., 2000; Tay et al., 2002). The overall profitability of FSC-RIL was found to be lower than the profitability of conventional logging in more than half of the outcomes we found (Dwiprabowo, Grulois, Sist, & Kartawinata, 2002; Saharudin, Brodie, & Sessions, 1999; Simula, Astana, Ishmael, Santana, & Schmidt, 2004).

| Payments for ecosystem services
Most outcomes we found were various measures of equity, equality, and marginalization, investigating whether introducing PES exacerbated or alleviated the existing inequalities within the community (26 out of 62 outcomes, Figure S2). Most often, including in the one quasi-experimental study, there was no significant change in equity, equality, or marginalization associated with PES (Asquith et al., 2008; Brimont, Ezzine-de-Blas, Karsenty, & Toulon, 2015; Corbera, Brown, & Adger, 2007; Hegde & Bull, 2011), with two systematic reviews also concluding there was no significant change (Gutiérrez Rodríguez et al., 2016; Samii et al., 2014).
It might be difficult for many of the current PES schemes to improve equality substantially across the whole community. First, many communities in the project areas include people who do not own any land, and therefore by design they cannot directly participate in a PES project (Lopa et al., 2011; but see a study from Bolivia, Asquith et al., 2008). Second, even when payments are delivered in full, in some cases they contribute only a small percentage to the total annual budget of a family and are therefore unlikely to reverse its existing socio-economic status (Corbera, González, & Brown, 2009).
The outcomes for community well-being and livelihoods were mostly neutral, despite payments often being paid out in full (Arriagada, Sills, Ferraro, & Pattanayak, 2015; Grieg-Gran et al., 2005; Gross-Camp et al., 2012). Additionally, several studies found that opportunity costs were often not met or not perceived to have been met (only one positive outcome for opportunity costs) (Corbera, Kosoy, & Martinez Tuna, 2007; Kosoy et al., 2007; Newton, Nichols, Endo, & Peres, 2012). At the same time, we found little evidence that PES made families and communities worse off than those not participating in the program (Asquith, Vargas Rios, & Smith, 2002; Gutiérrez Rodríguez et al., 2016).
Finally, several studies, including one systematic review from China (Gutiérrez Rodríguez et al., 2016), found that land tenure security improved with the implementation of PES projects (Börner et al., 2013; Grieg-Gran et al., 2005; Locatelli, Rojas, & Salinas, 2008). In some cases, secure land tenure was an important reason for participants to re-enroll their land in the program, even if they did not perceive financial benefits from the project (Arriagada et al., 2015).

| Protected areas
In terms of access of communities to forest land, PAs had mostly negative outcomes (Rantala et al., 2013; Torri, 2011; Vedeld et al., 2012), and they also tended to exacerbate human-wildlife conflict: all outcomes we found on human-wildlife conflict (most of which are from India) were negative. There were more conflicts closer to a PA than further away (Karanth, Gopalaswamy, Prasad, & Dasgupta, 2013; Karanth & Nepal, 2011; Ogra & Badola, 2008). This may reflect the fact that the animal species causing the conflicts are more abundant in and around PAs than far from them.
Several studies measured public awareness of PAs by the people in or near the PAs, and most outcomes showed a high level of awareness (Ferreira & Freire, 2009; Xu, Chen, Lu, & Fu, 2006).
Overall, given the large number of PAs containing tropical forests, we found little rigorous evidence on the social outcomes of PAs. Further, there were almost no studies that quantified the economic losses (or gains) for the local communities stemming from PAs. The four outcomes we found documented negative outcomes or no change in the economic benefits and profits of the communities (Karanth & Nepal, 2011; Mackenzie, 2012; Vedeld et al., 2012). The socio-economic outcomes of PAs are clearly in need of further, rigorous study.

| Community forest management
The evidence on social outcomes of CFM showed a wide range of outcomes (Figure S4). In terms of community well-being, CFM seemed to bring either improvements or no change. The only quasi-experimental study found no significant change in per capita consumption expenditures in Madagascar as a result of CFM (Rasolofoson et al., 2016). Similarly, the empowerment and participation of communities in decision making and management either improved or remained the same (Coleman & Fleischman, 2012; Pelletier et al., 2016; Schreckenberg & Luttrell, 2009).
In terms of equality, equity (in the sense of giving everyone what they need to be successful, rather than in an economic sense) and marginalization, we found rigorous evidence indicating improvements, no change, as well as worsening due to CFM (Coleman & Fleischman, 2012; Jumbe & Angelsen, 2006; Pelletier et al., 2016). For example, as a result of CFM, wealth inequality decreased in Mexico, did not change in Bolivia and Kenya, and grew worse in Uganda (Coleman & Fleischman, 2012).
Economic outcomes were even less well represented in our database. Several systematic reviews found that overall, CFM did not improve the economic situation of families (Bowler et al., 2012; Pelletier et al., 2016), even though there were several additional positive case studies in our database (Oyono, 2005; Schreckenberg & Luttrell, 2009). A quasi-experimental study found that household income from forestry was lower in CFM areas, or that it did not change (Jumbe & Angelsen, 2006).

| CONCLUSIONS
We assembled and analyzed 161 studies assessing 570 environmental, social, and economic outcomes of four different tropical forest conservation strategies. For all four strategies, we found a lack of rigorous studies assessing a wide range of real-world conservation examples. As noted by many other researchers (Bowler et al., 2012; McKinnon et al., 2016; Pelletier et al., 2016; Sutherland & Wordley, 2017), the conservation literature does a poor job of assessing the effectiveness of tropical forest conservation strategies, due to the poor design of individual studies, a lack of thematic scope, geographical bias, and simply too few studies.
Regardless of the limitations of our work and of the conservation effectiveness literature, we are able to draw several conclusions. We found that no strategy worked all the time and for all outcomes: there are no silver bullets in tropical forest conservation. All strategies fail at least some of the time; each has at least some negative outcomes documented in the scientific literature. These should be taken as "red flags" when new conservation interventions are planned in similar contexts. At the same time, all strategies succeed at least in some cases, and these can serve as positive examples. We recommend that each new instance of implementing a given strategy should consider in detail, at the very least: (a) systematic reviews on the given strategy, (b) all negative evidence on the given strategy, (c) all geographically close evidence, and (d) once conservation goals are specified, all evidence on the particular outcomes across different strategies, in order to consider alternatives. We hope that our online platform will make this task easier, together with considering the caveats of each piece of evidence.
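The four-part recommendation above can be read as a filter over an evidence database. The following is a minimal sketch of that reading, not the implementation of our platform: the `Evidence` record, its field names, the `shortlist` function, and the placeholder records are all hypothetical, and "geographically close" is simplified here to "same country", which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Hypothetical evidence record; field names are illustrative only."""
    study: str
    strategy: str       # e.g. "PES", "PA", "CFM", "FSC-RIL"
    outcome: str        # e.g. "deforestation", "equity"
    effect: str         # "positive", "neutral", or "negative"
    evidence_type: str  # e.g. "systematic review", "case report"
    country: str


def shortlist(records, strategy, country, outcome):
    """Select the evidence a new project should read, per points (a)-(d)."""
    picked = []
    for r in records:
        same_strategy = r.strategy == strategy
        keep = (
            (same_strategy and r.evidence_type == "systematic review")  # (a)
            or (same_strategy and r.effect == "negative")               # (b)
            or r.country == country      # (c) assumption: same country = close
            or r.outcome == outcome      # (d) target outcome, any strategy
        )
        if keep:
            picked.append(r)
    return picked


# Placeholder records, not taken from any real study.
records = [
    Evidence("Placeholder A", "PES", "deforestation", "negative", "case report", "X"),
    Evidence("Placeholder B", "PES", "equity", "neutral", "systematic review", "Y"),
    Evidence("Placeholder C", "CFM", "deforestation", "positive", "case report", "Z"),
    Evidence("Placeholder D", "PA", "equity", "positive", "case report", "Z"),
    Evidence("Placeholder E", "CFM", "equity", "positive", "case report", "Y"),
]

for r in shortlist(records, strategy="PES", country="Z", outcome="deforestation"):
    print(r.study, "|", r.strategy, "|", r.outcome, "|", r.effect)
```

In this sketch only "Placeholder E" is left out, because it matches none of the four criteria for a planned PES project in country "Z" that targets deforestation.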
We are concerned that biodiversity outcomes of conservation strategies appear to be especially understudied, across all strategies. This is dangerous, as hunting, climate change, and forest degradation are major threats to species survival, and so forest cover cannot always be used as a reliable proxy for biodiversity. We call for a large number of well-designed studies that would fill in all of the glaring thematic, geographic, and scope gaps in the knowledge of the four conservation strategies. However, as biodiversity is irreplaceable, in our opinion, this is the research gap that needs to be filled in most urgently, in order to stem the sixth mass extinction.
Finally, many before us have called for an increasingly evidence-based conservation and for bridging the gap between conservation practice and science (Baylis et al., 2016; Ferraro & Pattanayak, 2006; Laurance, Koster, et al., 2012; Margoluis, Stem, Salafsky, & Brown, 2009; Pullin & Knight, 2001; Sutherland & Wordley, 2017). Yet, progress seems to be slow, and so we believe new approaches in using and communicating about evidence are needed (Sutherland & Wordley, 2018). Here, we have developed an approach of presenting evidence in a visual and nontechnical way, with two goals: (a) making scientific evidence more available, accessible, and engaging to conservation practitioners, and (b) creating an online evidence base that has the potential to be kept up to date by including contributions from the entire conservation science community.

DATA ACCESSIBILITY
All data are accessible at ConservationEffectiveness.org.

ETHICS STATEMENT
This study did not involve any experiments on animal or human subjects.

ENDNOTE
1 There is a notable variability in the way studies report forest cover loss statistics: whereas a reduction in deforestation rate expressed as a percentage point (e.g., 1 percentage point) may be perceived as low, the same outcome expressed as a percentage may be perceived as high (e.g., change from a 2% to 1% loss is equal to a 50% reduction in the rate). In order to reduce potential misinterpretation and bias, we recommend reporting outcomes as both percentage and percentage points.
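
The distinction in the endnote can be made concrete with a short calculation. The sketch below uses the 2% to 1% example given above; the function name `deforestation_change` is a hypothetical helper introduced for illustration only.

```python
def deforestation_change(baseline_pct, observed_pct):
    """Return (percentage-point change, relative percent change).

    Both arguments are forest loss rates in percent, e.g. 2.0 for 2%.
    """
    if baseline_pct <= 0:
        raise ValueError("baseline rate must be positive")
    point_change = observed_pct - baseline_pct             # percentage points
    relative_change = 100.0 * point_change / baseline_pct  # percent of baseline
    return point_change, relative_change


points, relative = deforestation_change(2.0, 1.0)
print(f"{points:+.1f} percentage points, {relative:+.1f}% relative change")
# prints "-1.0 percentage points, -50.0% relative change"
```

Reporting both numbers, as recommended, lets a reader see that the same outcome is a 1 percentage point drop and a halving of the rate.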