What constitutes a useful measure of protected area effectiveness? A case study of management inputs and protected area impacts in Madagascar

Protected areas are one of the key tools for conserving biodiversity and recent studies have highlighted the positive impact they can have in avoiding habitat conversion. However, the relationship to management actions on the ground is far less studied and we currently do not know which management actions are the most crucial for success. To investigate this, we studied the effectiveness of the protected area network of Madagascar. We estimated the impact of individual protected areas in avoiding deforestation, accounting for confounding factors (elevation, slope, distance to urban centers and infrastructure, and distance to forest edge). We then investigated whether Protected Area Management Effectiveness scores, and their different facets, explained the variation observed. We found that the majority of the analyzed protected areas in Madagascar do reduce deforestation. Protected areas with higher management scores did not perform better in terms of avoiding deforestation. We discuss potential explanations for these results, and how they might influence the validity of current methods for estimating different facets of protected area effectiveness under different deforestation scenarios.


| INTRODUCTION
The establishment of protected areas (PAs) is a widely used policy tool for halting biodiversity loss, and increased PA coverage is a key target of the Convention on Biological Diversity (2018). Considering the focus on PAs and the global commitment to increase their extent and coverage, there is a need to assess their effectiveness in terms of the impact PAs have had, either in avoiding direct biodiversity loss, or avoiding threats to biodiversity (such as land use change; Joppa & Pfaff, 2010a; Pressey, Weeks, & Gurney, 2017). Early analyses of PA effectiveness compared rates of biodiversity loss with surrounding areas (i.e., buffer analyses), but more recently, matching methods have been used (Joppa & Pfaff, 2010a). Matching accounts for confounding factors not related to protection per se (such as remoteness and accessibility) and provides a comparative control scenario, sharing similar characteristics to the PAs, against which to compare changes inside PAs (Andam, Ferraro, Pfaff, Sanchez-Azofeifa, & Robalino, 2008; Stuart, 2010). Studies employing such techniques have shown that PAs have avoided habitat conversion (Andam et al., 2008; Bowker, De Vos, Ament, & Cumming, 2017; Eklund et al., 2016). However, these studies have also found a large variation in the performance of individual PAs, suggesting that contextual factors such as governance quality and management might be key (Eklund & Cabeza, 2017).
The assumption has been that improved PA management actions on the ground would lead to improved PA impact in conserving habitat and species, but this has rarely been tested. Of the few studies available, those using forest cover change as the outcome measure have generally found no relationship between management inputs and deforestation (Nolte & Agrawal, 2013). In contrast, the few studies that have used vertebrate species population changes as a measure of effectiveness, in both the marine and terrestrial environments, have found a correlation between measured effectiveness and human capacity and resources (Gill et al., 2017; Laurance et al., 2012). Most of these studies have used PA management effectiveness (PAME) evaluations, which use ordinal scoring covering various aspects of PA management, as proxies for management input and quality. In this study we combine matching approaches with absolute measures and investigate not only the relationship between management effectiveness and impact, but also question the use of matching results in these exercises. We do this using a case study of Madagascar's PAs, using avoided deforestation as a proxy for PA impact, and management data from PAME evaluations conducted by the National Parks authority. Madagascar merits international attention for its high levels of endemic species, repeatedly having been identified as a top priority for biodiversity conservation globally (Brooks et al., 2006). For these reasons, Madagascar has become a recipient of significant biodiversity aid funding (Miller, Agrawal, & Roberts, 2013).
Yet, the country struggles with high poverty levels and an unstable political environment, putting high pressure on the remaining forests through illegal high-value wood logging, shifting cultivation for subsistence farming, forest degradation due to charcoal production, overharvesting of vertebrates for bushmeat or trade and, more recently, escalating mining pressures (Allnutt, Asner, Golden, & Powell, 2013; Cabeza, Terraube, Burgas, Temba, & Rakoarijaoana, 2019; Reuter, Randell, Wills, & Sewall, 2016; Scales, 2014).
The PA network has recently been expanded, responding to increased calls for comanagement for multiple objectives and a move away from top-down approaches to conservation (Kull, 2014). This has resulted in a division of PAs into those governed by the state through the parastatal Madagascar National Parks (MNP), and those with shared governance structures, involving nongovernmental organizations and local community associations (Gardner et al., 2018). All this has taken place under challenging governance conditions, with a political coup in 2009 and subsequent political instability (Gardner et al., 2018), and with deforestation rates remaining high (Eklund et al., 2016; Mayaux et al., 2013). Despite the pressures, Eklund et al. (2016) found that PAs in Madagascar have reduced deforestation within their borders, with variation in effectiveness between forest types and time periods. In this study, we build on Eklund et al. (2016), but explore the effectiveness at the individual PA level, and the link between management inputs and PA impact in avoiding deforestation.

| Forest loss and avoided deforestation
Forest cover for the years 2005 and 2010 was obtained from layers developed by local conservation institutions (Office National pour l'Environnement et al., 2013). Layers are based on the classification of Landsat TM and ETM+ data with a 30 m × 30 m spatial resolution; see Harper, Steininger, Tucker, Juhn, and Hawkins (2007) for classification details. The forest cover was categorized into the three main forest types (humid, dry, spiny) following the procedure explained in Eklund et al. (2016). The year 2005 was selected as the starting year in order to allow for comparison with the data for management inputs, that is, the PAME data (see below), which were available only from 2005 to 2010, thus preventing inclusion of previous time periods as in Eklund et al. (2016). Using the forest cover of 2005 as the starting point, we determined whether a forest pixel had been deforested between 2005 and 2010. Pixels covered by clouds in either 2005 or 2010 were omitted.
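The pixel-level rule described above can be sketched in a few lines of Python. This is only an illustration: the class codes (FOREST, NON_FOREST, CLOUD) and the nested-list raster layout are assumptions made for the example, not the encoding of the actual forest cover layers.

```python
# Hypothetical class codes; the real layers may use different values.
FOREST, NON_FOREST, CLOUD = 1, 0, 9


def deforestation_flags(cover_2005, cover_2010):
    """Return {(row, col): 0 or 1} for pixels that were forest in 2005.

    1 means forest in 2005 and non-forest in 2010; 0 means still forest.
    Pixels that are cloud-covered in either year are omitted.
    """
    flags = {}
    for r, (row_05, row_10) in enumerate(zip(cover_2005, cover_2010)):
        for c, (v05, v10) in enumerate(zip(row_05, row_10)):
            if CLOUD in (v05, v10):
                continue  # cloud in either year: pixel is dropped
            if v05 != FOREST:
                continue  # only 2005 forest pixels can be deforested
            flags[(r, c)] = 1 if v10 == NON_FOREST else 0
    return flags


# Toy 2 x 3 rasters for the two years.
cover_2005 = [[1, 1, 0], [1, 9, 1]]
cover_2010 = [[1, 0, 0], [9, 1, 1]]
flags = deforestation_flags(cover_2005, cover_2010)
```

In the toy rasters, only three pixels remain after cloud and non-forest pixels are dropped, and one of them is flagged as deforested.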

| Contextual variables
We included data for the following covariates (see Table 1 for details): 1. Distance to forest edge (Euclidean distance to forest edge based on the forest cover in 2005). 2. Elevation and slope (used at a 90 m resolution). 3. Annual precipitation and distances to large cities, roads, rivers (used at a 500 m resolution). 4. PA shape and area.
We focused exclusively on state governed PAs, managed by MNP, and established in 2005 or earlier, for which PAME data was available (see below). This sample included 74% (35 out of the 47) of all state governed terrestrial PAs (Strict Nature Reserves, National Parks, and Special Reserves) in Madagascar.

| PAME data and recategorization
The PAME methodology used in Madagascar is based on the framework by the International Union for Conservation of Nature (IUCN) for assessing the management of PAs (Hockings, Stolton, Leverington, Dudley, & Courrau, 2006), adapted to the local setting by the National Association for the Management of Protected Areas (ANGAP), now MNP (ANGAP, 2005). PAME evaluations were completed yearly from 2005 to 2010 by ANGAP, and consisted of 31 core questions and, for some years, additional ones (see Supporting Information Table S1). The scoring is ordinal, ranging from 0 to 3 with the option to gain additional points for a few of the indicators. We categorized these core questions into four broader categories: (a) Design and Planning, (b) Capacity and Resources, (c) Monitoring and Enforcement Systems, and (d) Decision Making Arrangements, following Geldmann et al. (2018). The idea with the categorization was to capture different dimensions of management, instead of calculating a composite score clumping all questions together. The categories were chosen to reflect the common pool resource framework (Ostrom, 2009) and the IUCN World Commission on Protected Areas management effectiveness framework (Hockings et al., 2006), representing different aspects and successive steps in the management cycle believed to be crucial in achieving effectively managed PAs (see Supporting Information Table S1 for how the PAME questions were divided into these categories). We calculated a standardized management score for each category and each PA following Geldmann et al. (2015). These were further divided into the three levels standardized by Leverington et al. (2010), with 0 to <0.33 corresponding to "inadequate," 0.33 to <0.67 to "basic," and 0.67 to 1 to "sound."
TABLE 1 List of datasets used in the analyses, including their resolution and source
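As a rough illustration of the scoring and recategorization step, the sketch below standardizes a set of ordinal answers and assigns one of the three levels. The standardization shown (sum of answered scores divided by the maximum attainable for those questions) is an assumption made for the example; the exact procedure is the one in Geldmann et al. (2015), and the optional bonus points are ignored here. The function names are hypothetical.

```python
def standardized_score(scores, max_per_question=3):
    """Scale ordinal answers (0-3) to the 0-1 range.

    Unanswered questions are passed as None and skipped.
    Assumption: score = sum of answers / maximum attainable.
    """
    answered = [s for s in scores if s is not None]
    if not answered:
        return None
    return sum(answered) / (max_per_question * len(answered))


def management_level(score):
    """Map a standardized score to the three levels with cut-offs 0.33 and 0.67."""
    if score < 0.33:
        return "inadequate"
    if score < 0.67:
        return "basic"
    return "sound"


# Example: four questions in one category, one left unanswered.
category_score = standardized_score([3, 2, None, 1])
category_level = management_level(category_score)
```

Keeping the four categories separate, as in the text, simply means calling these functions once per category and PA rather than once on all 31 questions.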

| Forest cover changes inside PAs 2005-2010
The full forest cover data was used to calculate average yearly deforestation rates inside all PAs, for the years 2005-2010, using the equation for deforestation in Tabor, Burgess, Mbilinyi, Kashaigili, and Steininger (2010), following the general approach outlined in Food and Agriculture Organization (1995).
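The equation itself is not reproduced here. As a minimal sketch, one widely used form that is usually attributed to the Food and Agriculture Organization (1995) is the compound annual rate of change; whether this is exactly the expression used in Tabor et al. (2010) is an assumption of this example, and the function name is hypothetical.

```python
def annual_deforestation_rate(forest_start, forest_end, years):
    """Compound annual deforestation rate in % per year.

    Assumed form: rate = (1 - (A_end / A_start) ** (1 / years)) * 100,
    where A is forest area (or forest pixel count) at each date.
    """
    if forest_start <= 0 or years <= 0:
        raise ValueError("forest_start and years must be positive")
    return (1.0 - (forest_end / forest_start) ** (1.0 / years)) * 100.0


# Example: 1,000 forest pixels in 2005 and 950 in 2010.
rate = annual_deforestation_rate(1000, 950, 5)
```

Applying the resulting rate for five consecutive years to the 2005 forest area recovers the 2010 area, which is a quick consistency check on any implementation of a compound rate.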

| Matching analysis of avoided deforestation impact
We assessed the performance of each PA individually. For each PA, we randomly sampled 10% of forested pixels in 2005. We did the same for the forest pixels outside PAs for each of the three forest types (humid, dry, and spiny) that correspond to the main ecoregions of Madagascar. For each sampled pixel we obtained information on deforestation between 2005 and 2010 (0/1) and contextual variables (Table 1), using a GIS overlay. Matching was restricted to pixels representing the same forest type as the PA. For PAs at the boundary of ecoregions that included two different forest types (n = 3), we allowed selection from both of those forest types (humid and dry, or humid and spiny). We used the counterfactual approach of Eklund et al. (2016), by which we compare each focal pixel (i.e., each sample pixel from inside a PA) to a group of pixels with similar covariate characteristics. Compared to other commonly used matching techniques, this method allows for partitioning the environmental covariate data, so that it first searches for good matches in a smaller multidimensional space, and only expands the search if needed. In this study the total span (i.e., the full extent of the multidimensional space) was first divided by 20, and if enough good matches were not found the span was expanded by dividing by 19, and so on, until each focal point (i.e., sample pixel from inside the PA) had been associated with the 500 most similar pixels, protected or not. Throughout this paper, these 500 most similar forest pixels are referred to as the "similarity sets." For estimating effectiveness, we considered only similarity sets that included a minimum of 10% of nonprotected pixels, to ensure that PA pixels were compared to a large enough number of non-PA pixels.
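The expanding search can be illustrated with the following sketch. It is not the implementation of Eklund et al. (2016): the box-shaped search window (span divided by 20, 19, ...), the ranking by scaled squared distance, and the names similarity_set and is_usable are all assumptions made to show the general idea.

```python
def similarity_set(focal, candidates, spans, set_size=500, start_divisor=20):
    """Collect up to `set_size` candidate pixels most similar to `focal`.

    focal      -- tuple of covariate values for one protected sample pixel
    candidates -- list of (covariates, is_protected, deforested) tuples
    spans      -- full range (max - min) of each covariate
    """
    near = []
    for divisor in range(start_divisor, 0, -1):
        # Search window per covariate: span / 20, then span / 19, ...
        tolerance = [span / divisor for span in spans]
        near = [
            c for c in candidates
            if all(abs(x - f) <= t for x, f, t in zip(c[0], focal, tolerance))
        ]
        if len(near) >= set_size:
            break  # enough matches in this window; stop expanding

    def scaled_distance(candidate):
        # Assumed similarity measure: squared distance scaled by span.
        return sum(
            ((x - f) / span) ** 2
            for x, f, span in zip(candidate[0], focal, spans)
            if span > 0
        )

    return sorted(near, key=scaled_distance)[:set_size]


def is_usable(sim_set, min_unprotected_share=0.10):
    """Keep only sets with at least 10% nonprotected pixels."""
    if not sim_set:
        return False
    unprotected = sum(1 for _, is_protected, _ in sim_set if not is_protected)
    return unprotected / len(sim_set) >= min_unprotected_share
```

Starting from a narrow window means that well-matched pixels are found quickly where they exist, and the window only widens for focal pixels with unusual covariate combinations.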
For each similarity set, we computed (a) the fraction of deforested pixels out of the nonprotected pixels and (b) the fraction of deforested pixels out of the protected pixels. Based on this, and due to the zero-inflated nonnormal distributions of the data, we computed the effect size as the frequency of pair comparisons in which the protected pixels avoid deforestation when their matched pairs do not (the probability of superiority for dependent groups, PS dep; Grissom & Kim, 2012). For more detailed information on the method and a discussion of the effect size measures used, see Eklund et al. (2016).
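A minimal sketch of the effect size calculation is given below. Each pair holds the two fractions from one similarity set. Counting ties as half a "win" is an assumption of this example (it makes a value of 0.5 correspond to no difference); see Grissom and Kim (2012) and Eklund et al. (2016) for the treatment actually used. The function name is hypothetical.

```python
def ps_dep(pairs):
    """Probability of superiority for dependent groups.

    pairs -- list of (fraction deforested among protected pixels,
                      fraction deforested among nonprotected pixels),
             one tuple per similarity set.
    Assumption: ties contribute 0.5.
    """
    if not pairs:
        raise ValueError("at least one similarity set is required")
    wins = sum(1 for protected, unprotected in pairs if protected < unprotected)
    ties = sum(1 for protected, unprotected in pairs if protected == unprotected)
    return (wins + 0.5 * ties) / len(pairs)


# Four similarity sets: two favor protection, one tie, one against.
example = ps_dep([(0.0, 0.1), (0.0, 0.0), (0.2, 0.1), (0.0, 0.3)])
```

Because the measure only counts which side of each pair has less deforestation, it is insensitive to the many zeros in the data, which is the motivation given in the text for using it.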

| Additional tests and tools used
We used Spearman rank correlations for looking at relationships between PAME scores and PA impact and deforestation rates. Software and R packages used are specified in the Supporting Information.
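For readers unfamiliar with the statistic, the sketch below computes a Spearman rank correlation from scratch (average ranks for ties, then a Pearson correlation on the ranks). It is purely illustrative and uses only the Python standard library; the analyses themselves were run with the software listed in the Supporting Information.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (vx * vy) ** 0.5
```

A rank-based measure is a natural choice here because PAME scores are ordinal and deforestation rates are strongly skewed toward zero.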

| Forest cover changes inside PAs 2005-2010
The yearly average deforestation rates (%) inside PAs varied from 0 to 0.89 (mean: 0.07, SD: 0.20, median: 0.00027). Seventeen PAs out of the 35 showed no deforestation at all inside their borders and three had very high levels of deforestation (above 0.50% per year; Figure 1). For comparison, the overall annual deforestation rate in all of Madagascar (PAs included) in the time period investigated was 0.36% per year.

| Avoided deforestation impact of PAs
Calculating 95% confidence intervals for the PS dep measure showed that 29 out of the 36 PAs had an impact in mitigating deforestation within the PA borders (Figures 1 and 2). The PS dep measure varied from 0.33 to 0.95 (Figures 1 and 2), with 0.5 equal to no difference between inside and matched outside, so that values below this showed induced deforestation and values above 0.5 showed less deforestation than expected by chance. PAs in Madagascar differ substantially in how much pressure they experience: some show high deforestation rates despite being effective compared to the counterfactual, whereas others show low deforestation while still being effective (Figure 1).

| PA management effectiveness
The PA management effectiveness scores increased across all categories (i.e., Design and Planning, Capacity and Resources, Monitoring and Enforcement Systems, Decision Making Arrangements) between 2005 and 2010 (see Figure 3).
Management components related to Design and Planning scored the highest medians compared to other categories for each year, whereas Capacity and Resources was the category receiving the lowest yearly median scores, and was the only category that did not consistently increase over time, showing a slight decrease for 2009 (Figure 3).
Although most PAs saw only small changes in PAME scores over time, for a few PAs scores changed substantially. Changes have been mostly positive, that is, PA management has improved from 2005 to 2010 for those PAs with data available for the last year (n = 31; Design and Planning: 15 out of 35, with 1 decrease, and 15 remaining stable; Capacity and Resources: 24 out of 35, with 6 decreases, and 1 remaining stable; Monitoring and Enforcement Systems: 22 out of 35, with 3 decreases, and 6 remaining stable; Decision Making Arrangements: 17 out of 35, with 3 decreases, and 11 remaining stable).

| Predictability of PAs' deforestation impacts given PAME scores
We found no association between PAME and avoided deforestation pressure: when impact (PS dep) is correlated against the PAME scores, no association is found. Although some positive trends of management on avoided deforestation were observable when using the thresholds suggested by Leverington et al. (2010) (i.e., "inadequate," "basic," or "sound"; see Section 2 and Figure 4), this was based on only one PA with "inadequate" management scores, which prevented any robust test of statistical significance between categories. This PA had a measured negative impact (induced deforestation) and its levels of management were identified as "inadequate" for three of the four PAME categories (Capacity and Resources, Monitoring and Enforcement Systems, and Decision Making Arrangements) and "basic" for Design and Planning.
When comparing PAME scores to the actual deforestation rates within each PA, instead of the estimated impact as computed through the matching analysis, we found a significant negative correlation between deforestation rate inside a PA and Decision Making Arrangements (−0.455, p < .05) but not for any other PAME category. This relationship also holds after removing the three outliers with exceptionally high deforestation rates.
FIGURE 1 Bar chart of the protected areas' (PAs) relative impact in avoiding deforestation (PS dep) and the annual deforestation rate (% per year) between the years 2005 and 2010, for all the PAs included in the study (n = 35). 95% confidence intervals shown for PS dep. A value higher than 0.5 for PS dep shows success compared to environmentally similar unprotected pixels outside a PA during the 5-year period investigated.
When comparing the deforestation in an area to its counterfactual outside, most PAs in Madagascar avoided deforestation. However, these results did not appear to be associated with our measure of management inputs.

| DISCUSSION
We suggest that one contributing explanation for the lack of correlation may be that management levels of the PAs in Madagascar were already at basic to sound levels and located in areas with low rates of forest loss, and therefore this set of PAs provides little variation with which to explore the effect of different levels of management. Only one PA had overall "inadequate" management scores (Figure 4). This PA should be of utmost priority, and it is encouraging to see that the PAME data seem to be able to flag failing PAs that show increased deforestation, that is, higher deforestation rates than what would have been expected given the environmental covariates.
Our results are somewhat in line with previous research: the few studies that have looked at correlations between management inputs and PA impacts in avoiding forest loss have reported no or weak links (Nolte & Agrawal, 2013; Schleicher, Peres, & Leader-Williams, 2019). It may be that the observed patterns correspond with reality, and that there is a true disconnect between PA management actions and effectiveness, and that other factors, such as wider governance quality, are more important in determining PA performance (Eklund & Cabeza, 2017; Pyhälä, Eklund, McBride, Rakotoarijaona, & Cabeza, 2019; Schleicher et al., 2019). However, we identify at least three further alternative interpretations that could help to explain the lack of statistical association between levels of management and avoided deforestation. First, the rapid assessments of management effectiveness as used in Madagascar might not reflect reality. Second, using avoided deforestation as a measure of impact might come with caveats. Third, management and avoided forest loss might not be related in the way we think they are. We next explore these alternatives in more detail.
FIGURE 2 Map of Madagascar showing the three forest types and the protected areas (PAs) included in this study. The color of the PAs relates to their impact in reducing deforestation pressures, with darker blue PAs having a higher impact, yellow PAs not making any difference, and brown PAs showing even more deforestation than expected

| Is PAME data not a good indicator of PA management reality?
It is hard to be sure how well our PAME data reflect PA management reality in Madagascar. Previous research has found that managers are well placed to assess key management issues accurately (Cook, Carter, & Hockings, 2014), but the results might not be generalizable to a very different context, such as the realities in many least developed countries. Concerns have thus been raised about how reliable the information in PAME evaluations is, as managers may feel an incentive to overestimate management performance to ensure continued funding, especially since many of the biggest conservation nongovernmental organizations (NGOs) and donors now have them as a funding requirement (Craigie, Barnes, Geldmann, & Woodley, 2015). In the case of Madagascar, the trend of increasing PAME scores over time is in direct contrast to the decrease in government effectiveness around the political coup in 2009 (see Supporting Information Figure S1).
However, while this might suggest a disconnect between PAME scores and PA management realities, the dip in 2009 for Capacity and Resources might be an indication of the harsher funding situation following the coup when many international partners withdrew their nonhumanitarian support (Kull, 2014). Additionally, it has been suggested that the environmental sector in Madagascar was so strong pre-2009 that the PA management side might have been only marginally affected by the political turmoil and absence of governmental leadership (Gardner et al., 2018), even if the threats escalated (Allnutt et al., 2013;Barrett, Brown, Morikawa, Labat, & Yoder, 2010;Innes, 2010).

| Is avoided deforestation a too simplistic impact measure?
A counterfactual measure of success, that is, how much deforestation an area managed to avoid, is relative to the absolute pressures experienced. Therefore, if an area experiences no pressures, the impact will be zero, no matter how good the local management is. Low impact might thus reflect either opportunistic design (i.e., the PA is located in an area where deforestation is unlikely even without protection), or alternatively weak management, unable to mitigate pressures and showing similar levels of deforestation as comparable unprotected land (Box 1). Which of these applies tends to be indiscernible from the results of a typical matching analysis, but can be inferred from comparisons to absolute deforestation rates (impact vs. outcome, following Pressey et al., 2017). As the PAs in this analysis showed on average low absolute levels of deforestation, this might explain why no significant correlations between management inputs and PA impact were found. Because matching outcomes reflect the relative mitigation, not the absolute, they may not reflect well the level of intervention effort needed, meaning that two PAs might show the same relative impact, even if one avoids a lot of deforestation and the other substantially less (Box 1). This is a key limitation of the use of matching methods in isolation to infer impact, especially if applied more widely to drawing conclusions about the difference PAs have made.
BOX 1 Schematic figure showing how no impact can be the outcome of poor design or weak management, and correspondingly, how the same level of impact can be achieved under very different pressures
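The distinction between relative and absolute mitigation can be made concrete with a small numerical sketch. The rates below are invented for illustration only, and the proportional reduction is used as a simple stand-in for a matching-based effect size; it is not the PS dep measure used in this study.

```python
def relative_and_absolute(inside_rate, counterfactual_rate):
    """Return (relative reduction, absolute avoided deforestation).

    Rates are forest loss in % over the study period; the counterfactual
    rate is what matched unprotected land experienced.
    """
    absolute = counterfactual_rate - inside_rate
    relative = absolute / counterfactual_rate
    return relative, absolute


# Hypothetical PA under high pressure: 10% lost inside vs. 20% outside.
high_pressure = relative_and_absolute(10.0, 20.0)
# Hypothetical PA under low pressure: 0.5% lost inside vs. 1% outside.
low_pressure = relative_and_absolute(0.5, 1.0)
```

Both hypothetical PAs halve the deforestation expected without protection, yet the first avoids twenty times more forest loss than the second in absolute terms.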

| Dynamic pressures make static comparison between management inputs and PA impacts problematic
Deforestation is a dynamic spatial process, where pressures shift location from easily accessible areas to more remote ones (Eklund et al., 2016). This means that even if a PA currently has little impact due to its location (see previous section), but sound management is in place, it might start having an impact once the pressures increase. Such patterns have already been described for Madagascar (Eklund et al., 2016), where the PA network in the spiny south turned from having no impact to showing the highest impact of all forest types. An inspection of the changes in the land cover data showed that this was the result of relocated pressures, spreading to previously remote regions once the more accessible land had already lost its forest cover. Such a scenario emphasizes the importance of funding management and carrying out management effectiveness evaluations, irrespective of links to PA impacts right now. If pressures were to increase quickly in the future, impact can be reached faster if preventive management measures are in place. This links to the wider debate of proactive versus reactive approaches to conservation (Brooks et al., 2006; Eklund & Cabeza, 2017), and how different data on PA effectiveness could be used to prioritize actions, which we discuss next.

| Prioritizing action
PAME assessments highlight strengths and shortfalls in current management systems, which can help in identifying the actions that are required to improve management and ensure successful outcomes and impact in the future. PAME assessments might be particularly useful for detecting poor performance at the lower end of the spectrum. PAs showing both high absolute deforestation rates and high impact (as measured with a counterfactual approach) should be treated as high priority for continued support: they are the areas under most pressure right now and have sound management in place, yet major management efforts are needed due to the magnitude of the pressures. PAs showing high deforestation but low impact are another high priority category. For these, the focus should be on improving management quality, in order to increase impact. Here, it would be crucial to work with both local managers and communities, identifying strengths and weaknesses, and hearing what support would be most needed. The third category of PAs would be those currently under low deforestation pressures, whose effectiveness in reducing deforestation as measured using counterfactual analysis can be the same as some of the PAs under high pressures, but whose impact in terms of the total number of trees per unit area preserved is much lower. Such areas are situated in a setting of lower pressures, but with equal capacity to mitigate them. These areas might not be of immediate concern, but should invest in more monitoring to ensure a continued positive development of management capacity. The fourth category of PAs are those with low impact and low levels of deforestation. These PAs may be a lower priority, but still need to receive capacity building and training events, to establish a basic level of management, with adaptive management plans in place, ready to deal with pressures as they might increase in the future.
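The four categories can be summarized as a simple decision rule, sketched below. The cut-offs are assumptions made for illustration: 0.5 is the PS dep value corresponding to no difference, and 0.36% per year is the national deforestation rate reported in the results, used here as a hypothetical boundary between "high" and "low" deforestation. In practice the confidence intervals around PS dep and local knowledge would also need to be considered.

```python
def priority_category(deforestation_rate, ps_dep_value,
                      high_rate=0.36, impact_threshold=0.5):
    """Assign a PA to one of the four priority categories.

    deforestation_rate -- annual rate inside the PA (% per year)
    ps_dep_value       -- counterfactual impact estimate for the PA
    high_rate, impact_threshold -- illustrative cut-offs (assumptions)
    """
    high_deforestation = deforestation_rate > high_rate
    has_impact = ps_dep_value > impact_threshold
    if high_deforestation and has_impact:
        return "high priority: continued support"
    if high_deforestation:
        return "high priority: improve management quality"
    if has_impact:
        return "lower urgency: invest in monitoring"
    return "lower priority: build basic management capacity"
```

Framing the categories this way makes explicit that absolute deforestation and counterfactual impact are two separate axes, and that neither alone is enough to rank PAs for support.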

| Caveats
In this study, we focused on deforestation as the main threat to the forests in Madagascar. However, using avoided deforestation as a proxy for PA impact has limitations, especially as we only investigated a 5-year period in order to analyze links with the available management data. For example, selective logging of high value timber is likely to have gone undetected, and the same applies to other types of forest degradation. Based on the datasets used, it was also not possible to discern human-caused deforestation from natural forest loss, for example caused by extreme weather conditions, such as cyclones. We also did not use any buffers around the PAs, leaving the possibility for potential leakage (Ewers & Rodrigues, 2008), even if previous matching studies have not found evidence that this process would affect PA impact (Andam et al., 2008; Gaveau et al., 2009; Joppa & Pfaff, 2010b). In matching studies in general, it would be important to incorporate more detailed data about the land tenure and governance regimes outside PAs, as land treated here as "outside" could in fact have been under some other type of land use regulation, such as community forest management contracts or REDD programs (Reducing Emissions from Deforestation and Forest Degradation in Developing Countries). For a fuller understanding of PA effectiveness, other proxies are also needed, such as fire frequencies (Nolte & Agrawal, 2013), species declines, or declines in functional groups (Barnes et al., 2016; Craigie et al., 2010; Geldmann et al., 2018; Laurance et al., 2012). We stress that PAs that are successful in preventing forest loss might still be under threat and experience forest degradation and reductions in faunal populations, ultimately leading to the empty forest syndrome (Wilkie, Bennett, Peres, & Cunningham, 2011).
Also, we acknowledge that the PA impacts in this study are only measured through impact on biodiversity, and we were unable to measure social and economic impacts (Andam, Ferraro, Sims, Healy, & Holland, 2010; Brockington & Wilkie, 2015; Naughton-Treves, Holland, & Brandon, 2005). Time lags may be an issue for this study; it seems reasonable to expect some delay between implementation of management actions and observable ecological outcomes. In this study we compared only the management scores for 2005 with the deforestation outcomes over the following 5 years, as it seemed reasonable to assume that the following years might show a response. However, we can imagine several occasions when a longer time lag might be expected, such as when building trust with local surrounding communities. It is also worth mentioning that some of the PAs included in this study might have been extended during the study period, but we do not have data on exactly when and how this affected the area boundaries (Gardner et al., 2018).
Finally, the PAME data ignore aspects related to the wider quality of governance, such as law enforcement and corruption (Eklund & Cabeza, 2017). Previous studies have shown that PA managers often report such issues as being main obstacles for carrying out effective management (Pyhälä et al., 2019; Schleicher et al., 2019), as the mandate of a local manager is quite limited in the face of such powerful drivers.
In conclusion, both continued efforts to carry out quantitative impact evaluations of PA effectiveness and continued collection of PAME assessments are needed, as our study shows that they can complement each other in displaying different facets of how PAs perform. Currently, there is a limited pool of studies comparing these two, but the framework presented here, which highlights the need to account for pressures, allows for hypotheses that could be tested once more studies become available.