Consequences of survey method for estimating hunters' harvest rates

Harvest data are widely used to understand hunting in tropical forests. However, survey methods are susceptible to biases which could affect results. We compare catch data from two approaches applied concurrently in the same villages (n = 7) in Gola Forest, Liberia: hunter recall interviews (n = 208 hunters, 253 trips) and continuous monitoring by village‐based assistants (n = 53 hunters, 404 trips). We use Bayesian multi‐level models to: (a) compare estimates of animals killed per trip for each data source; (b) test whether differences between villages are consistent across data sources and (c) identify potential sources of bias. Hunter recall produced higher, and more variable, catch estimates than village‐based monitoring, with mean of 7.3 animals [6.0–8.8 95%CI] compared to 3.0 [2.4–3.6], for a trip lasting 3.2 days (the average duration from village‐based monitoring). Mean catch‐per‐village from village‐based monitoring failed to predict hunter recall catch and villages with highest catch differed between methods. Differences in trip duration were a potential source of bias: hunter recall recorded longer, more variable, trips (mean 4.0 ± SD 3.0 days, range = 1–32) than village‐based monitoring (mean 3.2 ± SD 1.7, range = 1–10). Longer trips were associated with higher catch‐per‐day, use of guns, forest camps and accompaniment by another person; so nonrandom sampling of these traits may have introduced bias. Between‐hunter variability was lower with village‐based monitoring, suggesting sampling captured a less diverse subgroup of hunters, or that recall data were noisier due to reporting errors. Our results demonstrate that methodological biases can have large effects on catch estimates and should be carefully considered when designing or interpreting hunting studies.


| INTRODUCTION
Over-harvesting of tropical forest wildlife for consumption presents a major challenge for conservation (Benítez-López et al., 2017) and could impact livelihoods and food security of many people (Cawthorn & Hoffman, 2015). Datasets describing what hunters catch are useful for understanding this issue as they give insight into patterns of wildlife abundance (Weinbaum, Brashares, Golden, & Getz, 2013) as well as resource use (Grande-Vega, Carpinetti, Duarte, & Fa, 2013). Where hunting is openly practiced, catch data can be relatively easy to obtain, and there is a wealth of literature describing hunting statistics from across the tropics, dating back to the 1960s (Asibey, 1966; Taylor et al., 2015). However, catch and hunting effort can be measured in numerous ways, and methods are prone to measurement error or sampling bias from various sources (e.g., Jones, Andriamarovololona, Hockley, Gibbons, & Milner-Gulland, 2008; Rist, Rowcliffe, Cowlishaw, & Milner-Gulland, 2008). A clearer understanding of bias associated with catch data could help accurate interpretation of results and improve survey designs.
There are several mechanisms by which different methods to quantify harvest rates may incur bias. Variation in the way hunters are recruited, where data are recorded, and by whom, could influence data quality (Weinreb, 2006). Added to this, measurement error or nonrandom sampling might occur at the level of villages, hunters, hunting trips or prey species (Hill & Kintigh, 2009; Jones et al., 2008). Finally, the different measures of hunting effort used to interpret catch data introduce additional error (Rist et al., 2008). In general, the validity of catch per unit effort as an indicator of prey populations can be limited, since biologically-relevant hunting effort is hard to define (Rist, 2007), and relationships between catch, effort and prey abundance are rarely known (Maunder et al., 2006). Nonrandom sampling can introduce biases in hunting systems with high variability between hunters (e.g., Coad, 2007; Fa et al., 2016; Kümpel et al., 2009). For example, hunters who rarely associate with settlements or camps are likely to be under-represented in many studies (e.g., McEvoy et al., 2019). Similarly, methods that rely on self-reporting may be susceptible to error if, say, social desirability bias leads to under- or over-reporting of effort, catch, or particular species, as respondents seek to present themselves more favorably (Tourangeau & Yan, 2007). Recall error can also affect data quality, and, as with social desirability bias, can vary due to details of survey design (Golden, Wrangham, & Brashares, 2013; Jenkins et al., 2011; Jones et al., 2008).
Previous comparisons of methods show that catch estimates can be sensitive to various aspects of survey design (Jones et al., 2008; Noss, 1998). This includes how effort is defined and measured (Rist et al., 2008) and how harvest is assessed, for example, as day-weighted or hunter-weighted return rates (Hill & Kintigh, 2009). Detection of trends may depend on whether sampling strategies maximize number of hunters or hunting trips (Rist et al., 2010). Harvest rates have been shown to differ substantially depending on whether estimates are extrapolated from hunter follows, self-reporting or consumption diaries (Noss, 1998). Management decisions based on harvest data could be affected by biases incurred during data collection. For instance, decisions about how to tailor conservation messages, or where to allocate resources, depend on accurately identifying the types of hunters or areas with the highest conservation impacts. Results derived from a skewed sample of hunters or villages may not give a robust picture of who or where to target. Nevertheless, minimizing potential bias through survey design is often difficult in practice, and its potential extent and implications for findings can rarely be quantified.
Thus, there are numerous avenues and mechanisms by which bias can be introduced to catch data. A better understanding of the likelihood, nature and extent of these biases will result in more accurate assessments of potential error and allow more realistic levels of uncertainty to be incorporated into management and policy recommendations, whilst at the same time helping to improve study design. This study addresses gaps in our understanding of the extent to which survey methods might produce different estimates of harvest rates. We explore the hypothesis that sampling biases and measurement errors differ according to data collection methods, producing results which are inconsistent between methods. To assess possible pathways for bias, we examine relationships between catch, hunting effort and behavioral characteristics of hunters which may be nonrandomly sampled. We explore how uneven sampling of longer or shorter hunting trips might introduce biases and consider two predicted pathways by which trip duration sampling could become skewed. The first predicts that continuous recording of hunting activity will sample a higher proportion of shorter hunting trips compared to "snapshot" surveys. The second predicts that post-trip resting periods are longer following longer or more successful hunting trips, such that surveys in which hunters are opportunistically encountered during resting periods in villages might sample a greater proportion of long, successful trips. We evaluate inconsistency between survey methods by quantifying the extent to which results from one method predict those of another and explore potential consequences for management decisions. Specifically, we consider whether results from two methods differ in terms of which villages appear to have the highest harvest rates, representing information which might be used to prioritize conservation efforts.

| METHODS
We examine bias in hunting surveys by contrasting two methods which illustrate common sampling strategies and constraints (Table 1). For each method, we drew on previous findings (e.g., Noss, 1998; Rist et al., 2008) and our own familiarity with the hunting system at our study site, to identify (a) possible sources of bias, and (b) survey design features likely to affect the nature and extent of these biases. For the first method, "village-based monitoring," a local assistant was recruited in each village to record information about catch of participating hunters each time they returned from a trip. Local assistants used datasheets and did not share identifying information about hunters with researchers. The second method, "hunter recall," was a questionnaire-based survey conducted by research technicians working for a conservation organization, in which hunters were asked to recall recent harvests. These methods use different sampling approaches: for village-based monitoring, data are a continuous record of hunting activity at specific locations (villages) over a period of time, and many hunting trips are recorded from relatively few hunters. For hunter recall, data are a set of discrete hunting trips and many hunters are sampled, but number of trips per hunter is small. The two methods also illustrate survey designs that would be appropriate under different site-specific constraints, in terms of survey cost limitations, and the degree to which participants can be expected to openly share information (Gavin, Solomon, & Blank, 2010). Village-based monitoring represents a low-cost survey relative to hunter recall, as data collection can be carried out by (financially compensated) local members of the hunting community, thus minimizing time and transportation costs relative to deploying full-time research technicians. Village-based monitoring is also more appropriate where hunting is somewhat sensitive, as unlike hunter recall, participants do not share information directly with external researchers.

We examine consequences of these differences for estimating hunters' catch, defined as the number of animals killed by a hunter on a trip, including mammals and birds, and any animals sold or eaten in the forest (see Jones, Papworth, et al., 2019 for a list of species). We explore evidence for specific sources of bias by assessing covariates of hunting catch, trip duration and inter-trip resting period, to assess how nonrandom sampling of these variables or their correlates might skew results. We evaluate the degree to which results from the first method predict those of the second and explore implications of survey differences for informing management decisions. Specifically, we consider how results differ for understanding which villages have highest harvest rates and for predicting harvest rates of unobserved hunters or villages.

Work took place in the Gola Forest, Liberia, at the GolaMA conservation project site (details are published elsewhere). Hunter recall data were collected from all 18 villages within the study area, between July 2016 and July 2017. Village-based monitoring data were collected at seven of the villages in the study area, which were a nonrandom subset of villages where hunter recall surveys were administered, between September 2016 and March 2017. Analyses in which we compare results from both methods therefore utilize only the subset of hunter recall data collected from the seven villages in which village-based monitoring was applied. However, we do not exclude hunter recall observations (n = 20) made in April to August in which village-based monitoring data were not collected (see Figure S1). This was done in order to maximize sample sizes for estimating hunter- and village-level variability.

TABLE 1 Possible pathways for introduction of bias in the two survey methods for obtaining catch data, identified based on familiarity with the study site. Examples from the literature are given in which the studies' authors note that similar biases could have affected data quality

Columns: Sampling units; Pathways for introduction of bias (Village-based monitoring, Hunter recall interviews); Potential consequences

Potential consequences
Village-based monitoring:
• Fewer long trips than short trips
• Over-represents trips conducted close to villages
Hunter recall interviews:
• Over-represents trips followed by long rest periods (e.g., high catch, long travel distances)
• Over-represents memorable hunting trips (e.g., high catch, rarely killed species)

Sampling unit: Catch
Pathways for introduction of bias
Village-based monitoring, factors affecting recorded catch:
• Animals sold or eaten in the forest only recorded if reported by the hunter
• Social desirability bias: hunter may conceal catch from local assistants (e.g., due to local taboos, laws, or to keep income from high value species private) or exaggerate catch sold or eaten in the forest
Hunter recall interviews, factors affecting recorded catch:
• Large, unusual species more accurately recalled than small, frequently killed species (Parry et al., 2009)
• Lower reporting accuracy of large carcass counts, for example, values given to the nearest factor of 5 or 10 (Vaske, Beaman, & Beaman, 2006), or shrunk to the mean (Jones et al., 2008)
• Lower recall accuracy of events further in the past
• Social desirability bias: hunters may under- or over-report particular species to give favorable impression, for example, to conceal species killed illegally or to appear more skilled (Duda et al., 2017; Kümpel et al., 2010; Wright & Priston, 2010)
Potential consequences
Village-based monitoring:
• Catch under-estimated if many animals are sold/consumed elsewhere (Kümpel et al., 2010)
• Over-represents species brought to villages
Hunter recall interviews:
• Lower accuracy associated with long trips, frequently killed species
Both methods:
• Depending on direction of social desirability bias, catch, or particular species, may be under- or overestimated. Effects could vary across hunters and villages

| Hunter recall interviews
A questionnaire was administered by trained research technicians via face-to-face interviews to all identified hunters from the 18 villages in the study area. Research technicians were GolaMA employees who visited villages for short periods (1-5 days), to conduct the survey. Hunters were identified through key informants, a previous household survey, and snowball sampling. If hunters were not initially available, researchers returned a minimum of three times before ruling out participation. Hunters were asked general questions about hunting practices and to provide details of their most recent hunting trip, including species killed and sale or consumption of carcasses. Hunters re-encountered on subsequent visits to villages were asked to repeat the questionnaire (n = 48), so each hunter provided details of up to three hunting trips. Time between repeat interviews ranged from 55 to 278 days (median = 149). Parts of this dataset, and information about hunters' livelihoods, have been published elsewhere.

2.2 | Village-based monitoring
Local assistants were recruited in a subset of seven villages in the study site. Assistants were village residents and self-declared hunters with basic literacy who were identified by research technicians after consultation with chief hunters. Villages were selected based on availability of a suitable local assistant. Assistants were responsible for recruiting hunters to participate in the study and recording catch over continuous monitoring periods of 1-3 months. Whenever a participating hunter returned to the village, local assistants recorded hunting trip duration in days and the number and species of animals killed, based on direct observation and the hunters' own reports of animals sold or eaten in the forest. Research technicians visited villages every 4-8 weeks to collect completed datasheets. Assistants coded hunters' identities on datasheets so that research technicians were unable to identify participating hunters. Participants were informed that their identity would not be revealed to research technicians or project staff.

| Ethics
Free, prior and informed consent was given verbally by all respondents who were informed that the study sought to understand hunting, answers would be confidential, and results would be published in reports and academic publications. Participants were informed that their names would not be linked to information they provided in any publication. Specific permission to conduct the study was obtained from local authorities and traditional leaders in each village, and village-based monitors were fairly compensated for their time. Ethical approval was obtained from Royal Holloway University of London ethics committee.

| Analytical framework
We used Bayesian multi-level models to estimate catch using a Poisson likelihood with log link function. Varying intercepts were included for hunters and villages. Weakly informative priors were specified as follows: general intercept = Normal (0,5), fixed variable coefficients = Normal (0,0.5), standard deviations of varying intercepts = Exponential (2). These reflected the prior belief that effect sizes were unlikely to exceed 1 in this setting. Continuous fixed variables were scaled by subtracting the sample mean and dividing by the sample standard deviation. All models were created with the Stan computational framework (http://mc-stan.org/) accessed using R (R Core Development Team, 2015) with package "brms" (Bürkner, 2017). Models were compared using Pareto smoothed importance sampling, or K-fold cross validation (K = 10 folds) if the Pareto shape parameter exceeded 0.7 for many observations (Vehtari, Gelman, & Gabry, 2015), using package "loo" (Vehtari et al., 2019). Sampling was run for at least 4,000 post-warmup iterations, and convergence was assessed based on Rhat values <1.01. Credible intervals were calculated as highest posterior density intervals.
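As a rough illustration of the model structure described above (not the authors' brms code), the sketch below writes out the standardization step, the Poisson mean under a log link, and the log density of the stated priors in plain Python. All function and argument names are hypothetical. It assumes that the varying intercepts enter as additive offsets on the log scale and that Exponential (2) denotes a rate of 2.

```python
import math
import statistics


def standardize(values):
    """Scale by subtracting the sample mean and dividing by the sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return [(v - mean) / sd for v in values]


def expected_catch(intercept, slope, z_duration, hunter_offset, village_offset):
    """Poisson mean under a log link: exp of the linear predictor.

    Assumption: hunter and village effects are additive offsets on the log scale.
    """
    return math.exp(intercept + slope * z_duration + hunter_offset + village_offset)


def poisson_log_pmf(count, rate):
    """Log probability of `count` animals given a Poisson mean `rate`."""
    return count * math.log(rate) - rate - math.lgamma(count + 1)


def normal_log_pdf(x, mean, sd):
    """Log density of a Normal(mean, sd) distribution at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def exponential_log_pdf(x, rate):
    """Log density of an Exponential(rate) distribution at x >= 0."""
    return math.log(rate) - rate * x


def log_prior(intercept, slope, sd_hunter, sd_village):
    """Sum of log prior densities: Normal(0, 5), Normal(0, 0.5), Exponential(2).

    Assumption: Exponential (2) is parameterized by its rate.
    """
    return (
        normal_log_pdf(intercept, 0.0, 5.0)
        + normal_log_pdf(slope, 0.0, 0.5)
        + exponential_log_pdf(sd_hunter, 2.0)
        + exponential_log_pdf(sd_village, 2.0)
    )
```

Because of the log link, a coefficient of 1 on a standardized predictor multiplies expected catch by about 2.7 per standard deviation, which gives a sense of why effect sizes above 1 were treated as unlikely.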

| Predictors of catch
For hunter recall data, we modeled the number of animals killed on a trip (catch) as a response variable with predictors for trip duration (days), hunting method (snare, gun or both), season (early dry season, late dry season or rainy season), whether the hunter described themselves as being based in the town or at a forest camp and whether the hunter was accompanied by anyone else on the trip (e.g., another hunter or helper). The interaction between season and hunting method was included based on hunters' reports that dry leaf litter in late dry season made gun-hunting harder, whereas trappers were reportedly more successful as animals were predictably distributed near water sources. We compared models with all possible combinations of predictors, with trip duration and varying intercepts in all models (Table S4).
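The candidate set described above can be enumerated mechanically. The sketch below is one possible reading, not the authors' procedure: it assumes that the season by method interaction is only considered when both main effects are present, and the variable names and formula strings are hypothetical (written in a brms-like style for readability only).

```python
from itertools import combinations

# Hypothetical names for the four optional predictors in the text.
OPTIONAL_PREDICTORS = ["method", "season", "base", "accompanied"]
VARYING_INTERCEPTS = "(1 | village) + (1 | hunter)"


def candidate_formulas():
    """List every candidate model; trip duration appears in all of them."""
    right_hand_sides = []
    for size in range(len(OPTIONAL_PREDICTORS) + 1):
        for subset in combinations(OPTIONAL_PREDICTORS, size):
            terms = ["duration", *subset]
            right_hand_sides.append(" + ".join(terms))
            # Assumption: the interaction needs both of its main effects.
            if "method" in subset and "season" in subset:
                right_hand_sides.append(" + ".join(terms + ["method:season"]))
    return [f"catch ~ {rhs} + {VARYING_INTERCEPTS}" for rhs in right_hand_sides]
```

Under these assumptions there are 16 main-effect subsets plus 4 variants with the interaction, giving 20 candidate models.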

| Predictors of trip duration and post-trip rest period
For hunter recall data, trip duration (days) was modeled as a zero-truncated Poisson response with predictors for hunting method, season, hunting base and trip accompaniment. Models were compared for all combinations of predictors with varying intercepts for villages and hunters. We tested whether post-trip resting period (days) was predicted by duration of previous trip, from hunter recall and village-based monitoring, since association between trip duration and resting period could be a cause of bias if sampling of trips is nonrandom with respect to resting period. For hunter recall, resting period was taken as number of days since a hunter's return from their previous trip plus the days they expected to rest until their next trip. For village-based monitoring, resting period was the days following a hunter's return from a trip until their departure on the next trip, as recorded by the local assistant (Supporting Information). For hunter recall we used zero-truncated Poisson likelihood, and for village-based monitoring, a zero-truncated Negative Binomial likelihood was used to improve fit. Previous trip duration (days) was included as a fixed effect, with varying intercepts for villages and hunters.
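For readers unfamiliar with zero truncation, the following minimal sketch shows the standard zero-truncated Poisson log probability and mean. It is a textbook definition written in plain Python with hypothetical function names, not code from the study. Truncation matters here because a recorded trip always lasts at least one day.

```python
import math


def zero_truncated_poisson_log_pmf(days, rate):
    """Log probability of `days` >= 1 under a Poisson(rate) conditioned on being > 0."""
    if days < 1:
        raise ValueError("zero-truncated counts must be at least 1")
    log_poisson = days * math.log(rate) - rate - math.lgamma(days + 1)
    # Renormalize by P(count > 0) = 1 - exp(-rate).
    log_prob_positive = math.log(-math.expm1(-rate))
    return log_poisson - log_prob_positive


def zero_truncated_poisson_mean(rate):
    """Expected value of the truncated distribution; always larger than `rate`."""
    return rate / (-math.expm1(-rate))
```

The truncated mean exceeds the underlying rate, most noticeably when the rate is small, so ignoring truncation would tend to misstate expected durations for short trips.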

| Comparing catch estimates
We compared catch from village-based monitoring and hunter recall data collected from the same villages. We fit separate models to each dataset, using identical error structures and priors to make results as comparable as possible. Posterior parameter distributions were compared to assess differences in (a) estimated mean catch for a trip of given length; (b) estimated variation across hunters and villages; and (c) estimates of catch at the same specific villages.
To assess whether village-level patterns were consistent across data sources, we modeled average catch per day for villages, from village-based monitoring data, as a predictor of catch from hunter recall data. This village-catch variable was calculated from the raw village-based monitoring data as total catch divided by total trip-days, for each village. Village-catch was added as a predictor of hunter recall catch, in Poisson models with a covariate for trip duration and varying intercepts for villages and hunters. Additional predictors were added, in all combinations, for hunting method, season, hunting base and trip accompaniment. If village-based monitoring data predicted hunter recall observations perfectly, the slope and intercept parameters for the village-catch term would be 1 and 0, respectively. Deviation from these values was taken as an estimate of relative bias between data sources.
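The village-catch calculation described above is a ratio of totals rather than a mean of per-trip rates. A minimal sketch follows, assuming trips are available as (village, catch, duration in days) records; the record layout and function name are hypothetical.

```python
from collections import defaultdict


def village_catch_per_day(trips):
    """Total catch divided by total trip-days, computed separately per village.

    `trips` is an iterable of (village, catch, duration_days) tuples.
    """
    total_catch = defaultdict(int)
    total_days = defaultdict(int)
    for village, catch, duration_days in trips:
        total_catch[village] += catch
        total_days[village] += duration_days
    return {village: total_catch[village] / total_days[village] for village in total_catch}


# Made-up records for two villages, for illustration only.
example_trips = [("A", 3, 2), ("A", 5, 4), ("B", 1, 1), ("B", 0, 3)]
example_rates = village_catch_per_day(example_trips)
```

In this made-up example village A has 8 animals over 6 trip-days (about 1.33 per day), whereas averaging the two per-trip rates (1.5 and 1.25) would give 1.375; the ratio of totals gives longer trips proportionally more weight.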

| Comparing predicted catch
Catch was simulated for a "new" hunter and village by drawing samples from the posterior distribution of each model, sampling parameter values of the hunter-level, village-level and population-level intercepts, then simulating catch from a Poisson distribution. The posterior distribution summarizes the relative plausibility of any given set of parameter values, given the observed data, the prior information and the generative model. Sampling from the posterior generates parameter values at a frequency proportional to their expected probability, given the models' assumptions.
Values were simulated from 10,000 draws each from village-based monitoring and hunter recall models. Simulated values represented catch from a "new" hunter and village, for a trip of average duration and simulations were repeated for the village-based monitoring trip duration mean (3.2 days) and hunter recall mean (4.2 days).
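The simulation procedure can be sketched as follows. This is an illustrative reading of the steps above rather than the authors' code: posterior draws are represented as hypothetical (log expected catch for the chosen trip duration, hunter-level SD, village-level SD) tuples, and offsets for the "new" hunter and village are assumed to be drawn from zero-centered Normal distributions with those SDs.

```python
import math
import random


def draw_poisson(rate, rng):
    """Draw a Poisson count by counting unit-rate exponential arrivals up to `rate`."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed <= rate:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_new_catch(posterior_draws, rng):
    """Simulate one trip's catch per posterior draw for an unobserved hunter and village."""
    simulated = []
    for log_mean_catch, sd_hunter, sd_village in posterior_draws:
        # Assumption: new hunter and village offsets are Normal(0, SD) on the log scale.
        hunter_offset = rng.gauss(0.0, sd_hunter)
        village_offset = rng.gauss(0.0, sd_village)
        rate = math.exp(log_mean_catch + hunter_offset + village_offset)
        simulated.append(draw_poisson(rate, rng))
    return simulated


def share_at_least(simulated, threshold):
    """Proportion of simulated trips with catch at or above `threshold`."""
    return sum(1 for value in simulated if value >= threshold) / len(simulated)
```

A helper such as `share_at_least` allows the two models to be compared through tail shares, for example the proportion of trips simulated from one model that reach the median predicted by the other.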

| RESULTS
Village-based monitoring recorded shorter trips on average than hunter recall; the longest village-based monitoring trip was 10 days compared to 32 days in hunter recall (Table 2).
Mean prey size per trip was similar for both methods (Table 2). Excluding trips longer than 10 days (n = 5) from the hunter recall dataset did not alter these general patterns, giving mean trip duration of 3.7 days (SD 2.1, n = 247).

| Predictors of catch
The best supported model of hunter recall catch included hunting method, trip accompaniment, season and method-season interaction (Figure 1; Table S4). Trips in which hunters were accompanied (by another hunter or helper) had higher catch (estimate = 1.35, [1.16,1.57 95%CI]). Hunters using only snares or guns had lower catch than those using both methods (relative to using both, gun-use only = 0.81[0.64,1.04], snare-use only = 0.79[0.61,1.03]). There was some evidence that trips in the late dry season had lower catch than in early dry or rainy seasons and that snare-only hunters experienced relatively low catch in the rainy season, whereas this was not the case for hunters using guns (Figure 1).

| Predictors of trip duration and post-trip rest period
The best supported model of trip duration included hunting method, trip accompaniment and hunting base. A model with similar support also included season (Table S5). Longer trips were associated with use of guns or both guns and snares (estimate of snare-use relative to both = 0.63[0.51,0.78]; Figure S3). There was limited support from village-based monitoring data that post-trip resting period increased with previous trip duration (estimate = 1.07[0.99,1.17], probability that estimated effect is >0 = 0.95; Table S6, Figure S4), but not from hunter recall data (estimate = 1.06[0.92,1.16]; probability that estimated effect is >0 = 0.41).

Hunter recall produced higher, and more variable, catch estimates than village-based monitoring (Table S7; Figure 2). Estimated hunter-level variability was higher with hunter recall (SD estimate = 1.8[1.6,2.1]) than village-based monitoring (1.1[1.0,1.2]), whereas village-level variability was similar across data sources (Table S7). The above patterns held when three observations for trips over 10 days (i.e., the maximum observed in village-based monitoring data) were excluded from the hunter recall data (Supporting Information).

TABLE 2 Sample sizes and attributes of catch data collected using two methods

| Comparing predicted catch from different survey methods
New observations of animals killed on a hunting trip (catch), predicted from village-based monitoring had median of 3 animals [IQR 2-4] for a trip of 3.2 days (mean trip duration from village-based monitoring; Figure 3) and 4[2-5] for a trip of 4.2 days (mean trip duration from hunter recall). Predicted catch from hunter recall data was approximately twice as high, at 7[4-12] and 8[5-13] respectively for 3.2-day and 4.2-day trips. The village-based monitoring model predicted catch at least as large as the hunter recall median (8) in only 5% of simulated observations. The hunter recall model predicted catch equal to or lower than the village-based monitoring median (3) in 11% of simulated observations.

| DISCUSSION
Differences between data collection methods could introduce biases that compromise the quality of catch datasets due to measurement error and nonrandom sampling. Our study is one of few to quantify the potential scale of these differences and highlights the extent to which outcomes can be sensitive to survey design. We found that estimated catch per day differed twofold depending on the source of data used, and that trip duration and hunter variability also differed. The hunter recall method, where a large number of hunters provide information about relatively few trips, produced higher estimates of catch and hunter variability than village-based monitoring, where relatively few hunters contributed information about many trips. The magnitude of the differences suggests studies aiming to describe harvest patterns could reach different conclusions due to bias introduced during data collection: for example, apparent sustainability of hunting levels at a site may be affected by survey methods. The methods we compared could incur bias from several sources (Table 1), which are difficult to differentiate, and may vary substantially between sites or over time. Potentially important sources of bias can be considered in three categories: nonrandom sampling of hunters, reporting errors from self-reported information and representation of long versus short hunting trips.

FIGURE 2 Predicted mean catch for an average hunter for each village, taken from models of village-based monitoring data (triangles) and hunter recall data (circles). Colors indicate ranks from highest mean catch (red) to lowest (blue), assigned to villages according to each data source. Points are mean predicted values, lines indicate 67%, 87%, and 97% CI

First, we consider bias that may result from the way hunters are sampled. Lower variation in reported catch from village-based monitoring relative to hunter recall suggests the former sampled a more homogeneous subset of hunters and villages. This may be because village-based monitoring, or indeed any method where a local assistant recruits participants, may favor sampling of indigenous residents, who are relatively settled or socially integrated and who may not represent the wider hunting community. Hunter information was not recorded with village-based monitoring, so hunter profiles cannot be directly compared between data sources. However, previous work at the site has shown that hunters can be grouped based on livelihood strategies, demography and hunting behavior. Citizenship was an important feature defining group membership, with groups that had low harvest per day and low hunting effort being composed largely of indigenous citizens. Thus, village-based monitoring may disproportionately sample from such groups and fail to capture the full spectrum of hunter types. This problem could be exacerbated if hunting by nonlocal immigrants is a contentious societal issue, as "outsiders" who are active, commercial hunters may be reluctant to be scrutinized by a local data collector. Such social dynamics may be common to hunting systems elsewhere (e.g., Fa et al., 2016; Gill et al., 2012) and we suggest that attention to social context could improve study designs (Jost Robinson, Daspit, & Remis, 2011) and help ensure sampling adequately represents the range of hunters' sociodemographic and behavioral profiles.
Reporting error is a second likely source of bias in hunting studies that will vary with survey design (Jones et al., 2008). Whereas under village-based monitoring, catch and trip duration were observed directly by local assistants, hunter recall relied on information reported by hunters, making it potentially more susceptible to factors such as inaccurate recall, variation in how questions are interpreted (Schwarz & Oyserman, 2001), or deliberate misreporting (Tourangeau & Yan, 2007). These sources of error are challenging to address and could have added noise which increased the variability of hunter recall observations relative to village-based monitoring. Careful pilot testing can help minimize misinterpretation of questions, but even this is hard to eliminate entirely. For instance, short, unsuccessful trips may be considered irrelevant by some hunters when asked about their "most recent hunting trip." Trips involving multiple hunters may produce ambiguity in which catch to report, for instance if snares set by one hunter were later checked by someone else. The pattern that accompaniment was associated with higher reported catch could have arisen if hunters reported combined catch, for example. In our study, it is possible that hunters were accompanied by other hunters who were themselves study participants. While in our case, the timing and durations of reported trips gave no indication that identical hunting trips were reported by different hunters, the issue is worth highlighting as double-reporting might exacerbate sampling biases or compromise the validity of statistical analyses. Our findings raise the question of whether data sources relying on reported information have consistently higher variability than those based on direct observations. Such a pattern could have consequences for design of monitoring programs, as methods that generate noisier data can be less efficient for detecting trends (Rist et al., 2010).
A third source of bias relates to the relative contribution of short and long hunting trips to datasets. Trip duration is associated with other measures of effort, such as distance travelled or number of snares deployed (Kümpel et al., 2009;Parry et al., 2009;Rist et al., 2008), and different data collection methods may inherently generate samples that weight trips differently. For instance, methods in which each hunter reports only their most recent trip, such as hunter recall, will likely record a lower proportion of shorter trips than continuous monitoring approaches, in which multiple short trips by the same hunter are all recorded. Extremely long trips, from which hunters return less frequently, will be relatively rare in continuous monitoring data but may be more readily captured by a "snapshot" sampling approach like that of the hunter recall method. In addition, if long trips are followed by longer rest periods in villages, any given hunter may more likely be encountered in a village following a long trip than a short one. We found limited support for these predictions; trips recorded by village-based monitoring were shorter and less variable on average, with 90% of observations for trips of up to 5 days, compared to 78% in hunter recall. Trips over 10 days were not observed with village-based monitoring yet represented 2% of hunter recall observations. However, only village-based monitoring data suggested there was a positive relationship between resting time and previous trip duration. If such patterns are consistent, there may be a predictable skew in trip lengths linked to snapshot sampling methods versus continuous recording approaches.
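The weighting argument above can be made concrete with a deliberately simple, deterministic sketch. It assumes two hypothetical hunters who each repeat a fixed trip-plus-rest cycle, that continuous monitoring records every completed cycle within a monitoring window, and that a snapshot survey records exactly one trip per hunter. None of the numbers come from the study.

```python
def mean_trip_length_continuous(hunters, window_days):
    """Trip-weighted mean duration: every trip completed in the window is recorded.

    `hunters` is a list of (trip_days, rest_days) tuples, one per hunter.
    """
    total_trips = 0
    total_trip_days = 0
    for trip_days, rest_days in hunters:
        trips = window_days // (trip_days + rest_days)  # complete cycles only
        total_trips += trips
        total_trip_days += trips * trip_days
    return total_trip_days / total_trips


def mean_trip_length_snapshot(hunters):
    """Hunter-weighted mean duration: one (most recent) trip per hunter is recorded."""
    return sum(trip_days for trip_days, _ in hunters) / len(hunters)


# Hypothetical hunters: one makes 1-day trips with 1-day rests,
# the other makes 5-day trips with 5-day rests.
example_hunters = [(1, 1), (5, 5)]
continuous_mean = mean_trip_length_continuous(example_hunters, window_days=60)
snapshot_mean = mean_trip_length_snapshot(example_hunters)
```

Over a 60-day window the first hunter contributes 30 one-day trips and the second contributes 6 five-day trips, so the continuous record has a mean trip length of about 1.7 days, while the one-trip-per-hunter snapshot gives 3 days. The sketch covers only the first pathway (trip-weighted versus hunter-weighted sampling) and does not model the chance of encountering a hunter during a rest period.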
Where long and short trips are nonrandomly sampled, variables that correlate with trip duration will also be skewed, potentially adding to bias. We found that factors associated with higher catch (trip accompaniment, and use of both guns and snares) were also associated with longer trips. While effect sizes were relatively small and there is considerable overlap in predicted catch for trips with different attributes (Figure 1), the observed patterns had reasonable statistical support (Supporting Information) and are reflected in similar findings elsewhere (Coad, 2007;Kümpel, 2006). Trip accompaniment and hunting method variables can be considered as components of overall hunting effort: a parameter that is notoriously challenging to quantify (Rist et al., 2008) but which is important to minimize potential bias from uneven sampling of trip lengths. In our study, a more comprehensive definition of effort, for instance, accounting for number of snares or time spent actively hunting, might have improved agreement between the two methods. More generally, a clear understanding of the relationships between trip duration, effort and catch at any given site could help clarify how representative a sampling approach is likely to be.
The nature and extent of bias incurred by different survey methods may have implications for management decisions. Defining a "high" or "low" hunting offtake is important to differentiate hunter types and identify potential target groups (Dobson, Milner-Gulland, Ingram, & Keane, 2019), which could lead to the development of more effective behavior change interventions. In our study, a new observation of 2.2 prey items per day would be considered high under village-based monitoring but typical according to hunter recall data. If resources are allocated according to level of harvest across villages, the fact that different methods might give different answers is problematic. Furthermore, biases are compounded wherever results are extrapolated to larger scales. For instance, extrapolations based on 100 hunting days/year would give 140-250 animals/hunter from hunter recall, compared with 75-120 from village-based monitoring. Study design is inevitably a trade-off between data quality and survey costs. For example, village-based monitoring was relatively low-cost but provided little detail about hunting trips and sampling of hunters was nonrandom. Such severe sampling constraints may be uncommon, but most hunting studies face data quality constraints to a lesser or greater degree. The discrepancy between survey methods found in this study and others (e.g., Golden et al., 2013; Noss, 1998) suggests hunting statistics should be interpreted cautiously.
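The extrapolation above can be traced with simple arithmetic. In the sketch below, the annual totals and the 100 hunting days/year are taken from the text; the daily rates are obtained only by dividing those totals by 100 and are not reported directly in this form. The helper names and the rule used to label an observation as "low," "typical" or "high" (falling below, within or above the implied range) are assumptions made for illustration and are not the criterion used in the analysis.

```python
HUNTING_DAYS_PER_YEAR = 100  # figure used in the text

# Extrapolated annual totals quoted in the text (animals per hunter per year).
annual_totals = {
    "hunter recall": (140, 250),
    "village-based monitoring": (75, 120),
}


def implied_daily_rates(totals, days):
    """Back-calculate the daily catch range implied by each annual range."""
    return {method: (low / days, high / days)
            for method, (low, high) in totals.items()}


def classify(rate, bounds):
    """Label a daily rate relative to a range (illustrative rule only)."""
    low, high = bounds
    if rate < low:
        return "low"
    if rate > high:
        return "high"
    return "typical"


rates = implied_daily_rates(annual_totals, HUNTING_DAYS_PER_YEAR)
labels = {method: classify(2.2, bounds) for method, bounds in rates.items()}

if __name__ == "__main__":
    for method, (low, high) in rates.items():
        print(f"{method}: {low:.2f} to {high:.2f} animals/day, "
              f"2.2 animals/day is {labels[method]}")
```

With this simple rule, an observation of 2.2 prey items per day falls inside the range implied by hunter recall but above the range implied by village-based monitoring, which matches the contrast described in the text.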
Our findings demonstrate that those planning or interpreting hunting surveys should carefully consider where bias could occur. In particular, they should consider how well a given sampling approach is likely to represent the full range of hunters' behavioral profiles, the weighting given to trips of different lengths and what types of reporting error may occur. The specific aims and budget of a survey will dictate which methods are most appropriate for any given situation. However, some problems identified in our study could be minimized by application of rigorous sampling strategies and carefully designed survey instruments. For instance, randomized or stratified sampling techniques could help give a more balanced representation of different types of hunters or hunting trip durations, although such sampling strategies typically depend on being able to identify hunters in the first instance, which is often not viable where hunting is prohibited. Additionally, development of survey instruments that address reporting errors (Schwarz & Oyserman, 2001) or social desirability bias (Nuno & St. John, 2015) could reduce these issues. Application of more than one survey method can also help to counter issues of data quality by offering a means to compare results and triangulate findings from different data sources (Keane, Jones, & Milner-Gulland, 2011).
Harvest datasets are a valuable, versatile resource for understanding hunting systems. However, studies likely encompass a range of data quality and results can be skewed by nonrandom sampling or measurement error from multiple sources. Added to this, patterns of bias are unlikely to remain consistent through time due to shifts in hunting practices and socio-political landscapes (Coad et al., 2013;Duda et al., 2017;Gill et al., 2012). Of the potential sources of bias identified for our study, only one, preferential recall of larger species, seemed unlikely from the data. Future work to disentangle the impacts of different mechanisms could provide valuable insight that might help ensure appropriate levels of uncertainty are incorporated into management decisions. Through a better understanding of the accuracy of harvest data, conservationists will be better placed to address the problem of over-hunting as a global driver of biodiversity loss.

ACKNOWLEDGMENTS
We thank GolaMA project staff, the Society for the Conservation of Nature in Liberia, the Forestry Development Authority of Liberia, the clan authorities and participating communities. This work was supported by the Royal Society for the Protection of Birds, through the project "Securing Liberia's forest connectivity through community forest management and innovative financing mechanisms," funded by the European Union, and by Royal Holloway, University of London. We are very grateful to two anonymous reviewers for their comments on the manuscript. FSJ was funded by the European Research Council under the European Union's H2020/ERC grant agreement no. 755965 (ConHuB).