Dynamics of individual adherence to mass drug administration in a conditional probability model

We present a comprehensive framework which describes the systematic (binary) choice of individuals to either take treatment, or not for any reason, over the course of multiple rounds of mass drug administration (MDA), which we here refer to as adherence and non-adherence. This methodology can be fitted to (or informed by) program data, can be manipulated to reproduce the adherence behaviours of past analyses, and can go beyond past analyses to describe new behaviours that have yet to be considered in the literature. Our model also has a straightforward interpretation and implementation in simulations of mass drug trials for disease transmission studies and forecasts for control through MDA. We demonstrate how our analysis may be implemented to statistically infer adherence behaviour from a dataset by applying our approach to the adherence data from the TUMIKIA project, a recent trial of deworming strategies in Kenya. We stratify our analysis according to age and sex, though the framework which we introduce here may be readily adapted to accommodate other categories. Our findings include the detection of past behaviour dependent non-adherence in all age groups to varying degrees of severity, and particularly strong non-adherent behaviour of men of ages 30+. We then demonstrate the use of our model in stochastic individual-based simulations by running two example forecasts for elimination in TUMIKIA with the learned adherence behaviour implemented. Our results demonstrate the impact and utility of including non-adherence from real world datasets in simulations.

stated goals. Learning the patterns of individual adherence or non-adherence to MDA control measures for NTDs from real world data, followed by their implementation in simulated scenarios, is a relatively recent development in the study of NTDs. Past analyses assessing individual adherence have informed the approach we take in this work. However, we have sought to provide a framework which encapsulates as many types of adherence behaviour as possible so that their implementation in modern simulations is streamlined effectively. Our example application to the TUMIKIA data highlights the importance of such a general framework as we find past behaviour dependence that may have been missed by other methods.

1 Introduction

Recent reviews, guidelines and work predicting the outcome of MDA to control the transmission of various NTDs all strongly stress the importance of individual adherence in successfully reaching elimination targets [1-9] (see also Ref [10] for a review on patient adherence to HIV medication). Such analyses have taken a variety of approaches in describing the strength in tendency of participants in a given MDA program with multiple rounds to either passively or actively avoid treatment in a potentially repetitive manner. The precise nomenclature for this behaviour is also debated, where terms such as 'compliance', 'adherence' and 'concordance' were all discussed for their relative merits in a recent review [7]. In this work, we shall refer to the binary choice of individuals to either take treatment, or not for any reason, over the course of multiple rounds of MDA as revealing their 'adherence' or 'non-adherence'. Ultimately, the effect that this behaviour has on the success or failure of control through MDA is equally unambiguous.
In this paper, we develop a general approach to describe individual adherence or non-adherence to MDA, into which past literature approaches (or future ones) may be incorporated. Our principal intention is to provide a framework within which as many behaviours as possible are captured so that computational modelling approaches are more flexible. To illustrate how our methodology may be implemented and interpreted in practice, we apply it to the TUMIKIA project: a recent cluster randomised, controlled trial in Kwale County, Kenya [11-13].

number of diseases. See, e.g., Refs [1,3-8]. In this section, we will lay out a general model for treatment adherence across multiple rounds in an MDA intervention program. In Appendix S3, we discuss other implementations of adherence in models and how they fit within our general framework.
For example, someone may feel that being treated last year makes it less important to receive treatment in the current round. Alternatively, they may have been put off treatment due to side effects from initially taking the drug (e.g. praziquantel to treat schistosomiasis makes some children suffer from headaches, dizziness, stomach pain, nausea, or tiredness).

2. Time dependence: everyone involved in the trial may be subject to global influences that change over time. For example, enthusiasm or funding for the trial may drop as it proceeds, or unforeseen sociological or political events may change people's desire to adhere to the program. This will result in the probability of treatment in a given round for a given individual being explicitly dependent on time and is distinct from dependence on past behaviour, which will also result implicitly in the probability of treatment for an individual changing over time.

3. Population-level heterogeneity: the probability of adherence may vary across the population. That is, individuals may have a personal probability of adherence that they retain across multiple rounds of the intervention. In this case, the probability of adherence will have a distribution across the population. Typically, population-level heterogeneity may be strongly correlated with covariates such as sex or age group, in which case it can be represented by a stratification (or 'binning') of the population into sub-groups, each with their own adherence probability.

In reality, any model of adherence might include one or more of these sources of and independent, respectively).

The distinctions above are of critical importance as it is possible, e.g., for a treatment program to suffer severely from past behaviour dependent non-adherence without any apparent heterogeneity in adherence within the population. They also allow us to categorise and clarify models of adherence already described in the literature, which include Refs [1,4-8].

Several models of MDA treatment programs employ an adherence model developed by Plaisier in the context of onchocerciasis (the Plaisier model) [8]. The Plaisier model assigns a probability of adherence to each individual which they then retain for the duration of the MDA program [14,15]. As such, this model would be characterised by us as a heterogeneous population, time-independent model with no explicit individual dependence on past behaviour. For the interested reader, we discuss the relationship of the Plaisier model (and others [5]) to our categorisation of adherence models in more detail in Appendix S3.

Before we introduce the notion of dependence on past behaviour in our model, it is instructive to describe the situation where it is absent. When the probability of an individual taking treatment is not dependent upon any of their past behaviour, it is simply given by the coverage $c_n$ in each round $n$ of MDA. In the absence of population heterogeneity, this probability would then apply to all individuals within a given cohort, a case which corresponds to either the 6th or 8th row of

and FF, where T and F are accepting and avoiding treatment, respectively. Let the probability of accepting treatment in the first round be set to $P(T) = \alpha$. In round 2, we now fix the conditional probability of getting treated, given treatment in the first round, as $P(T|T) = \beta$. Let us also set a corresponding conditional probability for not being treated in the second round, given that there was no treatment in the first round, as $P(F|F) = \gamma$. Holding these conditional probabilities fixed over subsequent rounds, the probability of treatment in round $n$, $p_n$, is subject to the following recursion relation

Individual
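$$p_n = \beta\, p_{n-1} + (1-\gamma)\,(1 - p_{n-1}). \qquad (1)$$

Defining the system state vector as

$$\mathbf{p}_n = \begin{pmatrix} p_n \\ 1 - p_n \end{pmatrix},$$

we may rewrite Eq (1) above in the form $\mathbf{p}_n = M \mathbf{p}_{n-1}$, where we have defined the following transition matrix

$$M = \begin{pmatrix} \beta & 1-\gamma \\ 1-\beta & \gamma \end{pmatrix}.$$

The eigenvalues and eigenvectors of $M$ are given by

$$1:\ \mathbf{v} = \frac{1}{2-\beta-\gamma}\begin{pmatrix} 1-\gamma \\ 1-\beta \end{pmatrix}, \qquad \lambda = \beta+\gamma-1:\ \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \qquad (5)$$

where $\mathbf{v}$ is normalised to sum to 1. As long as $|\beta + \gamma - 1| < 1$, the system will relax to the $\mathbf{v}$ state, which has a unit eigenvalue, giving a long-term probability of treatment of

$$q = \frac{1-\gamma}{2-\beta-\gamma}. \qquad (6)$$

In addition, the relaxation will be oscillatory if $\lambda < 0$. In Appendix S1, we demonstrate how to obtain the following solution to Eq (1)

$$p_n = q + (\alpha - q)\,\lambda^{n-1}. \qquad (7)$$

Notice that by matching $p_n$ to the coverage of treatment in a given population, one may directly compare the impact of adherence models such as Eq (1) to those with past behaviour independent adherence. Furthermore, by setting $\lambda = 0$ in Eq (7) one finds a model for past behaviour independent adherence that is time-independent, i.e., $p_n = \beta = 1 - \gamma$.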
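The relaxation described above can be checked numerically. The following Python sketch assumes the recursion takes the form $p_n = \beta p_{n-1} + (1-\gamma)(1-p_{n-1})$ with $\gamma = P(F|F)$ and $p_1 = \alpha$, which is a reading of the definitions given in the text. The function names and parameter values are illustrative only and are not drawn from the TUMIKIA data.

```python
def iterate_adherence(alpha, beta, gamma, n_rounds):
    """Return [p_1, ..., p_n] by iterating the assumed recursion."""
    probs = [alpha]  # p_1: probability of treatment in the first round
    for _ in range(n_rounds - 1):
        p_prev = probs[-1]
        # Treated last round and treated again, or untreated last round and treated now.
        probs.append(beta * p_prev + (1.0 - gamma) * (1.0 - p_prev))
    return probs


def closed_form(alpha, beta, gamma, n):
    """Closed form p_n = q + (alpha - q) * lam**(n - 1) of the same recursion."""
    lam = beta + gamma - 1.0                  # relaxation eigenvalue
    q = (1.0 - gamma) / (2.0 - beta - gamma)  # long-term probability of treatment
    return q + (alpha - q) * lam ** (n - 1)


# Hypothetical parameter values chosen purely for illustration.
probs = iterate_adherence(alpha=0.5, beta=0.9, gamma=0.6, n_rounds=10)
```

With $\beta + \gamma > 1$ the sequence approaches $q$ monotonically, whereas choosing $\beta + \gamma < 1$ (negative $\lambda$) makes successive values alternate around $q$.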

Any sequence of treatments can be seen as a set of alternating adherent and non-adherent runs. A key statistic in the context of preventive chemotherapy is the run length (in rounds) over which an individual adheres or fails to adhere. For an adherence run, this is the number of consecutive treatment adherences, given an initial adherence. This can also be thought of as the first passage time to failure. Since $P(T|T) = \beta$ is constant, the run length $n_T$ is distributed according to a geometric distribution, with

$$P(n_T = k) = \beta^{k-1}(1-\beta), \qquad E(n_T) = \frac{1}{1-\beta} = \frac{1}{(1-\lambda)(1-q)}. \qquad (8)$$

Correspondingly, for a run of failures,

$$P(n_F = k) = \gamma^{k-1}(1-\gamma), \qquad E(n_F) = \frac{1}{1-\gamma} = \frac{1}{(1-\lambda)\,q}. \qquad (9)$$

Any long run of treatment choices by an individual will break down into an alternating sequence of F and T runs. Hence, the probability of a round chosen at random being T, $P(T)$, is

$$P(T) = \frac{E(n_T)}{E(n_T) + E(n_F)} = \frac{1-\gamma}{2-\beta-\gamma} = q,$$

matching the conclusions drawn earlier from the eigenvalues and eigenvectors of Eq (5). From Eqs (8) and (9), it is clear that as $\lambda$ approaches 1, the length of both success and failure runs grows as $1/(1 - \lambda)$. In the absence of past behaviour dependence, $\lambda = 0$ and the adherent and non-adherent run lengths are given by $1/(1 - q)$ and $1/q$, respectively.
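As a rough check on the run-length argument, the sketch below computes the mean run lengths of geometric runs with constant $P(T|T) = \beta$ and $P(F|F) = \gamma$, and compares the implied long-run treated fraction with a simulated history. It is a minimal illustration under those assumptions. The helper names, the seed and the parameter values are hypothetical.

```python
import random


def mean_run_lengths(beta, gamma):
    """Mean adherent and non-adherent run lengths for geometric runs."""
    return 1.0 / (1.0 - beta), 1.0 / (1.0 - gamma)


def simulate_treated_fraction(alpha, beta, gamma, n_rounds, seed=0):
    """Simulate one long T/F history and return the fraction of rounds treated."""
    rng = random.Random(seed)
    treated = rng.random() < alpha
    n_treated = int(treated)
    for _ in range(n_rounds - 1):
        if treated:
            # Remain treated with probability beta.
            treated = rng.random() < beta
        else:
            # Remain untreated with probability gamma, so switch with 1 - gamma.
            treated = rng.random() >= gamma
        n_treated += int(treated)
    return n_treated / n_rounds


beta, gamma = 0.9, 0.6  # hypothetical values for illustration
mean_t, mean_f = mean_run_lengths(beta, gamma)
long_run_fraction = mean_t / (mean_t + mean_f)
simulated_fraction = simulate_treated_fraction(0.5, beta, gamma, 200_000)
```

The simulated fraction should sit close to `long_run_fraction`, with the residual difference shrinking as the number of simulated rounds grows.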

Statistical inference from data
The likelihood above is effectively three independent beta distributions, one in each of the parameters, such that the posterior distribution $P(\theta|D)$ becomes

where we have assumed a flat prior $\pi(\theta) \propto 1$ to derive the following Bayesian evidence normalisation

and $N = N_T + N_F$ is defined as the total number of individuals.

Note here that Eqs (11) and (13) may be generalised to the case where $n$ rounds of treatment have taken place. We have provided these expressions in Appendix S1.
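To make the two-round inference concrete, the following Python sketch assumes the likelihood factorises into three binomial terms, one each for $\alpha$, $\beta$ and $\gamma$, so that a flat prior gives independent beta posteriors. This is an assumption consistent with the description above and not necessarily the exact published expression. The count names (`n_tt`, `n_tf`, `n_ft`, `n_ff`, where the first letter is the round 1 outcome) and the example counts are hypothetical.

```python
from math import lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def two_round_estimates(n_tt, n_tf, n_ft, n_ff):
    """Maximum likelihood values and flat-prior posterior means of (alpha, beta, gamma)."""
    n_t = n_tt + n_tf  # treated in round 1
    n_f = n_ft + n_ff  # not treated in round 1
    mle = {
        "alpha": n_t / (n_t + n_f),
        "beta": n_tt / n_t,   # P(T|T)
        "gamma": n_ff / n_f,  # P(F|F)
    }
    # Mean of Beta(successes + 1, failures + 1) for each parameter.
    posterior_mean = {
        "alpha": (n_t + 1) / (n_t + n_f + 2),
        "beta": (n_tt + 1) / (n_t + 2),
        "gamma": (n_ff + 1) / (n_f + 2),
    }
    return mle, posterior_mean


def log_evidence(n_tt, n_tf, n_ft, n_ff):
    """Log of the flat-prior evidence, up to a data-dependent combinatorial constant."""
    n_t, n_f = n_tt + n_tf, n_ft + n_ff
    return (log_beta(n_t + 1, n_f + 1)
            + log_beta(n_tt + 1, n_tf + 1)
            + log_beta(n_ff + 1, n_ft + 1))


# Hypothetical counts for 100 individuals over two rounds.
mle, posterior_mean = two_round_estimates(n_tt=60, n_tf=20, n_ft=10, n_ff=10)
```

Working with `lgamma` keeps the evidence calculation numerically stable when the counts are in the thousands, as would be expected for trial-scale data.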
2.3 Time-dependent adherence and more general behaviour

We shall hereafter refer to the above matrices as 'choice matrices'. In the proceeding

Following a similar argument to the one used in solving the homogeneous Markov model (which is provided in detail in Appendix S1), we may obtain a solution to Eq (15), which is given by

where we have defined an important new quantity

$$\omega_n \equiv \beta_{n\,n-1} + \gamma_{n\,n-1} - 1. \qquad (17)$$
Notice, firstly, that when $\omega_n = 0$ the system reverts to a time-dependent past behaviour independent adherence model, i.e., one without past behaviour dependence such that $p_n = \beta_{n\,n-1} = 1 - \gamma_{n\,n-1}$. By analogy with the time-independent Markov model (where $\omega_n = \lambda$ in Eq (5)), $|\omega_n| \neq 0$ signals the presence of some degree of past behaviour dependent adherence behaviour. In more detail, for successive rounds over which $\omega_n > 0$, the system will relax towards the steady state, and when $\omega_n < 0$ this will be accompanied by oscillatory behaviour. Note also that $\omega_n$ may act as an indicator for the severity of adherence and non-adherence behaviour in the system, where larger absolute values of $\omega_n$, approaching a maximum of 1, indicate increasingly past behaviour dependent behaviour.

The value of $\omega_n$ is therefore a useful indicator for the type of adherence behaviour in the relatively general description of time-dependent Markov models. We shall use this parameter to illustrate our results from the TUMIKIA project in Sec 3.
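The role of $\omega_n$ can be illustrated with a short Python sketch. It assumes the time-dependent recursion $p_n = \beta_{n\,n-1}\, p_{n-1} + (1 - \gamma_{n\,n-1})(1 - p_{n-1})$, which is the natural round-by-round generalisation of the time-independent model, and uses hypothetical parameter values and function names.

```python
def omega(beta_n, gamma_n):
    """Past behaviour dependence indicator for one round: beta + gamma - 1."""
    return beta_n + gamma_n - 1.0


def time_dependent_probabilities(alpha, betas, gammas):
    """Return [p_1, p_2, ...] given per-round conditional probabilities.

    betas[i] and gammas[i] are P(T|T) and P(F|F) for the transition into round i + 2.
    """
    probs = [alpha]
    for beta_n, gamma_n in zip(betas, gammas):
        p_prev = probs[-1]
        probs.append(beta_n * p_prev + (1.0 - gamma_n) * (1.0 - p_prev))
    return probs


# Rounds with omega = 0: the outcome ignores the previous round entirely.
independent = time_dependent_probabilities(0.5, betas=[0.7, 0.6], gammas=[0.3, 0.4])

# Rounds with omega > 0: the previous round carries over into the next.
dependent = time_dependent_probabilities(0.5, betas=[0.9, 0.8], gammas=[0.6, 0.7])
omegas = [omega(b, g) for b, g in zip([0.9, 0.8], [0.6, 0.7])]
```

In the first case each $p_n$ simply equals that round's $\beta_{n\,n-1}$, whatever the starting value, which is the past behaviour independent limit described above.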

medRxiv preprint doi: https://doi.org/10.1101/2020.04.17.20069476. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. April 17, 2020.

2.3.3 General choice matrices: non-Markovian models

The most general set of causal adherence models described by Eq (14) has choice matrices which take the form

where 'non-Markovian' behaviour in the $n$-th round clearly corresponds to a past behaviour dependence between rounds which exceeds the immediate last round, i.e.,

Notice that all of the adherence models that we have identified in this work may be categorised by various constraints on the elements of the choice matrices introduced in Eq (14). For completeness and reference, these are

The universality of the choice matrix approach suggests that it is an ideal candidate for parameterisation of the inference problem from data and model comparison. Let the data now correspond to a set of $n$-vectors $D = \{X\}$ where each individual's adherence or non-adherence behaviour in the $n$-th round is recorded, such that $X_n = T, F$. Using Eq (14), the full generalisation of the likelihood (which supports all of the possible adherence models) becomes

where $\mathbf{1}_A$ denotes an indicator function which takes value unity when condition $A$ is satisfied, else it vanishes.

The large number of available degrees of freedom in Eq (19) motivates a systematic approach to inferring the choice matrix components from a given set of data. We elect to consider models which isolate the many degrees of freedom by constructing scenarios where past behaviour dependent adherence only occurs for a single round and depends on only one other round. All other degrees of freedom are hence set to those corresponding to time-dependent past behaviour independent adherence, i.e. $C^T_{nn'} = C^F_{nn'} = c_n$. The likelihoods and Bayesian evidence normalisations for this more restricted set of models are calculated in Appendix S1.

The probability of adherence may vary across a population of individuals. The first possible form that this heterogeneity may take can be attributed to age, gender and other social factors. In such cases, stratification of the population into separate cohorts for study is an appropriate tool to quantify this variation. We primarily take this approach to population heterogeneity in Sec 3 and our analysis of data in Appendix S2. The second possible form that population heterogeneity could take may not be immediately attributable to social groupings. In such situations, it is intuitive to consider that the adherence probability for an individual is drawn from a distribution which applies to the entire population or cohort of study. This approach is the same as that used in other models in the literature (see Appendix S3 for more details). We shall now briefly elaborate on how one might include this form of heterogeneity in the formalism we have introduced in this work through a simple, generic example. We leave further specific applications of this approach to a future publication in progress.

To illustrate the generic effect of the population heterogeneity described above on our individual adherence probabilities, let us consider the time-independent Markov model we introduced earlier. The long-term probability of adherence $q$ in Eq (7) may itself be randomly drawn from a population heterogeneity distribution $P_{\rm pop}(q)$ for an individual within the specified cohort of study, such that $q \sim P_{\rm pop}(q)$. Note also that $\lambda$ in Eq (7) need not vary between individuals at the same time. Using the results given in Eqs (8) and (9) for the same model, one may deduce that the mean adherent and non-adherent run lengths are generically modified by

$$E(n_T) \to \frac{1}{1-\lambda}\, E_{\rm pop}\!\left(\frac{1}{1-q}\right), \qquad E(n_F) \to \frac{1}{1-\lambda}\, E_{\rm pop}\!\left(\frac{1}{q}\right),$$

where $E_{\rm pop}(\cdot)$ denotes taking an expectation value with the distribution $P_{\rm pop}(q)$. Hence, depending on the choice for this distribution, one may either shorten or lengthen the mean run lengths across the population accordingly. Note that, because $q$ is a probability, a natural candidate for $P_{\rm pop}(q)$ is the beta distribution.

pre-school-aged children (pre-SAC, ages 0-4), school-aged children (SAC, ages 5-14) and other adult age categories. Importantly, these age categories were assigned at the beginning of the trial and hence, particularly in the case of pre-SAC, the effect of 'ageing-out' of each category must be considered on the overall adherence behaviour.

Table 2. A measure of how past behaviour dependent the adherent and non-adherent behaviour of individuals is in the $n$-th round of treatment, $\omega_n \equiv \beta_{n\,n-1} + \gamma_{n\,n-1} - 1$, which was introduced in Eq (17). This value is given for each age group and sex inferred from the TUMIKIA project dataset and is computed using the maximum likelihood values for the conditional probabilities.

Age group

accepting treatment in a given round is denoted by a 'T', whereas not accepting treatment in a given round is denoted by an 'F'.

It is important to comment here on the validity of interpreting the inferences made as directly due to individual behaviour patterns using the TUMIKIA project adherence data [13]. An important caveat to this interpretation is that, for various reasons, some individuals were not offered treatment and were hence automatically accounted for as 'non-adherent' within the data. The impact of these individuals on the success of the MDA program is the same as if they had directly refused treatment, and hence the practical use of inferring this pattern of adherence for simulation forecasts of MDA outcome is still clear. Despite this fact, however, we cannot fairly discriminate this behaviour pattern from simply not being offered treatment in the present data.

Using the same age categories as before and the likelihoods for the adherence models which have been derived in Appendix S1, in Appendix S2 we demonstrate the applicability of our model for adherence to statistical inference by performing a thorough analysis of the TUMIKIA project dataset. We note that such an analysis has already been performed in Ref [13], hence this analysis does not constitute the novelty of the work presented here but instead is intended to illustrate the application of our mathematical model. We give a detailed description of these findings in Appendix S2, but we provide a short written summary of our general conclusions below.

In Table 2 we have provided the $\omega_n$ values, calculated using Eq (17), for each age group and sex inferred from the TUMIKIA project dataset. This value was shown in Sec 2 to be an indicator of how past behaviour dependent the adherent and non-adherent behaviour of individuals is as a response to MDA treatment. We can see quite clearly from Table 2 that a degree of past behaviour dependent non-adherence is indeed present in all the age groups, with the exception of the final round $\omega_4$ values for those in the pre-SAC (which have mostly aged into SAC by this point) and SAC categories. This is to be expected due to the nature of school-based MDA, and is an effect which is found and explained in more detail by Ref [13]. Table 2 also shows that the most past behaviour dependent non-adherent age group and sex appears to be males aged 30+.

In addition to these results, Eq (1) appears to provide a good descriptive model for many of the past behaviour dependent non-adherent age groups and sexes, but this model must be extended to an equivalent time-dependent one (see Eq (15)) in order to describe other cases.

In this section we illustrate the impact of adherence, as described by our mathematical model, on the predictions made by simulations for the outcome of MDA on the chances

Table 3. The positive predictive value (PPV) for elimination evaluated by fully age-structured stochastic individual-based simulations of hookworm (with adult worm and eggs/larvae mortality rates set to $\mu_1 = 0.5$ and $\mu_2 = 26.0$ per year, respectively, and the density dependent fecundity factor set to $\gamma = 0.01$, as considered in Ref [16]) with two different clustered community types specified by the TUMIKIA transmission parameters inferred from the baseline epidemiological data in Ref [16]. The parameters quoted are the endemic prevalence $P$, parasite aggregation parameter $k$, basic reproduction number $R_0$ and cluster population number $N$, where the age profiles are all assumed to be exactly flat for simplicity. The PPVs are evaluated after 100 years post-cessation of MDA and are quoted assuming either past behaviour independent adherence (i.e., simple time-dependent coverage in age groups) or the adherence behaviour inferred from our model in this paper for the TUMIKIA project (see Appendix S2).

Cluster type (see Ref [16]) PPV (Past behaviour independent adherence) PPV (TUMIKIA adherence)
(P, k, R 0 , N ) = (0. 15

communities that were treated in the TUMIKIA project [16]. The resulting effect that the known TUMIKIA adherence has on the positive predictive value (PPV) for elimination of hookworm in these two clusters is given in Table 3, where an equivalent PPV assuming past behaviour independent adherence is also provided for direct comparison in each case.

From Table 3 it is immediately clear that although there is relatively high coverage of MDA in the TUMIKIA project [11,12], the presence of observed past behaviour dependent non-adherence has an important effect on the PPVs for elimination, shifting the chances of hookworm elimination in both clusters lower by 43% and 23%, respectively, when compared to the standard forecasts which assume past behaviour independent adherence.

Despite the causes for non-adherent behaviour being varied, the net effect on the outcome of MDA interventions aiming to eliminate NTDs is much the same, and hence the predicted outcomes from simulation studies should also reflect this. It is for this reason that in this paper we have been able to develop a simple but comprehensive framework which describes the systematic binary choice of individuals to either take treatment, or not for any reason, over the course of multiple rounds of mass drug administration (MDA), which we have referred to as 'adherence' and 'non-adherence', respectively.

In Sec 2 we introduced our models for adherence, which can account for new behaviours that have yet to be considered in the literature. In Appendix S3 we have also demonstrated that they can reproduce the adherence behaviours of past analyses. Our analysis further yielded an interesting parameter, $\omega_n$, given in Eq (17), which can be used as a guide to indicate the strength of adherent or non-adherent behaviour in any given setting.

In order to illustrate our framework in the context of statistical inference from a real adherence dataset, we applied our probability model to the recently collected adherence data from the TUMIKIA project in Kenya in Sec 3. Findings from this dataset extend and support the analysis of recent work [13], and include past behaviour independent adherence/non-adherence for school-aged children (SAC) and the detection of past behaviour dependent non-adherence to treatment in nearly all other age groups and both sexes. A full description of our results and analysis is given in Appendix S2.

In Sec 3 we also commented on the validity of interpreting the inferences made as directly due to individual behaviour patterns using the TUMIKIA project adherence data. As we pointed out, for various reasons, some individuals were not offered treatment and were hence automatically accounted for as 'non-adherent' within the data. However, the impact of these individuals on the success of the MDA program is the same as if they had directly refused treatment, and hence the practical use of inferring this pattern of adherence (using the formalism which we have outlined in this work) for simulation forecasts of MDA outcome is still clear. We shall leave the more

also be of great value. In the coming few years more data on adherence patterns will emerge from detailed research studies of MDA impact to add to the information provided by the TUMIKIA study [11]. These include the ongoing DW3 trial studies in India, Benin and Malawi for the control of STH [17] and the Geshiyaro study in Ethiopia for the control of STH and schistosome infections by MDA [18].

Summary. In this supplementary information we derive many of the key mathematical expressions which are used and referred to in the main text.

Time-independent Markov model
Assuming that the conditional probabilities $\beta$ and $\gamma$ are constant, the time-independent Markov model may be mapped to the following recursion relation

$$p_n = \beta\, p_{n-1} + (1-\gamma)\,(1 - p_{n-1}). \qquad (22)$$

As in the main text, defining the system state vector as

$$\mathbf{p}_n = \begin{pmatrix} p_n \\ 1 - p_n \end{pmatrix},$$

we may rewrite Eq (22) above in the form $\mathbf{p}_n = M \mathbf{p}_{n-1}$, where we have defined the following transition matrix

$$M = \begin{pmatrix} \beta & 1-\gamma \\ 1-\beta & \gamma \end{pmatrix}.$$

The eigenvalues and eigenvectors of $M$ are given by

$$1:\ \mathbf{v} = \frac{1}{2-\beta-\gamma}\begin{pmatrix} 1-\gamma \\ 1-\beta \end{pmatrix}, \qquad \lambda = \beta+\gamma-1:\ \begin{pmatrix} 1 \\ -1 \end{pmatrix},$$

where $\mathbf{v}$ is normalised to sum to 1. Given that $|\lambda| < 1$ in all realistic circumstances, it is clear from this description that $\mathbf{v}$ represents the equilibrium of the system over multiple rounds, with $\lambda$ defining the rate of relaxation towards it. When $\lambda = 0$, the model becomes a history-independent model in which the next round is dictated solely by its probability at that round. In order to study the dynamics in more detail, we apply the following transformation to the relation given by Eq (22),

$$p_n \to \tilde{p}_n = \frac{p_n}{\lambda^{n-1}},$$

such that

$$\tilde{p}_n = \tilde{p}_{n-1} + \frac{1-\gamma}{\lambda^{n-1}}. \qquad (28)$$

Through explicit summation, Eq (28) is solved by

$$\tilde{p}_n = \tilde{p}_1 + (1-\gamma)\sum_{n'=2}^{n} \lambda^{-(n'-1)}. \qquad (29)$$

By reapplying the inverse transformation $\tilde{p}_n \to p_n$ to Eq (29) and identifying $\tilde{p}_1 = p_1 = \alpha$, we obtain the following solution to Eq (22)

$$p_n = \alpha\,\lambda^{n-1} + (1-\gamma)\,\frac{1-\lambda^{n-1}}{1-\lambda}. \qquad (30)$$

Equivalently, satisfying the dual to Eq (22) in terms of the probability of non-treatment in the $n$-th round, $1 - p_n$, solutions to Eq (30) must also satisfy

$$1 - p_n = (1-\alpha)\,\lambda^{n-1} + (1-\beta)\,\frac{1-\lambda^{n-1}}{1-\lambda}.$$

In Fig 2 we illustrate the dynamics of the system using Eq (30) with a range of parameter values chosen for $\gamma$. Notice, in particular, that the system exhibits oscillation before relaxing to a steady state when $\gamma$ is chosen such that the eigenvalue $\lambda = \beta + \gamma - 1 < 0$. For another way of calculating the expected lengths of repeat adherence $E(n_T)$ or non-adherence $E(n_F)$ of an individual (as computed in the main text), given that they begin with the same choice in the first round, one need only fix $(\alpha = \beta, \gamma = 1)$ or $(\alpha = 1 - \gamma, \beta = 1)$ and take moments with Eq (30), respectively, such that

Time-dependent Markov model
Consider the choice matrices with elements $C^T_{nn'}$ and $C^F_{nn'}$ corresponding to the conditional probabilities of treatment and non-treatment in round $n$ given treatment and non-treatment in round $n'$, respectively, such that

When the only nonzero elements of the choice matrices in Eq (34) are along their lower diagonals, i.e., such that only $C^T_{n\,n-1} = \beta_{n\,n-1} \neq 0$ and $C^F_{n\,n-1} = 1 - \gamma_{n\,n-1} \neq 0$, the system is described by a time-dependent Markov process with recursion relation

$$p_n = \beta_{n\,n-1}\, p_{n-1} + (1-\gamma_{n\,n-1})\,(1 - p_{n-1}). \qquad (35)$$

Following a similar argument to the one used in solving the homogeneous Markov case, we may obtain an implicit solution to Eq (35). Using the transformation

$$p_n \to \tilde{p}_n = \frac{p_n}{\prod_{n'=2}^{n} \left(\beta_{n'\,n'-1} + \gamma_{n'\,n'-1} - 1\right)},$$

we once again substitute into the relation given by Eq (35), yielding

$$\tilde{p}_n = \tilde{p}_{n-1} + \frac{1-\gamma_{n\,n-1}}{\prod_{n'=2}^{n} \left(\beta_{n'\,n'-1} + \gamma_{n'\,n'-1} - 1\right)}, \qquad (37)$$

where Eq (37) is solved by the explicit summation

$$\tilde{p}_n = p_1 + \sum_{n'=2}^{n} \frac{1-\gamma_{n'\,n'-1}}{\prod_{n''=2}^{n'} \left(\beta_{n''\,n''-1} + \gamma_{n''\,n''-1} - 1\right)}.$$

Using the corresponding inverse transformation to Eq (37) we hence obtain a solution to Eq (35), which is given by

$$p_n = \left[\prod_{n'=2}^{n} \left(\beta_{n'\,n'-1} + \gamma_{n'\,n'-1} - 1\right)\right] \left[p_1 + \sum_{n'=2}^{n} \frac{1-\gamma_{n'\,n'-1}}{\prod_{n''=2}^{n'} \left(\beta_{n''\,n''-1} + \gamma_{n''\,n''-1} - 1\right)}\right].$$

Likelihoods and Bayesian evidence
Let the data now correspond to a set of $n$-vectors $D = \{X\}$ where each individual's adherence or non-adherence behaviour in the $n$-th round is recorded, such that $X_n = T, F$. Using Eq (34), the full generalisation of the likelihood (which supports all of the possible adherence models) becomes

where $\mathbf{1}_A$ denotes an indicator function which takes value unity when condition $A$ is satisfied, else it vanishes. The large number of available degrees of freedom in Eq (40) motivates a systematic approach to inferring the choice matrix components from a given set of data. We elect to consider models which isolate the many degrees of freedom by constructing scenarios where past behaviour dependent adherence only occurs for a single round and is temporally dependent on only one other round. All other degrees of freedom are hence set to those corresponding to time-dependent past behaviour independent adherence, i.e. $C^T_{nn'} = C^F_{nn'} = c_n$. The likelihood for this more restricted set of models, which we denote as $\mathcal{L}_{nn'}(D|\theta)$, where $nn'$ corresponds to the pair of rounds chosen to be dependent on each other in time, may be obtained by rewriting Eq (40) in the following form

where the data $D = \{N_X\}$ has now been compressed into the set of numbers of people who track the same behaviour as $X$, i.e., for 3 rounds, this forms the set of the following numbers of people: $N_{TTT}$, $N_{TTF}$, $N_{TFT}$, etc. The Bayesian evidence integral corresponding to Eq (41) with a choice of flat prior $\pi(\theta) \propto 1$ is therefore

Some non-Markovian past dependence may be captured by the likelihood defined in Eq (41); however, its Bayesian evidence may need to be compared with equivalent Markovian models which also generate decaying long-term correlations of a particular form. Using the same formalism as Eq (41), the time-dependent Markov model has the following likelihood

and, hence, yields the following Bayesian evidence

and the Bayesian evidence of the same model
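The compression of individual histories into the counts $N_X$ can be sketched in a few lines of Python. The example assumes each individual's record is a string of 'T' and 'F' characters, one per round. The function names, the 1-indexed round convention and the toy records are hypothetical and are not taken from the TUMIKIA dataset.

```python
from collections import Counter


def compress_histories(histories):
    """Count how many individuals share each full T/F history, e.g. N_TTT."""
    return Counter("".join(history) for history in histories)


def pair_counts(history_counts, n, n_prime):
    """Counts of (outcome in round n', outcome in round n), rounds indexed from 1.

    The key 'TF' means treated in round n' and not treated in round n.
    """
    counts = Counter()
    for history, number in history_counts.items():
        counts[history[n_prime - 1] + history[n - 1]] += number
    return counts


# Toy records for five individuals over three rounds.
records = ["TTT", "TTF", "TFT", "TTT", "FFT"]
history_counts = compress_histories(records)
round_3_given_1 = pair_counts(history_counts, n=3, n_prime=1)
```

Marginalising the full-history counts down to a single pair of rounds in this way gives the sufficient statistics needed when only rounds $n$ and $n'$ are allowed to depend on each other.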

Introduction
In Figs 3, 4, 5, 6 and 7 we plot the maximum likelihood as well as the limits of the marginalised 95% credible region for the conditional probabilities given treatment (filled points) or non-treatment (hollow points) in a previous round of the overall, male and female participants in the top, middle and bottom rows, respectively. In the left column the constant conditional probabilities between any given sequential pair of rounds have been inferred, which corresponds to the time-independent Markov model of the main text and Appendix S1. In the right column all possible round pair dependencies are considered (indicated by the arrows on the horizontal axis), where in each case the components corresponding to a given round were measured assuming all other respective rounds were inferred to be from past behaviour independent adherence. In all plots, above each pair of components we have also provided the log-Bayes factors [19], defined by $\ln\left(E_{nn'}/E_{\rm ref}\right)$, where the evidence for each pair $E_{nn'}$ has been evaluated using the relations provided in Appendix S1 and the reference model evidence $E_{\rm ref}$ has been set to that of time-dependent past behaviour independent adherence for all components.
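A log-Bayes factor of this kind can be illustrated with a minimal Python sketch for a single pair of rounds. It assumes flat priors and binomial likelihoods, so that each evidence reduces to a product of Beta functions. This is a simplified stand-in and not necessarily the exact expressions of Appendix S1. The count names (first letter for the earlier round, second for the later round) and example values are hypothetical.

```python
from math import lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_bayes_factor(n_tt, n_tf, n_ft, n_ff):
    """ln(E_dependent / E_reference) for the later round of a pair of rounds.

    Dependent model: separate treatment probabilities given T or F earlier.
    Reference model: one treatment probability regardless of the earlier round.
    Terms common to both models (e.g. the earlier round itself) cancel.
    """
    log_e_dependent = log_beta(n_tt + 1, n_tf + 1) + log_beta(n_ft + 1, n_ff + 1)
    log_e_reference = log_beta(n_tt + n_ft + 1, n_tf + n_ff + 1)
    return log_e_dependent - log_e_reference


# Strongly history-dependent toy counts versus history-independent toy counts.
strong = log_bayes_factor(n_tt=90, n_tf=10, n_ft=10, n_ff=90)
none = log_bayes_factor(n_tt=50, n_tf=50, n_ft=50, n_ff=50)
```

Positive values favour past behaviour dependence between the two rounds, while values near or below zero indicate that the extra conditional probability is not supported by the data.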

Results
In Figs 3, 4 and 5 we present our results for the pre-SAC, SAC and 15-29 age groups of individuals in the TUMIKIA project. These age groups appear to be well-described by a time-dependent Markov model, so past behaviour dependent non-adherence is clearly present. This may be identified from the largest log-Bayes factor values, which are found for the red-coloured points of the right column plots in all three sets of plots. However, the conditional probabilities in all groups appear to drift closer together by round 4 of treatment, which signals a gradual transition from past behaviour dependent to past behaviour independent adherence.
In Figs 6 and 7 we present our results for the 30-49 and 50+ age groups of individuals in the TUMIKIA project. The overall cohort, as well as the males and females in both age groups, appear to exhibit strong evidence of past behaviour dependent non-adherence; in particular, they are all apparently well-described by a time-independent Markov model. These conclusions may be drawn both from the consistent distance between all of the values for the inferred conditional probabilities with the red points of the right column of plots, and from the largest evidence (as measured by the log-Bayes factor in the top row of the plots) for a difference in conditional probabilities being found in the left column in both figures.
In all of the cohorts studied in Figs 3, 4, 5, 6 and 7, we report no evidence for the existence of dependencies between rounds that depart from a Markovian description (as can be inferred from the comparatively small log-Bayes factors for the blue and green conditional probabilities in the right column of all plots). This is an interesting, and perhaps surprising, result regarding the nature of human behaviour in response to mass drug administration.
Fig 3. Left column: The maximum likelihood as well as the limits of the marginalised 95% credible region for the conditional probabilities of accepting treatment for any given pair of sequential rounds (these are hence homogeneous in time and the process is Markovian) given treatment (filled points) or non-treatment (hollow points) in a previous round. Right column: The same as the left column but with allowed time dependence in the conditional probabilities of accepting treatment in each respective round (highlighted in orange on the horizontal axes). In each case the components corresponding to a given round were measured assuming all other respective rounds were inferred to be from time-dependent past behaviour independent adherence and hence the likelihood is given in Appendix S1. Different colours for each point correspond to different lengths in time for the dependencies in behaviour. The datasets used are from the standard pre-SAC (0-4) age category from the TUMIKIA project, where the top row corresponds to the overall group, the middle row to the male sub-group and the bottom row to the female sub-group.
Fig 4. Left column: The maximum likelihood as well as the limits of the marginalised 95% credible region for the conditional probabilities of accepting treatment for any given pair of sequential rounds (these are hence homogeneous in time and the process is Markovian) given treatment (filled points) or non-treatment (hollow points) in a previous round. Right column: The same as the left column but with allowed time dependence in the conditional probabilities of accepting treatment in each respective round (highlighted in orange on the horizontal axes). In each case the components corresponding to a given round were measured assuming all other respective rounds were inferred to be from time-dependent past behaviour independent adherence and hence the likelihood is given in Appendix S1. Different colours for each point correspond to different lengths in time for the dependencies in behaviour. The datasets used are from the standard SAC (4-15) age category from the TUMIKIA project, where the top row corresponds to the overall group, the middle row to the male sub-group and the bottom row to the female sub-group.
Fig 5. Left column: The maximum likelihood as well as the limits of the marginalised 95% credible region for the conditional probabilities of accepting treatment for any given pair of sequential rounds (these are hence homogeneous in time and the process is Markovian) given treatment (filled points) or non-treatment (hollow points) in a previous round. Right column: The same as the left column but with allowed time dependence in the conditional probabilities of accepting treatment in each respective round (highlighted in orange on the horizontal axes). In each case the components corresponding to a given round were measured assuming all other respective rounds were inferred to be from time-dependent past behaviour independent adherence and hence the likelihood is given in Appendix S1. Different colours for each point correspond to different lengths in time for the dependencies in behaviour. The datasets used are from the 15-29 age category from the TUMIKIA project, where the top row corresponds to the overall group, the middle row to the male sub-group and the bottom row to the female sub-group.
Fig 6. Left column: The maximum likelihood as well as the limits of the marginalised 95% credible region for the conditional probabilities of accepting treatment for any given pair of sequential rounds (these are hence homogeneous in time and the process is Markovian) given treatment (filled points) or non-treatment (hollow points) in a previous round. Right column: The same as the left column but with allowed time dependence in the conditional probabilities of accepting treatment in each respective round (highlighted in orange on the horizontal axes). In each case the components corresponding to a given round were measured assuming all other respective rounds were inferred to be from time-dependent past behaviour independent adherence and hence the likelihood is given in Appendix S1. Different colours for each point correspond to different lengths in time for the dependencies in behaviour. The datasets used are from the 30-49 age category from the TUMIKIA project, where the top row corresponds to the overall group, the middle row to the male sub-group and the bottom row to the female sub-group.
Fig 7. Left column: The maximum likelihood as well as the limits of the marginalised 95% credible region for the conditional probabilities of accepting treatment for any given pair of sequential rounds (these are hence homogeneous in time and the process is Markovian) given treatment (filled points) or non-treatment (hollow points) in a previous round. Right column: The same as the left column but with allowed time dependence in the conditional probabilities of accepting treatment in each respective round (highlighted in orange on the horizontal axes). In each case the components corresponding to a given round were measured assuming all other respective rounds were inferred to be from time-dependent past behaviour independent adherence and hence the likelihood is given in Appendix S1. Different colours for each point correspond to different lengths in time for the dependencies in behaviour. The datasets used are from the 50+ age category from the TUMIKIA project, where the top row corresponds to the overall group, the middle row to the male sub-group and the bottom row to the female sub-group.
Summary. In this supplementary information, we analyse some of the existing models of adherence from the literature in the context of our proposed framework.

The Plaisier model
Several models of MDA treatment programs employ an adherence model developed by Plaisier in the context of onchocerciasis control [8,20]. The Plaisier model assigns a probability of adherence to each individual which they then retain for the duration of the MDA program [14,15]. As such, we would characterise this model as a heterogeneous population, time-independent model with no explicit individual dependence on past behaviour. The individual probability of adherence is given by $U^{(1-c)/c}$, where $U$ is a uniform random number on the unit interval and $c$ is the expected probability of treatment and hence the expected coverage. The model is therefore completely parameterized by the overall expected coverage. The PDF for the adherence probability for this process is given by
$$\pi(p) = \frac{c}{1-c}\, p^{(2c-1)/(1-c)} .$$
The PDF rises monotonically as $p$ goes from zero to one for all values of $c > 0.5$ and falls monotonically for $c < 0.5$ (for $c = 0.5$, it is flat). Note that $\pi(p)$ is a beta distribution: $\pi(p) = {\rm Beta}[p; c/(1-c), 1]$. For this distribution, the mean failure run length is hence given by
$$E(n_F) = \frac{c}{2c-1} .$$
Note that in this model, the mean adherence failure run length becomes undefined at a coverage of 50% or less. Additionally, one can show that the variance of this random variable becomes undefined for values of coverage below 66%, suggesting that failure run lengths in finite populations drawn from this distribution will exhibit extreme variability. The probability of an individual being untreated across $N$ rounds of MDA in this model can also be calculated, giving
$$\int_0^1 (1-p)^N\, \pi(p)\, {\rm d}p = \frac{c}{1-c}\, B\!\left(\frac{c}{1-c}, N+1\right),$$
where $B(\cdot, \cdot)$ is the beta function. Fig 8 shows the distribution of adherence probabilities for 2 different coverage values and also the probability of an individual not adhering to treatment across a 4-round MDA program.
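The Plaisier model described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the adherence probability is drawn as $U^{(1-c)/c}$ and the mean failure run length $c/(2c-1)$ is taken directly from the text, while the closed form for the probability of never being treated in $N$ rounds is obtained here by averaging $(1-p)^N$ over the Beta$[c/(1-c), 1]$ distribution. The function names, the seed and the example coverage of 75% are hypothetical choices.

```python
import random
from math import exp, lgamma

def plaisier_adherence(c, rng):
    """Draw an individual adherence probability p = U**((1 - c) / c)."""
    return rng.random() ** ((1.0 - c) / c)

def mean_failure_run_length(c):
    """E(n_F) = c / (2c - 1), which is only defined for coverage c > 0.5."""
    if c <= 0.5:
        raise ValueError("mean failure run length is undefined for c <= 0.5")
    return c / (2.0 * c - 1.0)

def prob_never_treated(c, n_rounds):
    """Average of (1 - p)**N over Beta(a, 1) with a = c / (1 - c): a * B(a, N + 1)."""
    a = c / (1.0 - c)
    log_b = lgamma(a) + lgamma(n_rounds + 1) - lgamma(a + n_rounds + 1)
    return a * exp(log_b)

rng = random.Random(1)  # fixed seed so the sketch is reproducible
c = 0.75
draws = [plaisier_adherence(c, rng) for _ in range(100_000)]

# The mean adherence probability should be close to the expected coverage c.
mean_p = sum(draws) / len(draws)
# Monte Carlo estimate of the probability of missing all 4 rounds.
never_mc = sum((1.0 - p) ** 4 for p in draws) / len(draws)
never_exact = prob_never_treated(c, 4)
```

Comparing `never_mc` with `never_exact` gives a quick consistency check between the sampling rule and the beta-function expression.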

The Griffin Model
The adherence model used by Irvine et al [5] to model MDA adherence in the treatment of lymphatic filariasis was originally created by Griffin et al in the context of intervention strategies against malaria transmission [21]. The original Griffin model is quite broad and deals with multiple simultaneous interventions and the correlations in their uptake. It does not include conditional dependencies for an individual's behaviour and is therefore a heterogeneous population, time-independent, individually past behaviour independent model in its simplest form. Each individual in the population is assigned a correlation parameter, $u_i$, drawn from a normal distribution with mean $u_0$ and variance $\sigma^2$. These parameters are retained throughout the MDA program. At each MDA round, each individual draws a normal deviate, $z$, with mean $u_i$ and unit variance. Treatment is accepted if $z < 0$. The expected coverage is given by $\phi(-u_0/\sqrt{1+\sigma^2})$, where $\phi$ is the standard normal cumulative probability function. This leaves one free parameter to control the distribution of adherence probabilities across the population.
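The mechanics of this model can be sketched as a short simulation. This is a minimal illustration, assuming the simplest form described above (one intervention, $u_i$ drawn once per individual, a fresh deviate $z$ at each round); the function names, the seed, the population size and the example values of $\sigma$ and target coverage are hypothetical choices.

```python
import random
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal; its cdf is written as phi in the text

def expected_coverage(u0, sigma):
    """Expected coverage phi(-u0 / sqrt(1 + sigma^2))."""
    return STD_NORMAL.cdf(-u0 / sqrt(1.0 + sigma ** 2))

def u0_for_coverage(coverage, sigma):
    """Invert the coverage relation: the u0 that gives a target mean coverage."""
    return -sqrt(1.0 + sigma ** 2) * STD_NORMAL.inv_cdf(coverage)

def simulate_coverage(u0, sigma, n_people, n_rounds, rng):
    """Fraction of person-rounds in which treatment is accepted."""
    treated = 0
    for _ in range(n_people):
        u_i = rng.gauss(u0, sigma)       # retained for the whole MDA program
        for _ in range(n_rounds):
            z = rng.gauss(u_i, 1.0)      # fresh unit-variance deviate each round
            if z < 0:                    # treatment is accepted if z < 0
                treated += 1
    return treated / (n_people * n_rounds)

sigma = 1.2
u0 = u0_for_coverage(0.75, sigma)
simulated = simulate_coverage(u0, sigma, n_people=20_000, n_rounds=4,
                              rng=random.Random(0))
```

Fixing the coverage determines $u_0$ for any given $\sigma$, so $\sigma$ is the remaining free parameter that shapes how adherence probabilities are spread across the population.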
The cumulative distribution of the adherence probability, $p$, is given by
$$\pi(p) = \phi\!\left[\frac{\phi^{-1}(p; 0, 1) + u_0}{\sigma}; 0, 1\right],$$
giving a PDF
$$\frac{{\rm d}\pi}{{\rm d}p} = \frac{1}{\sigma}\exp\left\{\frac{[\phi^{-1}(p; 0, 1)]^2}{2} - \frac{[\phi^{-1}(p; 0, 1) + u_0]^2}{2\sigma^2}\right\}. \qquad (53)$$
The function $\phi^{-1}(p; 0, 1)$ varies monotonically in the range $(-\infty, \infty)$ with $p$. In Eq (53), the value $\sigma = 1$ separates two functional forms. For $\sigma < 1$, the distribution has a 'normal' shape with a single local maximum, while for $\sigma > 1$, the distribution has asymptotes with local maxima at $p = 0$ and/or $p = 1$. In this, it is very similar, qualitatively, to the beta distribution (see Fig 9).
Fig 9. Adherence probability distributions with A) $\sigma = 1.2$ and B) $\sigma = 0.5$ for mean coverages of 25% and 75%.
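The two regimes of this distribution can be checked numerically. The sketch below is derived here from the model definition, assuming that an individual with parameter $u_i$ has adherence probability $\phi(-u_i)$ with $u_i$ normally distributed with mean $u_0$ and standard deviation $\sigma$; it is not transcribed from the original equations, and the function names and test values are hypothetical. Both functions are defined for $0 < p < 1$ only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution

def griffin_cdf(p, u0, sigma):
    """Cumulative distribution of the adherence probability p (0 < p < 1)."""
    x = STD_NORMAL.inv_cdf(p)
    return STD_NORMAL.cdf((x + u0) / sigma)

def griffin_pdf(p, u0, sigma):
    """Density of p, obtained by differentiating griffin_cdf with respect to p."""
    x = STD_NORMAL.inv_cdf(p)
    return STD_NORMAL.pdf((x + u0) / sigma) / (sigma * STD_NORMAL.pdf(x))

# sigma < 1: single interior maximum, density vanishing towards the edges.
narrow_centre = griffin_pdf(0.5, u0=0.0, sigma=0.5)
narrow_edge = griffin_pdf(0.001, u0=0.0, sigma=0.5)

# sigma > 1: density grows towards the edges of the unit interval.
wide_centre = griffin_pdf(0.5, u0=0.0, sigma=1.2)
wide_edge = griffin_pdf(0.001, u0=0.0, sigma=1.2)
```

With $u_0 = 0$ the density is symmetric about $p = 0.5$, which makes the contrast between the two values of $\sigma$ easy to read off from the centre and edge values.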