Dissociable effects of APOE ε4 and β-amyloid pathology on visual working memory

Although APOE ε4 carriers are at substantially higher risk of developing Alzheimer’s disease than noncarriers 1 , controversial evidence suggests that APOE ε4 might confer some advantages, explaining the survival of this gene (antagonistic pleiotropy) 2,3 . In a population-based cohort born in one week in 1946 (assessed aged 69–71 years), we investigated the differential effects of APOE ε4 and β-amyloid pathology (quantified using 18F-Florbetapir-PET) on visual working memory (object–location binding). In 398 cognitively normal participants, APOE ε4 and β-amyloid had opposing effects on object identification, predicting better and poorer recall, respectively. ε4 carriers also recalled locations more precisely, with a greater advantage at higher β-amyloid burden. These results provide evidence of superior visual working memory in ε4 carriers, showing that some benefits of this genotype are demonstrable in older age, even in the preclinical stages of Alzheimer’s disease. In a sample of about 400 cognitively normal older adults, APOE ε4 genotype and β-amyloid pathology predicted better and poorer visual working memory, respectively, suggesting some benefits of ε4 in older age, even in the preclinical stages of Alzheimer’s disease.

tion are weaker 18 . This hypothesis may explain why APOE ε4 (the ancestral allele) persists in human populations rather than being replaced by the ε3 and ε2 alleles, which evolved later 2,19,20 ; however, evidence for its putative cognitive benefits remains mixed and controversial 3,13,21 .
One cognitive measure where ε4 carriers have shown superior performance is the 'What was where?' visual working memory test 22 : ε4 carriers have been reported to recall object locations more accurately than noncarriers after delays of a few seconds 23-25 . These studies did not, however, evaluate the possible influence of preclinical Alzheimer's disease pathology. One notable feature of this task is its analog measure of location memory (in contrast to traditional 'correct or incorrect' measures), which allows fine-grained assessment of the precision or quality of memory representations 26,27 .
We aimed to assess the relative influence of APOE ε4 and β-amyloid pathology on the 'What was where?' task in a large population-based sample of adults from Insight 46, a substudy of the MRC National Survey of Health and Development (the UK 1946 Birth Cohort) 28 , the world's longest continuously running birth cohort 28,29 . Participants were aged ~70 years, an age when rates of dementia are low but a substantial proportion (~15-25%) have biomarker evidence of preclinical Alzheimer's disease 30,31 . Based on the literature, we hypothesized that APOE ε4 would be associated with slightly more accurate recall of object locations but that β-amyloid pathology would be associated with subtly poorer performance across the task. Because these two predicted effects are in opposition for ε4 carriers with elevated amyloid burden, we aimed to explore interactions between APOE ε4 and β-amyloid on visual working memory. We also aimed to test whether this task is sensitive to differences in hippocampal volume and white matter hyperintensity volume (WMHV, a marker of cerebral small vessel disease that is common in older people and is associated with cognitive decline, particularly in executive function 32 ).

Letters Nature Aging

neuroimaging and neuropsychological assessment including the 'What was where?' visual working memory task (Methods; Fig. 1b,c). A total of 486 participants completed the task: 398 were cognitively normal, with complete biomarker data (Fig. 1a), of whom 120 (30%) were APOE ε4 carriers. Participant characteristics are provided in Supplementary Table 1, along with descriptive statistics for the primary outcome measures. Performance on established tests of memory is presented in Supplementary Information. The prevalence of amyloid positivity among ε4 carriers and non-ε4 carriers was 37.5 and 9.7%, respectively (χ² = 43.7, P < 0.0001), consistent with the existing literature 30 .
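As a check on the arithmetic, the reported χ² can be approximately reproduced from the group sizes and percentages given above. The sketch below assumes 278 non-ε4 carriers (398 minus 120) and back-calculates the amyloid-positive counts from the percentages (45 of 120 and 27 of 278); these counts are an inference rather than figures quoted in the text, and the statistic is computed without continuity correction.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Assumed counts, back-calculated from the reported percentages:
# 37.5% of 120 carriers -> 45 amyloid-positive; 9.7% of 278 noncarriers -> 27.
carriers_pos, carriers_neg = 45, 75
noncarriers_pos, noncarriers_neg = 27, 251

chi2 = chi_square_2x2(carriers_pos, carriers_neg, noncarriers_pos, noncarriers_neg)
print(round(chi2, 1))  # 43.7
```

Under these assumed counts the statistic agrees with the reported value to one decimal place.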
In cognitively normal participants with complete biomarker data, multivariable regression models were fitted (Methods) to investigate associations between task performance and APOE ε4 (carrier or noncarrier), amyloid status (positive or negative), hippocampal volume and WMHV. The models also included the task condition factors of memory load (low or high) and delay interval (short or long), and adjusted for head size and for demographic and life-course factors previously shown to predict cognitive performance throughout adulthood in this cohort (Methods). Where between-individual factors were significantly associated with performance, we tested for interactions with delay to investigate whether group differences were due to better retention over time. We also tested for interactions between APOE ε4 and amyloid status, to investigate whether the effects of APOE ε4 differed between amyloid-positive and amyloid-negative groups.
Results of the regression models are given in Table 1 (see Supplementary Information for results relating to demographic and life-course factors). As expected, identification and localization memory were poorer under the high-load than low-load condition, and localization was also poorer after long compared to short delay (Table 1). However, in contrast to previous studies 22,23,33 , delay had no statistically significant effect on the proportion of identification or misbinding errors (Table 1).
The association between APOE ε4 and identification was consistent across long and short delays (OR for interaction between APOE ε4 and delay = 0.97 (0.77-1.22), P = 0.79), as was that between

Fig. 1. a, Participant flow:
Completed 'What was where?' task (n = 486)
Incomplete biomarker data (n = 54): no scan due to claustrophobia (n = 25); no scan due to PET/MRI incompatibility issues (n = 4); no scan due to recent illness (n = 1); no scan due to withdrawal from study (n = 1); failed acquisition of PET data (n = 6); MR images failed quality control (n = 3); fluid-attenuated inversion recovery (FLAIR) images failed quality control (n = 2); white matter segmentation failure* (n = 10); no APOE data available (n = 2)
Complete biomarker data (n = 432)
Dementia, n = 2; Parkinson's disease or other neurodegenerative disorder, n = 4; major psychiatric disorder, n = 2; major depression, n = 2; epilepsy, n = 4; traumatic brain injury or major neurosurgery, n = 2; multiple sclerosis, n = 3; cortical stroke, n = 17; brain malignancy, n = 1; mild cognitive impairment, n = 9**
See 42 for further details on the definitions of neurological and major psychiatric disorders. *In most cases this was due to erroneous segmentation of vascular abnormalities such as stroke or demyelination. **The numbers in this box add up to 46 because some participants had more than one condition. b, Presentation of the 'What was where?' task. c, Illustration of outcome measures. Green circles indicate the original location of the target object, red circles indicate the original locations of nontarget objects and blue lines indicate measured localization error. Left, object identification; the participant is required to select the object that they remember seeing. Middle, localization error is measured from the location reported by the participant to that of the closest object; if the reported location is within 140 pixels of that of a nontarget object, this is considered a misbinding error. Right, gross localization error is measured from the location reported by the participant to the original location of the target object. Credit: b,c, are reprinted from Liang et al. 42 .

When assessed as a continuous variable (standardized uptake value ratio (SUVR)), the association between amyloid burden and identification error was not statistically significant although it was in the expected direction (OR = 1.05 (95% CI 0.96-1.15) per 0.1 SUVR increment, P = 0.26). Hippocampal volume and WMHV did not show statistically significant associations with identification (Table 1).
Localization. APOE ε4 carriers performed better with respect to spatial memory, on average positioning objects 7% closer to the true location than non-ε4 carriers (P = 0.007) (Table 1 and Fig. 2b). Supplementary Information shows that ε4 carriers had a lower mean localization error in 19 out of 24 trials.
There was no evidence of a statistically significant difference in localization error between the amyloid groups (Table 1 and Fig. 2b), nor of an association between continuous amyloid burden and localization error (coefficient = 1.00 (0.97-1.04) per 0.1 SUVR increment, P = 0.90). Regarding an interaction between amyloid status and APOE ε4, the localization memory advantage associated with APOE ε4 was greater among amyloid-positive than amyloid-negative participants, but this interaction was not statistically

Table 1 notes. *Significant at P < 0.05; **significant at P < 0.01. Because these results were obtained from multivariable regression models (Methods), each association is independent of all others. No adjustments were made for multiple comparisons. In addition to the predictors listed, models also adjusted for sex, age at assessment, childhood cognitive ability, education, socioeconomic position and total intracranial volume. For details on the associations of these demographic and life-course factors with visual working memory performance, see Supplementary Information and Supplementary Table 4.
a, Because localization error data were log-transformed, the coefficients are quoted in exponentiated form for ease of interpretation; for example, a coefficient of 1.10 would mean that the predictor was associated with 10% greater localization error while a coefficient of 0.90 would mean that the predictor was associated with 10% smaller localization error.
b, OR > 1 indicates that the predictor is associated with a higher proportion of misbinds.
c, OR > 1 indicates that the predictor is associated with a higher proportion of guesses.
CI, confidence interval; N/A, not applicable.

Fig. 2. The main effects of APOE ε4 on localization error. Markers show adjusted means from the multivariable regression models, adjusted for delay (long versus short), load (low versus high), sex, age at assessment, childhood cognitive ability, education, socioeconomic position, white matter hyperintensity volume, hippocampal volume and total intracranial volume. Error bars show 95% CI. Values plotted are essentially unchanged even if the model does not include adjustment for white matter hyperintensity volume, hippocampal volume and total intracranial volume. b, Data were log-transformed for analysis, but the means and CI presented here have been back-transformed for ease of interpretation. For numbers of participants in each group, see Supplementary Table 1.
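The interpretation of exponentiated coefficients for a log-transformed outcome can be illustrated with a short sketch. The function names are hypothetical, and the value 0.93 is used only because it corresponds arithmetically to the reported '7% closer'; the actual coefficient is given in Table 1, which is not reproduced here.

```python
import math

def percent_change_from_exp_coef(exp_coef):
    """Percentage change in the outcome implied by an exponentiated coefficient."""
    return (exp_coef - 1.0) * 100.0

def percent_change_from_log_coef(beta):
    """Same quantity, starting from a coefficient on the natural-log scale."""
    return percent_change_from_exp_coef(math.exp(beta))

# Worked examples from the Table 1 notes: 1.10 -> about +10%, 0.90 -> about -10%.
print(percent_change_from_exp_coef(1.10))
print(percent_change_from_exp_coef(0.90))

# Illustrative only: a ratio of 0.93 corresponds to about 7% smaller error.
print(percent_change_from_log_coef(math.log(0.93)))
```

Back-transforming in this way expresses group differences as ratios of localization error rather than as differences in log-pixels.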
The beneficial effect of APOE ε4 on localization was consistent across both long and short delays (interaction coefficient = 1.03 (0.97-1.09), P = 0.37). Additional analyses confirmed that this effect was seen even when considering trials on which the incorrect object was selected (Supplementary Information).
Despite finding sex differences in localization memory (Supplementary Information), and a previous study reporting an interaction between sex and APOE ε4 on localization memory 23 , we found no evidence for an interaction between sex and APOE ε4 (interaction coefficient = 0.99 (0.89-1.09), P = 0.45).
Hippocampal volume and WMHV did not show statistically significant associations with localization (Table 1).
Two-dimensional mixture model for sources of localization error. To clarify and extend the results reported above, we used a two-dimensional (2D)-mixture model approach that isolates the contributions of three sources of localization error: misbinding, guessing and imprecision 25,34 (Methods). Its main advantage is the ability to account for random guesses, which can have a potentially large effect on the traditional localization error and misbinding measures. Results for the imprecision parameter, which agree with the traditional localization error metric, are as follows:
• ε4 carriers performed better than noncarriers, with significantly lower imprecision (adjusted mean (95% CI): ε4 carriers, 99 pixels (95-103); non-ε4 carriers, 105 (102-108)) (Table 1).
• The reduced imprecision associated with APOE ε4 was greater among amyloid-positive than amyloid-negative participants, although this was not statistically significant (interaction coefficient = −11 pixels (−23 to +2), P = 0.090).
• Imprecision was significantly worse for long delay (adjusted mean (95% CI): long delay, 110 pixels (107-113); short delay, 97 (95-100)) (Table 1).
This confirms that differences in localization error cannot be explained by random guessing, but do indeed reflect differences in the precision of location memory.

Discussion
This study shows that superior performance on a computerized visual working memory task is detectable in APOE ε4 carriers at age ~70 years, even in the presence of β-amyloid pathology indicative of preclinical Alzheimer's disease. ε4 carriers had better recall for object identity and recalled locations more precisely, while β-amyloid pathology was independently associated with poorer recall for object identities; our analyses suggest that there may be an interaction between APOE ε4 and β-amyloid burden for localization, although this did not reach statistical significance. The results support the hypothesis of antagonistic pleiotropy, and also highlight the possibility that the beneficial effects of APOE ε4 on specific aspects of cognition may persist into older age 3 . To what extent such a cognitive advantage may explain the survival of the ε4 allele in human populations is an intriguing question.
The superior performance of APOE ε4 carriers did not significantly differ according to the length of the delay between encoding and recall (1 or 4 s). This guides us away from attributing the effect to better retention of memory representations over time, instead pointing towards differences in attention and precision of encoding 25 . This is also supported by the patterns of performance we observed on other memory tests in this cohort, with an advantage for ε4 carriers on a verbal memory test with strong attentional and working memory demands, but not on memory tests requiring learning and retention of material over multiple trials (Supplementary Information). This interpretation is consistent with one mechanism proposed for ε4-associated cognitive advantage, which is that ε4 carriers show increased task-related activation in the frontal and parietal regions and correspondingly better performance on tasks requiring attention, short-term memory and top-down cognitive control; such effects have been observed across the life-course 3,11,12,15,16,23,24,35 , including in children with Down's syndrome 17 . Therefore, our results may be explained by the attention and frontal/executive demands of this task (with the localization measure being particularly sensitive due to its continuous nature), rather than by visuospatial or memory aspects alone. At older ages, increased frontal activation in ε4 carriers has been proposed to reflect compensatory recruitment since frontal regions are relatively spared from Alzheimer's-disease-related neurodegeneration 3,11,36 . Our finding that the localization advantage for APOE ε4 carriers appears relatively greater as amyloid burden increased is consistent with this hypothesis of compensation.
There is currently no consensus on ε4-associated cognitive benefits, or on the domains and functions to which they may apply 2,3,13,21 . Associations between APOE ε4 and poorer cognition are generally not observed until past middle age 37-39 , when they are presumed to reflect the emergence of preclinical pathologies. Additive detrimental effects of APOE ε4 and β-amyloid have been observed in older age, with accelerated cognitive decline and disease progression in ε4 carriers 1,40,41 , possibly due to additional pathological effects of APOE ε4 (for example, on synaptic loss, neuroinflammation and cerebrovascular disease 1,4 ). The complex interplay of APOE ε4 with different pathologies in the aging brain remains difficult to unravel; our analysis accounted for white matter hyperintensity volume and hippocampal volume (neither of which was associated with task performance), providing evidence that the effects of APOE ε4 and β-amyloid on visual working memory are independent of these factors. One limitation of this study is the relatively small number of trials (24) compared to 100-trial versions used previously 23,24,33,42 , although one previous study concluded that a shorter version would be sufficient because group differences were more apparent towards the beginning of the task 42 . Performance on primary outcomes was broadly similar to the longer version, with no floor or ceiling effects, and our results suggest that this short version (duration ~8 min, practical for inclusion in a busy assessment schedule) is sufficient for detection of subtle differences between individuals. However, there are no published data on the test-retest reliability of measures extracted from this task, so future studies examining this point are warranted. One particular strength of our analysis is the availability of life-course data in this cohort, which allowed us to control for factors such as childhood cognitive ability, thus reducing unexplained variability in working memory performance (higher childhood cognitive ability was associated with better recall for object identities and object-location binding; Supplementary Information).
Strengths and limitations relating to the representativeness of Insight 46 participants have previously been discussed 43,44 , the main limitations being that all participants were white and the sample was inevitably biased towards those who were willing and able to travel to the research center 45 , which may have resulted in under-representation of individuals with cognitive decline or neuropsychiatric symptoms that can be present in preclinical Alzheimer's disease. As previously reported, participants tended to be more highly educated and in better health than their peers not recruited to this substudy, and participants who completed the brain scan were less likely to be obese and to have mental health problems than those with missing neuroimaging data 45 . Because lower educational attainment, obesity and depression are associated with increased dementia risk 46 , this raises the possibility that individuals (including APOE ε4 carriers) destined to develop Alzheimer's disease or other forms of late-life cognitive impairment may be under-represented in our analyses. The very small number of ε4 homozygotes (consistent with population prevalence) precluded investigation of the dose-dependent effects of APOE ε4 (see Supplementary Information for descriptive statistics). Because we conducted a large number of statistical tests with multiple outcome measures, these results require verification in replication studies. Future data collections will include measures of tau pathology, enhancing our ability to draw conclusions about relationships between preclinical Alzheimer's disease and alterations in visual working memory.
In summary, we provide evidence of superior visual working memory in APOE ε4 carriers at age ~70 years, even in the presence of subtle cognitive deficits associated with preclinical Alzheimer's disease. This is consistent with the antagonistic pleiotropy hypothesis and suggests that the beneficial effects of APOE ε4 on specific cognitive functions may persist into older age.

Methods
Participants in the Insight 46 study, a substudy of the MRC National Survey of Health and Development (NSHD, the British 1946 Birth Cohort) 28 , were assessed at University College London between May 2015 and January 2018. Recruitment procedures, assessment protocols and recruitment flow-charts have been published 28,43,45,47 . In brief, assessments included neuropsychological tests, clinical examination, combined MRI/β-amyloid PET neuroimaging and other biomarker and genetic measures. All assessments were typically completed on one day, although 62 participants had to have their scan rescheduled for a later date (median interval 49 days). The neuropsychological battery comprised standard paper-and-pencil tests and more novel computerized tasks 28,43,44,48 , none of which had been administered previously within the NSHD. The study was approved by the Queen Square Research Ethics Committee, London (REC reference no. 14/LO/1173). All participants provided written informed consent.
Stimuli and procedure. The stimuli and procedure of the 'What was where?' task have previously been described in detail 22,33,42,49 . This type of working memory recall precision task has shown convergent validity with traditional measures of working memory span in older adults, and greater sensitivity than these traditional measures to subtle changes in working memory in patients with Parkinson's disease 27 . The participant was seated in front of a 23-inch DELL Optiplex 9030 all-in-one touch-screen computer. The dimensions of the screen were 1,920 × 1,080 pixels and the approximate distance from the subject's eyes to the center of the screen was 58 cm.
The procedure for the 'What was where?' task is presented in Fig. 1b. In each trial, one or three objects were displayed on the screen at random locations, presented on a black background. Participants were asked to look at the objects and to try to remember their identities and locations. The maximum height and width of the objects was 120 pixels (see Supplementary Information for images of stimuli).
One-object trials are referred to as 'low load' and three-object trials as 'high load', displayed for 1 and 3 s, respectively, to allow time for encoding. This was followed by a blank screen for either a short or long delay (1 or 4 s), and then a test array appeared in which two objects were displayed along the vertical meridian. One of these objects had appeared in the memory array on the previous screen (the target) while the other was a foil/distractor. Participants were instructed to touch the object that they remembered seeing and drag it to the location where they thought it was originally presented (Fig. 1b). There was no time limit for reporting the location; the tester pressed the space bar to initiate the next trial when the participant was ready.
Previous studies using the 'What was where?' task have administered at least 100 trials 22-24,33,42,49 , but for Insight 46 a shortened version was used containing 24 trials: four low-load and 20 high-load (two low-load with short delay, two low-load with long delay, ten high-load with short delay and ten high-load with long delay). The experiment was preceded by four practice trials, one of each of the load-delay combinations, and the tester ensured that the participant understood the task before continuing.
All objects, including the foils, were drawn from a pool of 60 fractals that were used across the experiment (rendered using http://sprott.physics.wisc.edu/fractals.htm).
The locations of the objects were generated in a pseudorandomized manner by a MATLAB script (MathWorks, Inc.) imposing the following restrictions necessary to allow analysis of localization error, a key outcome of this task: (1) objects were always at least 280 pixels away from each other, to avoid crowding and to ensure that there was a clear zone of 140 pixels around each object (necessary for the calculation of misbinding errors (below)); and (2) objects were at least 200 pixels from the center of the screen and 120 pixels from the edges. The 24 trials were identical for all participants (that is, the same objects were presented in the same locations) but were presented in a random order so that load and delay conditions were interspersed throughout. Using a random order avoids confounding of the results by either practice effects (familiarity with the procedure could result in enhanced performance throughout the task) or interference effects (because objects appear more than once during the task, the foil in the test array could be recognized from a previous trial, which could increase the likelihood of errors in object identification throughout the task).
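The placement restrictions described above can be sketched with simple rejection sampling. This is not the MATLAB script used in the study: the function names are hypothetical, and the sketch assumes that all distances are measured between object centers and that the 120-pixel margin applies to object centers.

```python
import math
import random

SCREEN_W, SCREEN_H = 1920, 1080   # screen dimensions in pixels
MIN_SEPARATION = 280              # minimum distance between objects
MIN_FROM_CENTER = 200             # minimum distance from the screen center
MIN_FROM_EDGE = 120               # minimum distance from any screen edge

def is_valid(candidate, placed):
    """Check the center and separation restrictions for one candidate location."""
    x, y = candidate
    if math.hypot(x - SCREEN_W / 2, y - SCREEN_H / 2) < MIN_FROM_CENTER:
        return False
    return all(math.hypot(x - px, y - py) >= MIN_SEPARATION for px, py in placed)

def generate_locations(n_objects, rng, max_tries=10_000):
    """Draw candidates inside the edge margin; keep those meeting every restriction."""
    placed = []
    for _ in range(max_tries):
        if len(placed) == n_objects:
            break
        candidate = (rng.uniform(MIN_FROM_EDGE, SCREEN_W - MIN_FROM_EDGE),
                     rng.uniform(MIN_FROM_EDGE, SCREEN_H - MIN_FROM_EDGE))
        if is_valid(candidate, placed):
            placed.append(candidate)
    if len(placed) < n_objects:
        raise RuntimeError("could not place all objects within max_tries")
    return placed

rng = random.Random(46)  # arbitrary seed, for reproducibility of the sketch only
locations = generate_locations(3, rng)
print(locations)
```

Because every pair of objects is at least 280 pixels apart, the 140-pixel zones around them cannot overlap, which is the property the misbinding definition relies on.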
Outcome variables. Primary outcomes. Primary outcomes are illustrated in Fig. 1c. For each trial, an object identification error was recorded if the participant selected the incorrect object from the two-choice array.
Memory for object location was defined in terms of localization error-the distance between the location reported by the participant and the closest of the three original locations from the memory array. This definition takes account of the fact that, in high-load trials, participants may mislocalize the target to the location of a different (unprobed) object from the memory array (that is, they make a misbinding error-see definition below). Previous studies have also calculated gross localization error, which is the distance between the location reported by the participant and the true location of the target in the original memory array. In the case of a misbinding error, the gross localization error could be very large so it is a less pure measure of localization precision. We calculated gross localization error for comparison with previous studies, but it was not used as an outcome measure for statistical analyses.
A misbinding error occurs when a participant correctly identifies the target object but swaps its location with the location of another object. If the target is positioned within 140 pixels of the location of a different object from the memory array, this is counted as a misbinding error. This threshold was used to ensure that a location could not be attributed to more than one object, because objects were always at least 280 pixels apart. Note that under the low-load condition it is not possible to make a misbinding error because there is only one object in the memory array.
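A minimal sketch of how the location-based outcomes defined above could be computed for a single trial is given below. The function and variable names are hypothetical, coordinates are in pixels, and treating 'within 140 pixels' as a strict inequality is an assumption.

```python
import math

MISBIND_RADIUS = 140  # pixels; half the minimum spacing between objects

def distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def score_trial(response, target, nontargets):
    """Return localization error, gross localization error and misbinding flag."""
    gross_error = distance(response, target)
    nontarget_distances = [distance(response, nt) for nt in nontargets]
    # Localization error: distance to the closest original location.
    localization_error = min([gross_error] + nontarget_distances)
    # Misbinding: the response falls inside the zone around a nontarget object.
    misbinding = any(d < MISBIND_RADIUS for d in nontarget_distances)
    return {
        "localization_error": localization_error,
        "gross_localization_error": gross_error,
        "misbinding": misbinding,
    }

target = (500, 500)
nontargets = [(900, 500), (500, 900)]
near_target = score_trial((530, 540), target, nontargets)  # close to the target
swapped = score_trial((900, 500), target, nontargets)      # placed on a nontarget
print(near_target)
print(swapped)
```

In the second example the gross localization error is large while the localization error is zero, which illustrates why gross error is the less pure measure of precision. For a low-load trial the list of nontargets is empty, so no misbinding can be flagged.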
As in previous papers, localization and misbinding errors were analyzed only for trials in which the correct object was identified from the two-choice array 22,33,42,49 .

2D-mixture model outcomes. We additionally analyzed performance on the 'What was where?' task using a 2D-mixture model approach that isolates the contributions of three sources of localization error: misbinding, guessing and imprecision 25,34 . In contrast to the traditional localization metric, which considers only the magnitude of localization errors, the 2D-mixture model approach considers two dimensions of error: failure to remember the target location (that is, a misbinding error or a random guess) and imprecision in localization (which applies to both target and misbinding responses). It has shown convergent validity with the traditional metrics and is more accurate at recovering the parameters of simulated data 34 . Because the 2D-mixture model yielded results similar to traditional outcomes, for simplicity we chose to focus this report on the traditional outcomes, with the 2D-mixture model results presented as confirmation that interindividual differences in localization error and misbinding cannot be explained by random guessing. Code for the 2D-mixture model is freely available in the MemToolbox2D package for MATLAB 50 . The model is described in detail elsewhere 34 but, in brief, a response density equation is defined as follows:

$$P(\hat{\theta}) = \alpha\,\psi_{\sigma}(\hat{\theta} - \theta) + \frac{\beta}{m}\sum_{i=1}^{m}\psi_{\sigma}(\hat{\theta} - \varphi_{i}) + \frac{\gamma}{A}$$

where $\hat{\theta}$, $\theta$ and $\varphi_{i}$ are vectors indicating locations on the screen, $P(\hat{\theta})$ is the probability of finding a response location $\hat{\theta}$, $\theta$ is the location of the target, $\varphi_{i}$ is the location of the nontarget stimulus $i$, $m$ is the number of nontarget stimuli, $A$ is the area of the screen and $\psi_{\sigma}$ is a bivariate Gaussian distribution with standard deviation $\sigma$ and zero covariance. The parameters $\alpha$, $\beta$ and $\gamma$ represent the proportion of target responding, misbinding and guessing, respectively.

Because α, β and γ must sum to 1, α is not included as a free parameter in the fitting, so the three free parameters are β (misbinding), γ (guessing) and σ (imprecision), estimated using maximum-likelihood methods. Thus, the model isolates and quantifies three different sources of error. Any spatial units can be used, but we used pixels. The model assumes that guesses are uniformly distributed across the entire screen; see Supplementary Information for an exploration of alternatives to this assumption and information about goodness of fit.
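The response density can be sketched in a few lines, assuming the usual form for this kind of model: a Gaussian around the target weighted by α, Gaussians around each nontarget sharing the weight β, and a uniform guessing term γ/A. This is an illustration rather than the MemToolbox2D implementation; the function names and the parameter values in the example are hypothetical, and the Gaussian is taken to have the same standard deviation on both axes.

```python
import math

def gaussian_2d(dx, dy, sigma):
    """Bivariate Gaussian density with equal SD on both axes and zero covariance."""
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)

def response_density(response, target, nontargets, beta, gamma, sigma, screen_area):
    """Mixture density of a response location: target + misbinding + guessing."""
    alpha = 1.0 - beta - gamma  # alpha is determined by the two free proportions
    rx, ry = response
    p_target = alpha * gaussian_2d(rx - target[0], ry - target[1], sigma)
    p_misbind = 0.0
    if nontargets:
        p_misbind = (beta / len(nontargets)) * sum(
            gaussian_2d(rx - nx, ry - ny, sigma) for nx, ny in nontargets
        )
    p_guess = gamma / screen_area  # guesses are uniform over the whole screen
    return p_target + p_misbind + p_guess

# Hypothetical parameter values, chosen only to exercise the function.
area = 1920 * 1080
density = response_density((520, 510), (500, 500), [(900, 500), (500, 900)],
                           beta=0.1, gamma=0.1, sigma=100.0, screen_area=area)
print(density)
```

Maximum-likelihood fitting would then search for the β, γ and σ that maximize the sum of the log of this density over a participant's responses; that optimization step is not shown here.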
To visualize the performance of the model on the raw data, we outputted the probability that each individual response location was (1) target, (2) guess, (3) misbind to the first distractor and (4) misbind to the second distractor. These probabilities were normalized for each trial so that they summed to 1, and then each response was classified into whichever category had the highest probability. The third and fourth categories were then combined, as both represent misbinding. These classifications are illustrated for the complete raw data in Supplementary Information.

Data cleaning. Six participants underwent one trial in which the software did not record whether they had selected the correct or incorrect object, probably caused by the participant touching the screen exactly midway between the two objects. These six trials were excluded.
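The classification of individual responses under the 2D-mixture model (target, guess or misbind) can be sketched as follows. The function names and parameter values are hypothetical, the component weights follow the same assumed mixture form as the model (Gaussian components for the target and each nontarget, plus a uniform guessing component), and the sketch assumes a nonzero guessing proportion so that the normalizing total is never zero.

```python
import math

def gaussian_2d(dx, dy, sigma):
    """Bivariate Gaussian density with equal SD on both axes and zero covariance."""
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)

def classify_response(response, target, nontargets, beta, gamma, sigma, screen_area):
    """Normalize the component probabilities for one response and pick the largest."""
    rx, ry = response
    alpha = 1.0 - beta - gamma
    weights = {
        "target": alpha * gaussian_2d(rx - target[0], ry - target[1], sigma),
        "guess": gamma / screen_area,
    }
    for i, (nx, ny) in enumerate(nontargets, start=1):
        weights[f"misbind_{i}"] = (beta / len(nontargets)) * gaussian_2d(rx - nx, ry - ny, sigma)
    total = sum(weights.values())
    probs = {name: w / total for name, w in weights.items()}  # sums to 1 per trial
    best = max(probs, key=probs.get)
    # Misbinds to either distractor are combined into a single category.
    label = "misbind" if best.startswith("misbind") else best
    return label, probs

# Hypothetical parameters and locations, for illustration only.
params = dict(beta=0.1, gamma=0.1, sigma=100.0, screen_area=1920 * 1080)
target, nontargets = (500, 500), [(900, 500), (500, 900)]
print(classify_response((505, 495), target, nontargets, **params)[0])
print(classify_response((900, 500), target, nontargets, **params)[0])
print(classify_response((1700, 150), target, nontargets, **params)[0])
```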
Life-course and clinical variables. Childhood cognitive ability was measured at age 8 years (or ages 11 or 15 years if earlier data were missing) as a standardized z-score based on tests of verbal and nonverbal ability, as previously described 43 .
Educational attainment was represented as the highest qualification achieved by age 26 years, grouped into three categories: no qualification; vocational or O-levels and equivalents; A-levels or degree and equivalents.
Socioeconomic position was derived from participants' own occupation at age 53 years, or earlier if this was missing, coded according to the UK Registrar General's Standard Occupational Classification and classified as manual or nonmanual.
Participants were classified as having a neurological or major psychiatric condition (including dementia and mild cognitive impairment) as previously described 43 (see Fig. 1a for specific diagnoses). Participants not meeting these criteria are herein referred to as cognitively normal and represent a sample free from possible confounding neurological or psychiatric comorbidities. This does not imply that all participants with a neurological or major psychiatric condition necessarily had a measurable cognitive impairment.
Biomarker measures. As previously described 28,47,51 , β-amyloid PET and multimodal MRI data were collected simultaneously during a 60-min scanning session on a single Biograph mMR 3 T PET/MRI scanner (Siemens Healthcare), with intravenous injection of 370 MBq of 18 F-Florbetapir (Amyvid). β-amyloid deposition was quantified using the SUVR calculated from cortical regions of interest with a reference region of eroded subcortical white matter. A cutoff point for amyloid positivity was determined using a mixture model to define two Gaussians, and taking the 99th percentile of the lower (amyloid-negative) Gaussian at SUVR > 0.6104 (refs. 43,47,51 ).
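The final step of the cutoff derivation, taking the 99th percentile of the lower Gaussian, can be illustrated as below. The mean and standard deviation in the example are hypothetical placeholders, not the values estimated in this study, and fitting the two-component mixture itself is not shown.

```python
from statistics import NormalDist

def amyloid_cutoff(lower_mean, lower_sd, percentile=0.99):
    """Percentile of the lower (amyloid-negative) Gaussian, used as the positivity cutoff."""
    return NormalDist(mu=lower_mean, sigma=lower_sd).inv_cdf(percentile)

# Hypothetical component parameters, NOT the estimates from the study.
cutoff = amyloid_cutoff(lower_mean=0.55, lower_sd=0.025)
print(round(cutoff, 4))

def is_amyloid_positive(suvr, cutoff_value):
    """Classify a participant as amyloid-positive if SUVR exceeds the cutoff."""
    return suvr > cutoff_value
```

With this approach, about 1% of a truly amyloid-negative population would be expected to fall above the cutoff by chance.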
Global WMHV was generated using an automated segmentation algorithm followed by visual quality control 47,52 . Hippocampal volume was generated using the automated segmentation method known as similarity and truth estimation for propagated segmentations, with appropriate manual editing 53 . Total intracranial volume (TIV) was generated using statistical parametric mapping software (SPM12; http://www.fil.ion.ucl.ac.uk/spm) 54 .
APOE genotype was determined 28 and classified as either ε4 carrier or non-ε4 carrier. The number of homozygous ε4 carriers (n = 11, 3% of the sample) was too small to consider them as a separate group, but descriptive statistics on their performance are provided in Supplementary Information.

Statistical analyses.
Analyses were conducted using Stata 15 (StataCorp). Statistical significance was set at the conventional threshold of P < 0.05.
To investigate associations between performance on the 'What was where?' task and biomarkers of brain pathologies, analyses included only those participants classified as cognitively normal and for whom complete biomarker data were available (n = 398; Fig. 1a). The rationale for exclusion of participants with neurological or major psychiatric conditions was that these conditions can have varied impacts on cognitive performance that may confound associations between the key predictors of interest (preclinical amyloid pathology and APOE ε4) and visual working memory, and the numbers involved were not sufficient to provide power to detect meaningful differences between specific conditions. Multivariable regression models were fitted (details given below), with predictors of amyloid status (positive versus negative), hippocampal volume, WMHV, APOE ε4 (carrier versus noncarrier) and delay (short versus long). An additional predictor of load (low versus high) was included for all outcomes except those relating to misbinding, since misbinding cannot occur under the low-load condition. TIV was included as a covariate to adjust for the correlation between brain volume and head size, and we additionally adjusted for sex, age at assessment and the following life-course factors that have previously been shown to predict cognitive performance throughout adulthood in this cohort 43,44,55,56 : childhood cognitive ability, education and socioeconomic position. Adjusting for these factors reduces the unexplained variance in cognitive performance between individuals, which can increase the sensitivity of our analyses to detect subtle effects of APOE ε4 and brain pathologies 43 . Analyses were additionally rerun, replacing dichotomized amyloid status with continuous SUVR. We did not apply a correction for multiple comparisons, following recommendations in the statistical literature 57,58 , because this was a hypothesis-driven study motivated by previous literature.
Where between-individual factors were significantly associated with performance, we tested for interactions with delay (short versus long) to investigate whether group differences were due to better retention over time. We also tested for interactions between APOE ε4 and amyloid (dichotomous amyloid status and continuous SUVR) to investigate whether the effects of APOE ε4 differed according to the burden of amyloid pathology.
Primary outcomes. Analyses were conducted using trial-by-trial data rather than summary scores (for example, mean localization error), to avoid loss of information.
Identification errors (correct versus incorrect) and misbinding errors (yes versus no) were analyzed using generalized estimating equations (GEE) logistic regression models with an independent correlation structure and robust standard errors, to allow for the correlation between repeated measures of the same participant. Results are expressed as odds ratios (OR) for ease of interpretation.
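With an independence working correlation, the GEE point estimates coincide with those of ordinary logistic regression, and the clustering of trials within participants enters only through the robust (sandwich) standard errors. The following sketch illustrates this for the simplest case of a single binary predictor. It is not the Stata code used in the study: the function name, the toy data and the coding of the outcome (1 = correct identification) are assumptions, covariates are omitted, and no small-sample adjustment is applied to the variance.

```python
import math
from collections import defaultdict


def clustered_logistic_or(rows):
    """Odds ratio for one binary predictor with cluster-robust 95% CI.

    rows: iterable of (participant_id, x, y) with x and y coded 0 or 1.
    Requires the proportion of y = 1 to lie strictly between 0 and 1
    in both groups.
    """
    rows = list(rows)
    n = {0: 0, 1: 0}
    events = {0: 0, 1: 0}
    for _, x, y in rows:
        n[x] += 1
        events[x] += y
    # With one binary predictor, the fitted probabilities equal the
    # observed proportions in each group (closed-form logistic fit).
    p = {g: events[g] / n[g] for g in (0, 1)}
    log_or = math.log(p[1] / (1 - p[1])) - math.log(p[0] / (1 - p[0]))

    # "Bread": A = sum_i p_i (1 - p_i) z_i z_i^T with z_i = (1, x_i).
    w0 = n[0] * p[0] * (1 - p[0])
    w1 = n[1] * p[1] * (1 - p[1])
    a00, a01, a11 = w0 + w1, w1, w1
    det = a00 * a11 - a01 * a01
    inv_row1 = (-a01 / det, a00 / det)  # second row of A^-1

    # "Meat": outer products of score vectors summed within participant.
    score = defaultdict(lambda: [0.0, 0.0])
    for pid, x, y in rows:
        resid = y - p[x]
        score[pid][0] += resid
        score[pid][1] += resid * x
    m00 = sum(s[0] * s[0] for s in score.values())
    m01 = sum(s[0] * s[1] for s in score.values())
    m11 = sum(s[1] * s[1] for s in score.values())

    # Sandwich variance of the slope: (A^-1 M A^-1)[1][1].
    v0, v1 = inv_row1
    var_slope = v0 * v0 * m00 + 2 * v0 * v1 * m01 + v1 * v1 * m11
    se = math.sqrt(var_slope)
    return (math.exp(log_or),
            math.exp(log_or - 1.96 * se),
            math.exp(log_or + 1.96 * se))


# Toy data: four participants with four trials each (hypothetical values).
outcomes = {"a": (1, [1, 1, 1, 1]), "b": (1, [1, 0, 1, 0]),
            "c": (0, [1, 0, 0, 1]), "d": (0, [0, 1, 0, 0])}
demo_rows = [(pid, x, y) for pid, (x, ys) in outcomes.items() for y in ys]
odds_ratio, ci_low, ci_high = clustered_logistic_or(demo_rows)
```

In the toy data the predictor is constant within each participant, as it would be for a between-individual factor such as APOE ε4 status. The robust interval is then driven by variation between participants rather than by the total number of trials.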
Localization error was analyzed using GEE models, assuming a normal distribution for the dependent variable and an identity link (as with standard linear regression), but including an exchangeable correlation structure and robust standard errors. Localization errors were first log-transformed because the distributions were positively skewed. Model assumptions were tested by examination of residual plots; no departures from assumptions were noted.
2D-mixture model outcomes. To generate the outcome scores for analysis, the mixture model (above) was fitted twice for each participant: once using their responses to short-delay trials (low and high load combined) and once using their responses to long-delay trials (low and high load combined). As with the traditional localization metrics, the model included only responses for trials in which the correct object was identified from the two-choice array. This generated a value for the misbinding, guessing and imprecision parameters for each participant under both short- and long-delay conditions. It was not possible to separate responses by load, because the number of low-load trials was too small for reliable estimation of the imprecision parameter (as this is a standard deviation metric). Low-load trials do not influence the estimation of the misbinding parameter, since they do not contain distractors.
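The selection of trials entering each mixture-model fit can be sketched as follows. This covers only the data preparation, not the mixture model itself (the MATLAB code for which is referenced under Code availability). The record layout and the field names `participant`, `delay`, `load`, `identified_correctly` and `response_xy` are hypothetical.

```python
from collections import defaultdict


def responses_for_mixture_fit(trials):
    """Group localization responses by (participant, delay).

    Loads are pooled within each delay, and only trials in which the
    correct object was identified are kept, so each participant yields
    at most two response sets: one per delay condition.
    """
    grouped = defaultdict(list)
    for trial in trials:
        if trial["identified_correctly"]:
            key = (trial["participant"], trial["delay"])
            grouped[key].append(trial["response_xy"])
    return dict(grouped)


# Hypothetical trial records for one participant.
example_trials = [
    {"participant": "p1", "delay": "short", "load": "low",
     "identified_correctly": True, "response_xy": (102.0, 240.5)},
    {"participant": "p1", "delay": "short", "load": "high",
     "identified_correctly": True, "response_xy": (310.2, 98.7)},
    {"participant": "p1", "delay": "short", "load": "high",
     "identified_correctly": False, "response_xy": (55.0, 60.0)},
    {"participant": "p1", "delay": "long", "load": "high",
     "identified_correctly": True, "response_xy": (200.1, 150.3)},
]
fit_inputs = responses_for_mixture_fit(example_trials)
```

Each value in `fit_inputs` would then be passed to a separate mixture-model fit, giving one set of misbinding, guessing and imprecision parameters per participant and delay.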
The imprecision parameter was analyzed using the same model structure as localization error (above), because it was approximately normally distributed. The guessing and misbinding parameters were analyzed using the same model structure as identification errors (above), since these represent proportions of responses classified as guesses and misbinds, respectively (see Supplementary Information for examination of goodness of fit and visual illustration of the performance of the model).
Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
All data from NSHD are curated and stored by the Lifelong Health and Aging Unit at UCL. Anonymized data will be shared by request from qualified investigators (https://skylark.ucl.ac.uk/NSHD/doku.php).

Code availability
Code for the 2D-mixture model (MATLAB) is freely available at https://doi.org/10.5281/zenodo.3752705. Code for statistical analyses conducted in Stata is provided in Supplementary Information.