Age groups that sustain resurging COVID-19 epidemics in the United States

Following initial declines, in mid-2020 a resurgence in transmission of novel coronavirus disease (COVID-19) occurred in the US and Europe. As COVID-19 disease control efforts are re-intensified, understanding the age demographics driving transmission and how these affect the loosening of interventions is crucial. We analyze aggregated, age-specific mobility trends from more than 10 million individuals in the US and link these mechanistically to age-specific COVID-19 mortality data. We estimate that as of October 2020, individuals aged 20-49 are the only age groups sustaining resurgent SARS-CoV-2 transmission with reproduction numbers well above one, and that at least 65 of 100 COVID-19 infections originate from individuals aged 20-49 in the US. Targeting interventions – including transmission-blocking vaccines – to adults aged 20-49 is an important consideration in halting resurgent epidemics and preventing COVID-19-attributable deaths.


Reconstructing human contact patterns and SARS-CoV-2 transmission
To test this hypothesis and disentangle the various factors, we incorporated the mobility data into a Bayesian contact-and-infection model that describes time-changing contact and transmission dynamics at state and metropolitan area level across the US. For the time period prior to changes in mobility trends, we used data from pre-COVID-19 contact surveys (6), and each location's age composition and population density to predict contact intensities between individuals grouped in 5-year age bands (figs. S4 to S6), similarly to (18). On weekends, contact intensities between school-aged children and teens are lower than on weekdays, while intergenerational contact intensities are higher. In the model, the observed age-specific mobility trends of Fig. 1 are then used to estimate in each location (state or metropolitan area) daily changes in age-specific contact intensities for individuals aged 20 and above. For younger individuals, for whom mobility trends are not recorded, contact intensities during school closure periods were set to estimates from two contact surveys conducted post COVID-19 emergence (7,8). After school reopening in August 2020, relative changes in disease-relevant contacts from and to children and teens aged 0-19 were estimated through the model. Contact intensities between children and teens were modeled and estimated separately, to account for potentially lower or higher disease-relevant contacts between children and teens in the context of existing non-pharmaceutical interventions within and outside schools (see Materials and methods). As in (19), the model further incorporates random effects in space, time, and by age to allow for unobserved, potential age-specific factors that could modulate disease-relevant contact patterns. These random effects enabled us to identify signatures of age-specific, behavioral drivers of SARS-CoV-2 transmission beyond the mobility data in Fig. 1 that may underlie the highly heterogeneous epidemic trajectories across the US. Finally, the reconstructed contact intensities are used in the model to estimate the rate of SARS-CoV-2 transmission, and subsequently infections and deaths. Figure 0 in the extended abstract provides a model overview, and full details are in the supplementary materials.

Estimated disease dynamics closely reproduce age-specific COVID-19 attributable death counts
The contact-and-infection model was fitted to the Foursquare mobility trends, and age-specific, COVID-19-attributed mortality time series data, which we recorded daily from publicly available sources in 42 US states, the District of Columbia and New York City since March 15, 2020 (fig. S7, see also supplementary materials). Our overall rationale was that, reflecting the highly structured nature of human contacts, transmissions from age groups are received by specific other age groups, and mortality accrues in the age groups receiving infections. Thus, working back from the time evolution of reliably documented, age-specific COVID-19 attributable deaths, it is possible to reconstruct age-specific drivers of transmission during particular periods in time. Inference was performed in a Bayesian framework and restricted to 38 US states, the District of Columbia and New York City with at least 300 COVID-19-attributed deaths, giving a total of 8,676 observation days. The estimated disease dynamics closely reproduced the age-specific COVID-19 death counts (fig. S8). Figure 2 illustrates the model fits for New York City, Florida, California, and Arizona, showing that the inferred epidemic dynamics differed markedly across locations. For example, in New York City, the epidemic accelerated for at least 4 weeks after the 10th cumulative death, until age-specific reproduction numbers started to decline, resulting in an epidemic of large magnitude as shown through the estimated number of infectious individuals (Fig. 2, mid column). Subsequently, we find that reproduction numbers for all age groups were controlled to well below one except for individuals aged 20-49 (Fig. 2, rightmost column), resulting in a steady decline of infectious individuals.
In the model, children and teens returned to their pre-lockdown contact intensities on August 24, 2020 or later, depending on when state administrations no longer mandated state-wide school closures, and relative decreases or increases in their disease-relevant contact intensities after school reopening were estimated. Concomitantly, reproduction numbers from children aged 0-9 and teens aged 10-19 increased, but as of the last observation week in October 2020 we find no strong evidence that their reproduction numbers have exceeded one at population level in most states and metropolitan areas considered. Detailed situation analyses for all locations are presented in the supplementary materials. Figure 3 summarizes the epidemic situation for all states and metropolitan areas evaluated, and the age groups that sustain COVID-19 spread. In the last observation week in October 2020, the estimated reproduction number across all locations evaluated was highest from individuals aged 35-49 (1.39 [1.34-1.44]) and 20-34 (1.29 [1.24-1.36]), and around one for age groups 10-19 and 50-64 (tables S1 and S2). These trends across age groups were largely consistent over time. The primary mechanisms underlying the high reproduction numbers from 20-49-year-olds are that at population level, adults aged 20-49 naturally have most contacts with other adults aged 20 and above, who are more susceptible to COVID-19 than younger individuals, paired with increasing mobility trends for these age groups since April 2020 (Fig. 1 and fig. S6). In addition, from the death time series data, the model inferred characteristic random effect signatures in time and by age across locations (fig. S9), which indicate elevated transmission risk per venue visit for individuals aged 20-49 relative to other age groups. Figure S10 visualizes the combined, estimated effects of mobility and behavior on transmission risk, and reveals together with Fig. 3 (table S4).
Spatially, the contributions of adults aged 35-49 were estimated to be remarkably homogeneous across states, whereas the estimated contributions of young adults aged 20-34 to COVID-19 spread tended to be higher in Southern, South-western, and Western regions of the US (Fig. 4), in line with previous observations (4).

SARS-CoV-2 transmission is sustained primarily from age groups 20-49
No substantial shifts in age-specific disease dynamics over time
Over time, we found that the share of age groups among the observed COVID-19 attributable deaths was remarkably constant (Fig. 5A and fig. S11), which stands in contrast to the large fluctuations in the share of age groups among reported cases (4). To test for shifts in the share of age groups among COVID-19 infections, we next back-calculated the number of expected, age-specific infections per calendar month of aggregated COVID-19 attributable deaths using meta-analysis estimates of the age-specific COVID-19 infection fatality ratio (20). This empirical analysis suggested no statistically significant trends in the share of age groups among COVID-19 infections (Fig. 5B and fig. S12), which is further supported by model estimates (Fig. 5C and fig. S13). Based on the combined mobility and death data, we find the reconstructed fluctuations in age-specific reproduction numbers had only a relatively modest impact on the contribution of age groups to onward spread over time, and no evidence that young adults aged 20-34 were the primary source of resurgent COVID-19 in the US over summer 2020. These results underscore that, when testing rates are heterogeneous and not population representative, it is challenging to determine the age-specific pattern of transmission based only on reported case data.

School reopening has not resulted in substantial increases in COVID-19 attributable deaths
Between August and October 2020, school closure mandates were lifted in 39 out of 40 of the US locations evaluated in this study, providing 2,570 observation days to estimate the impact of school reopening on COVID-19 spread. The following analyses are therefore based on fewer data points than those above. They rely on mortality figures accrued until the end of October 2020, as well as reported school case data from Florida and Texas, which were used to define lower and upper bounds on cumulative attack rates among children and teens aged 5-18 (see Materials and methods). Reflecting stuttering transmission chains in school settings, reproduction numbers from children aged 0-9 and teens aged 10-19 were estimated at below one (respectively 0.52 [0.42-0.60] and 0.73 [0.57-0.88]) after schools were considered to have reopened in the model (Fig. 3 and table S2). Reproduction numbers from children were lower than from teens because at population level preschoolers have fewer contacts than school-aged children (fig. S6).
Since school closure mandates were lifted, the higher reproduction numbers from children and teens resulted in age shifts in the sources of SARS-CoV-2 infections. In October 2020 an estimated 2.7% [1.8%-3.7%] of infections originated from children aged 0-9, 7.1% [4.5%- 10 (table S5 vs table S4). The reconstructed shifts in the age of COVID-19 sources after school reopening are relatively modest compared to the typical age profile of infection sources of pandemic flu (21), and reflect lower age-specific susceptibility to SARS-CoV-2 transmission among children and teens, but also substantially fewer inferred disease-relevant contacts from children and teens than would be expected from their corresponding pre-pandemic contact intensities. The mechanisms behind these beneficial effects remain unclear, but the model suggests they are substantial. In retrospective counterfactual scenarios we explored what COVID-19 case and death trajectories would have been expected if schools had remained closed, and find a large overlap between the counterfactual and actual case and death trajectories (Fig. 6, fig. S15). However, since children and teens seed infections in older age groups that are more transmission efficient, as of October 2020, school opening is associated with an estimated 25.7% [14.5%-40.5%] increase of COVID-19 infections and a 5.9% [3.4%-9.3%] increase in COVID-19 attributable deaths (table S7). Larger proportions of COVID-19 infections and deaths are attributed to school reopening if the actual number of cases among school-aged children is more than six times larger than the number in school situation reports (table S7). These findings indicate that adults aged 20-34 and 35-49 continue to be the only age groups that contribute disproportionately to COVID-19 spread relative to their size in the population (fig. S14), and that the impact of school reopening on resurgent COVID-19 is mitigated most effectively by strengthening disease control among adults aged 20-49.

Caveats
The findings of this study need to be considered in the context of the following limitations. Rossen and colleagues (22) observed that US excess deaths between the beginning of the pandemic and October 2020 were 38% higher than the reported COVID-19 attributable deaths, suggesting that the death data on which this analysis rests are subject to underreporting. The scale of the US epidemics may be larger than we infer, and our age-specific analyses may be biased if underreporting of deaths depends on age. However, due to the high proportion of asymptomatic COVID-19 cases (5), underreporting is a substantially larger caveat for reported case data, and in particular the observed shifts in the share of age groups among reported cases (4,23), which are absent from the share of age groups among reported deaths (fig. S11). This suggests that age-specific death data provide a more reliable picture of resurgent COVID-19 epidemics than reported cases. We further rely on limited data from two contact surveys performed in the United Kingdom and China to characterize contact patterns from and to younger individuals during school closure periods (7,8), and this could have biased our findings that children and teens have contributed negligibly to SARS-CoV-2 spread until school reopening. To address this limitation, we explored the impact of higher inter-generational contact intensities involving children during school closure periods, and in these analyses the estimated contribution of children aged 0-9 to onward spread until August 2020 remained below 5% and the contribution of teens aged 10-19 remained below 12.5% (see supplementary materials). Epidemiologic models are sensitive to assumptions on the infection fatality ratio (IFR) that enables the estimation (20).
To test the assumed IFR, we compared the scale of the estimated resurgent epidemics against data from sero-prevalence surveys conducted by the Centers for Disease Control and Prevention (CDC) (24), and found good congruence (table S6 and supplementary materials). The COVID-19 epidemic is more granular than considered in our spatial modelling approach. Substantial heterogeneity in disease transmission exists at county level (25), and our situation analyses by state and metropolitan areas need to be interpreted as averages. Without exception, the model underlying our analyses also relies on simplifying mathematical assumptions on population-level disease spread, which may be shown unsuitable as further evidence on SARS-CoV-2 transmission accumulates (26). For instance, the model assumes children and teens are as transmissible as adults, which has been challenging to quantify to date (27), and falls short of accounting for population structure other than age, such as household settings, where attack rates have been estimated to be substantially higher than in non-household settings (28). It is possible that the model under-estimates the impact of school reopening on SARS-CoV-2 transmission. Data from countries that have re-opened schools have provided little evidence for substantial transmission in schools, or for significantly increased community-level infection rates after school reopening until the emergence of more transmissible SARS-CoV-2 variants (29,30), but this might reflect frequent sub-clinical infection among school-aged children. More transmissible SARS-CoV-2 variants could increase reproduction numbers to above one for all age groups, which implies substantial spread from all age groups, and would require generally stricter control measures across all ages to prevent COVID-19 attributable deaths (31).

Conclusions
This study provides evidence that the resurgent COVID-19 epidemics in the US in 2020 have been driven by adults aged 20-49, and in particular adults aged 35-49, before and after school reopening. Unlike in pandemic flu, these adults accounted for an estimated 72.2% [68.6%-75.9%] of SARS-CoV-2 infections in the US locations considered after school reopening in October 2020, whereas less than 5% originated from children aged 0-9 and less than 10% from teens aged 10-19. The population mobility data and the death data provided by state and city Departments of Health reveal heterogeneous disease spread in the US, with higher transmission risk per venue visit attributed to individuals aged 20-49 over distinct time periods, and younger epidemics with a greater share of individuals aged 20-34 among cumulated infections in the South, South-western, and Western regions of the US.
Over time, the share of age groups among reported deaths has been remarkably constant, suggesting that young adults are unlikely to have been the primary source of resurgent epidemics since summer 2020, and that instead changes in mobility and behavior among the broader group of adults aged 20-49 underlie resurgent COVID-19 in the US in 2020. This study indicates that in locations where novel highly-transmissible SARS-CoV-2 lineages have not yet established, additional interventions among adults aged 20-49, such as mass vaccination with transmission-blocking vaccines, could bring resurgent COVID-19 epidemics under control and avert deaths.

Materials and Methods
To characterize the role of age groups in driving resurgent COVID-19, we have taken a systematic approach that involved data collection, mathematical modelling, likelihood-based inference, and validation against external data. The following sections summarize our materials and methods, and full technical details are in the Data Availability Statement and the supplementary materials.

Data and data processing
The analyses presented in this study are based on age-specific COVID-19 attributable mortality counts that were collected daily from US state and city Departments of Health (DoH), all-age COVID-19 death counts, all-age COVID-19 case counts, COVID-19 case counts in school settings K1-K15, human contact data before and during the pandemic, and human mobility data during the pandemic.
Briefly, age-specific COVID-19 cumulative death counts were retrieved for 42 US states, the District of Columbia and New York City from city or state DoH websites, data repositories, or via data requests to DoH (table S8). Data were checked for consistency and adjusted when necessary. Age-specific COVID-19 death time series were reconstructed from cumulative counts, and the time series were used for model fitting (32).
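The reconstruction of daily death time series from cumulative counts can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function name `daily_from_cumulative` is hypothetical, and the rule for handling downward revisions (forcing the cumulative series to be non-decreasing, working back from the latest report) is an assumption made here for the example.

```python
from typing import List


def daily_from_cumulative(cumulative: List[int]) -> List[int]:
    """Difference a cumulative death series into daily counts.

    Assumption (not from the source): if a later report revises the
    cumulative count downwards, earlier values are capped at the later
    value so that no daily count is negative.
    """
    if not cumulative:
        return []
    # Enforce a non-decreasing series, scanning from the latest report backwards.
    adjusted = list(cumulative)
    for i in range(len(adjusted) - 2, -1, -1):
        adjusted[i] = min(adjusted[i], adjusted[i + 1])
    # The first day carries whatever had accumulated at the start of the series.
    daily = [adjusted[0]]
    for previous, current in zip(adjusted, adjusted[1:]):
        daily.append(current - previous)
    return daily


# Hypothetical cumulative counts for one age band, with a downward revision on day 4.
example_daily = daily_from_cumulative([0, 2, 5, 4, 9])
```

The daily counts always sum to the latest cumulative report, which is a convenient consistency check when the series is later aggregated to reporting strata.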
All-age daily COVID-19 case and death counts from February 01, 2020 until October 30, 2020 regardless of age were obtained from Johns Hopkins University (JHU) for all U.S. states and the District of Columbia (3), except New York State. For New York State, daily COVID-19 death counts from February 01, 2020 until October 30, 2020 were obtained from the New York Times' (NYT) data (33). For New York City, daily COVID-19 death counts were obtained from the GitHub Repository (34). The all-age death counts were used for model fitting prior to when age-specific death counts were reported for each location, and all-age case counts were used for model fitting for the entire study period.
COVID-19 case counts in school settings K1-K15 were retrieved for Florida and Texas and matched with student enrolment numbers in each school from the Common Core of Data Americas Public Schools database (35). Cumulative attack rates were obtained by dividing cumulative reported cases among students by student numbers, and used for model fitting.
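As a worked illustration of the attack rate calculation, the sketch below divides cumulative reported student cases by enrolment. The function name and the numbers are hypothetical; pooling schools by summing cases and enrolment before dividing is an assumption made for the example.

```python
from typing import Iterable, Tuple


def cumulative_attack_rate(schools: Iterable[Tuple[int, int]]) -> float:
    """Pooled cumulative attack rate across schools.

    Each entry is (cumulative reported student cases, enrolled students).
    Assumption: schools are pooled by summing numerators and denominators.
    """
    total_cases = 0
    total_students = 0
    for cases, students in schools:
        total_cases += cases
        total_students += students
    if total_students <= 0:
        raise ValueError("enrolment must be positive")
    return total_cases / total_students


# Hypothetical data for three schools: 15 cases among 3,000 students in total.
example_rate = cumulative_attack_rate([(4, 800), (10, 1200), (1, 1000)])
```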
Human contact data before the pandemic were obtained from the Polymod study (6), and used to predict baseline contact matrices during the early part of the pandemic for each location, similarly to (18). Given the variation in contact patterns seen across survey settings, baseline contact matrices for each study location in the US were predicted based on each location's population density and age composition with a log-linear regression model. Age-specific population counts were obtained from (36). Area measurements were obtained for every US state and for New York City from (37) and (38), respectively. Contact matrices were predicted by 5-year age bands for weekdays and weekends, and used in the model. Human contact data during the pandemic were retrieved from two surveys (7,8), and used in the model to specify contact patterns from and to individuals aged 0-19 during periods of school closure.
Age-specific human mobility trends were derived from the Foursquare Labs Inc. US first-party panel that includes >10 million opt-in, always-on active users. From operated and partner apps, Foursquare collects a variety of device signals against opted-in users including intermittent device GPS coordinate pings, WiFi signals, cell signal strength, device model, and operating system version. A smaller set of labeled explicit check-ins are captured from a portion of the user panel. Check-ins are explicit confirmations that a user was at a given venue at a given point of time, and serve as training labels for a non-linear model that is used to predict visits among users with unlabeled visits in terms of probabilities as to which venue users ultimately visited (11). Visit probabilities among panellists were processed and aggregated by day, age, and study location, and standardised to daily per capita visits using latest US Census data. Percent changes in daily venue visits by age and study location were obtained relative to the baseline period February 3 to February 9, 2020, and used for analysis and model fitting. For validation purposes, a second mobility data set was obtained from Emodo. The Emodo data set quantifies the proportion of individuals with at least one observed ping outside the user's home location, out of a panel of individuals whose GPS-enabled devices emitted at least one ping on the corresponding day. Primary data were similarly aggregated by day, age, and study location, standardised to daily per capita visits using latest US Census data, and mobility trends were calculated relative to the baseline period February 19 to March 3, 2020.
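The percent change relative to a baseline period can be sketched as below for one age group and location. The function name and the per capita visit values are hypothetical, and taking the simple mean of daily values over the baseline window is an assumption; the source only states that changes were computed relative to the baseline period.

```python
from datetime import date, timedelta
from statistics import mean
from typing import Dict


def percent_change_vs_baseline(
    visits: Dict[date, float], baseline_start: date, baseline_end: date
) -> Dict[date, float]:
    """Percent change in daily per capita visits relative to the baseline mean.

    Assumption: the baseline level is the arithmetic mean of the daily
    values that fall inside [baseline_start, baseline_end].
    """
    baseline_values = [
        value for day, value in visits.items()
        if baseline_start <= day <= baseline_end
    ]
    if not baseline_values:
        raise ValueError("no observations in the baseline period")
    baseline = mean(baseline_values)
    return {day: 100.0 * (value / baseline - 1.0) for day, value in visits.items()}


# Hypothetical series: 2.0 visits per capita on each baseline day, 1.0 on April 1.
example_visits = {date(2020, 2, 3) + timedelta(days=i): 2.0 for i in range(7)}
example_visits[date(2020, 4, 1)] = 1.0
example_trend = percent_change_vs_baseline(
    example_visits, date(2020, 2, 3), date(2020, 2, 9)
)
```

In this toy series the April 1 value corresponds to a 50% reduction in venue visits relative to the first week of February.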

Statistical analysis of human mobility data and COVID-19 attributable death data
The age-specific human mobility data showed marked time trends, which were characterised in terms of three phases defined by the dip date after which the 15-day moving average fell below 10% compared to the average value in the two prior weeks, and the rebound date that corresponded to the date at which the 15-day moving average was lowest. Differences in the mobility trends relative to the February baseline period, before and after rebound dates, and relative to individuals aged 35-44 were assessed using Gamma regression models using log link and location by age interaction covariates.
To characterize the time evolution of deaths across locations and validate model fits, age-specific COVID-19 attributable deaths among the same age strata across locations were predicted by month with Dirichlet-Multinomial regression models. Trends in the share of age groups among monthly deaths were assessed by testing for differences in the proportions in the first month relative to subsequent months.
To test for potential differences in age-specific transmission dynamics based on the collected death data and without epidemic models, meta-analysis estimates of age-specific infection fatality ratios (20) were used to predict the share of age groups among infections from monthly age-specific deaths. Trends in the share of age groups among monthly infections were assessed by testing for differences in the proportions in the first month relative to subsequent months.
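The back-calculation of infection shares from deaths can be illustrated as follows. This is a sketch under stated assumptions: the function name, the two age strata, and the infection fatality ratios are hypothetical values chosen for the example, not the meta-analysis estimates of (20), and the delay between infection and death is ignored.

```python
from typing import Dict


def infection_shares(
    deaths_by_age: Dict[str, float], ifr_by_age: Dict[str, float]
) -> Dict[str, float]:
    """Share of each age group among infections implied by deaths and IFRs.

    Expected infections in an age group are deaths divided by that group's
    infection fatality ratio; shares are the normalised expected infections.
    """
    infections = {
        age: deaths_by_age[age] / ifr_by_age[age] for age in deaths_by_age
    }
    total = sum(infections.values())
    if total <= 0:
        raise ValueError("no infections implied by the death data")
    return {age: count / total for age, count in infections.items()}


# Hypothetical monthly deaths and IFRs: ten times more deaths in the older
# group, but a tenfold higher IFR, so both groups have equal infection shares.
example_shares = infection_shares(
    {"20-49": 10.0, "50+": 100.0},
    {"20-49": 0.001, "50+": 0.01},
)
```

The example shows why the age profile of deaths alone can understate the contribution of younger adults to infections.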

Contact-and-infection model
To quantify age-specific aspects of COVID-19 spread in heterogeneous populations, we formulated an age-specific, discrete-time renewal model in which disease transmission occurs via contact intensities between population groups stratified by 5-year age bands. The model has four key features described below. First, contact intensities vary in time and are inferred from signatures in the age-specific mortality and mobility data. This feature aims to reflect the substantial changes in human contact patterns during the pandemic (7-9). Second, the challenge and value of the model to produce generalizable knowledge is to explain disease spread across multiple locations with distinct demographics simultaneously. To this end, the renewal equations were embedded into a hierarchical model in which information on disease spread is borrowed across locations (1,39). Third, the model describes disease spread during the initial and later phase of the pandemic, as mobility patterns become less correlated with transmission risk and schools reopen (40,41). This feature allowed us to test for changes in disease dynamics over time. Fourth, the model is fitted in a Bayesian framework to the all-age and age-specific death data, all-age case data, case data from schools, and age-specific human mobility trends (42). This feature forced us to focus on a model whose parameters are inferable from the data across all locations. The model is described in detail in the supplementary materials. Briefly, we consider populations stratified by the 5-year age bands $A$, such that $a \in A = \{[0\text{-}4], [5\text{-}9], \ldots, [75\text{-}79], [80\text{-}84], [85+]\}$. In the renewal equations, $g$ quantifies the relative infectiousness of individuals $s$ days after infection. An important feature of SARS-CoV-2 transmission is that, similarly to other coronaviruses but unlike pandemic influenza (43), susceptibility to SARS-CoV-2 infection increases with age (7,21,44).
Here, we used contact tracing data from Hunan province, China (7) to specify lower susceptibility to SARS-CoV-2 infection among children aged 0-9, and higher susceptibility among individuals aged 60+, when compared to the 10-59 age group as part of the transmission probabilities $\rho_{a'}$. Previously infected individuals are assumed to be immune to re-infection within the analysis period, consistent with mounting evidence for sustained antibody responses to SARS-CoV-2 antigens (45,46). For adults aged 20+, the time-changing contact intensities were described in terms of the pre-pandemic baseline contact intensities in location $m$, which we denote by $C_{m,t,a,a'}$, and expected reductions in disease-relevant contacts from contacting individuals of age $a$ on day $t$ in location $m$, which we denote by $\eta_{m,t,a}$, and contacted individuals of age $a'$ on day $t$ in location $m$, $\eta_{m,t,a'}$, where $a, a' \in \{[20\text{-}24], \ldots, [85+]\}$. Expected reductions in disease-relevant contacts were specified as a random effects model that included the observed, age-specific mobility trends as covariates. In the model, each age-specific mobility trend was decoupled into three separate covariates that reflect the initial pre-pandemic, dip, and rebound phases in human mobility trends, so that previously observed decreases in correlation between mobility trends and transmission risk could be captured (40,41,47). As the same number of venue visits in e.g., Wyoming may translate to different transmission risk than in e.g., New York City, spatial random effects allowed for scaling of mobility trends during the dip and rebound phase in each location. As venue visits do not capture all aspects of transmission risk, the model further incorporates independently for each location autocorrelated biweekly random effects to capture information on elevated, disease-relevant contact intensities and transmission risk that is present in the death time series data.
To test for age-specific signatures of elevated transmission risk, the model further included for each location age-specific random effects for individuals aged 20-49.
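To make the structure of an age-specific, discrete-time renewal model concrete, the sketch below implements one generic form for a single location. It is an illustration under explicit assumptions, not the model equations of the study, which are given in the supplementary materials: the product form for the time-changing contact intensities (contacting-age reduction times baseline intensity times contacted-age reduction), the depletion of susceptibles by cumulative infections, and all function names and numerical values are assumptions made for this example.

```python
from typing import List

Matrix = List[List[float]]


def scaled_contacts(baseline: Matrix, eta: List[float]) -> Matrix:
    """Time-changing contact intensities for one day.

    Assumed product form: entry [a][b] is eta[a] * baseline[a][b] * eta[b],
    where eta[a] is the expected reduction factor for age band a.
    """
    n = len(baseline)
    return [[eta[a] * baseline[a][b] * eta[b] for b in range(n)] for a in range(n)]


def renewal_step(
    history: List[List[float]],  # history[t][a]: new infections on past day t, age a
    contacts: Matrix,            # contacts[a][b]: contacts from age a to age b
    rho: List[float],            # rho[b]: transmission probability per contact, age b
    g: List[float],              # g[s-1]: relative infectiousness s days after infection
    population: List[float],     # population[b]: population size of age band b
) -> List[float]:
    """New infections by age band on the next day (generic renewal form)."""
    n = len(population)
    # Infectious pressure from each source age band: past infections weighted by g.
    pressure = [0.0] * n
    for s, weight in enumerate(g, start=1):
        if s > len(history):
            break
        for a in range(n):
            pressure[a] += weight * history[-s][a]
    new_infections = []
    for b in range(n):
        # Previously infected individuals are treated as immune (as in the text).
        cumulative = sum(day[b] for day in history)
        susceptible = max(0.0, 1.0 - cumulative / population[b])
        force = sum(contacts[a][b] * pressure[a] for a in range(n))
        new_infections.append(susceptible * rho[b] * force)
    return new_infections


# Hypothetical two-age-band example.
example_contacts = scaled_contacts([[2.0, 1.0], [1.0, 2.0]], [0.5, 1.0])
example_new = renewal_step(
    history=[[10.0, 0.0]],
    contacts=example_contacts,
    rho=[0.1, 0.1],
    g=[1.0],
    population=[1000.0, 1000.0],
)
```

In this toy setting, halving the contact factor of the first age band reduces contacts within that band fourfold but contacts with the other band only twofold, which illustrates how age-specific reductions reshape the contact matrix rather than scaling it uniformly.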
For children and teens aged 0-19, mobility data are not available, and during periods of school closure the contact intensities from and to children and teens were set to the average contact intensities reported in (7). This implied that relative to pre-pandemic contact patterns, peer-based contacts were substantially reduced, whereas contacts from an adult to children and teens increased slightly. In the model, schools were set to re-open on or after August 24, 2020 when state administrations no longer mandated state-wide school closures by that date (48,49). Thereafter, Eq. 4 was extended to include children and teens, and expected mobility reductions were estimated from the case and death data. In the absence of further data, a common average effect could be estimated across locations and children and teen age groups for the last two observation months, for $a \in \{[0\text{-}4], \ldots, [15\text{-}19]\}$. A further compound effect $\gamma$ was added to modulate the number of disease-relevant child/teen-child/teen contacts, which we interpreted as reduced infectiousness from children and teens and/or a positive impact of non-pharmaceutical interventions among school-aged children and teens.

Bayesian inference
Past age-specific disease dynamics across all locations were inferred from age-specific death data available across locations, and age-specific mobility data. To do this, in the model, a proportion $\pi_{m,a}$ of new infections in location $m$ of age $a$ die, and the day of death is determined by the infection-to-death distribution, which was assumed to be constant across age groups. The proportions $\pi_{m,a}$ were associated with a strongly informative prior derived from the meta-analysis of (20), but were allowed to deviate from the baseline infection fatality ratio through location-specific random effects. The expected number of deaths in location $m$ on day $t$ in age band $a$, $d_{m,t,a}$, were aggregated to the reporting strata in each location, and fitted to the observed data using a Negative Binomial likelihood model. When age-specific death data were not available,
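The link between infections and observed deaths described above can be sketched as a convolution followed by a Negative Binomial log-likelihood, shown here for a single location and age band. The function names, the mean and overdispersion parameterisation of the Negative Binomial, and all numerical values are assumptions for illustration; the study's exact specification is in the supplementary materials.

```python
import math
from typing import List


def expected_deaths(
    infections: List[float], ifr: float, infection_to_death: List[float]
) -> List[float]:
    """Expected deaths per day for one location and age band.

    infection_to_death[s-1] is the assumed probability that a fatal
    infection leads to death s days after infection.
    """
    deaths = []
    for t in range(len(infections)):
        total = 0.0
        for s, probability in enumerate(infection_to_death, start=1):
            if t - s >= 0:
                total += infections[t - s] * probability
        deaths.append(ifr * total)
    return deaths


def neg_binomial_logpmf(k: int, mean: float, phi: float) -> float:
    """Log-probability of k deaths under a Negative Binomial model.

    Assumed parameterisation: mean > 0 and overdispersion phi > 0,
    with variance = mean + mean**2 / phi.
    """
    return (
        math.lgamma(k + phi) - math.lgamma(phi) - math.lgamma(k + 1)
        + phi * math.log(phi / (phi + mean))
        + k * math.log(mean / (phi + mean))
    )


# Hypothetical example: 1,000 infections on day 0, an IFR of 1%, and deaths
# split evenly between 2 and 3 days after infection.
example_deaths = expected_deaths([1000.0, 0.0, 0.0, 0.0], 0.01, [0.0, 0.5, 0.5])
example_loglik = sum(
    neg_binomial_logpmf(observed, expected, phi=10.0)
    for observed, expected in zip([4, 6], example_deaths[2:])
)
```

In a full Bayesian fit, a log-likelihood of this kind would be summed over all locations, days, and reporting strata and combined with the priors on the infection fatality ratio and the other model parameters.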

Validation and sensitivity analyses
Reconstructed past transmission dynamics were assessed against external data on the scale of the epidemic from seroprevalence surveys conducted across the US by the CDC (24). Validation results are reported in the supplementary materials, suggesting larger discrepancies between model fit and seroprevalence data for Connecticut and New York City, with larger epidemics reconstructed in the model than the data suggest. The contact-and-infection model does not account for sustained spatial importation of SARS-CoV-2 infections such as from New York City to Connecticut, and may have over-estimated the magnitude of self-sustaining epidemics in locations receiving sustained SARS-CoV-2 importations. However, we also note that the Connecticut seroprevalence estimates predict an infection to observed case ratio that is substantially below those of the other CDC seroprevalence studies. The inferred contact patterns were assessed against external data from the BICS study that quantified human contact patterns during the pandemic (9). Validation results are reported in the supplementary materials, suggesting similarly strong reductions in human contact intensities as in the survey data. Disaggregated by age, the model reproduces highest contact intensities among 35-44-year-old individuals, comparatively lower contact intensities from individuals aged 45+, and largest reductions in contact intensities from individuals aged 25-34. The survey data suggest that contact intensities from individuals aged 18-24 could be higher than reconstructed through the contact-and-infection model, but we also note large confidence intervals around the survey estimates. Sensitivity analyses were conducted to assess central modelling assumptions on the infection fatality ratio, contact intensities among children and teens during periods of school closure, and relative susceptibility of children and teens to SARS-CoV-2 infection, and are reported in the supplementary materials.
Our findings on the age groups that drive SARS-CoV-2 transmission were found to be robust to these assumptions.