Estimation of T-cell repertoire diversity and clonal size distribution by Poisson abundance models.
Sepúlveda, Nuno;
Paulino, Carlos Daniel;
Carneiro, Jorge;
(2010)
Estimation of T-cell repertoire diversity and clonal size distribution by Poisson abundance models.
Journal of Immunological Methods, 353 (1-2).
pp. 124-137.
ISSN 0022-1759
DOI: https://doi.org/10.1016/j.jim.2009.11.009
Answering many fundamental questions in Immunology requires a quantitative characterization of the T-cell repertoire, namely T-cell receptor (TCR) diversity and clonal size distribution. An increasing number of repertoire studies are based on sequencing the TCR variable regions in T-cell samples, from which one tries to estimate the diversity of the original T-cell populations. Hitherto, TCR diversity has been estimated either by a "standard" method that assumes a homogeneous clonal size distribution, or by non-parametric methods, such as the abundance-coverage and incidence-coverage estimators. However, both approaches have caveats. On the one hand, the samples exhibit clonal size distributions with heavy right tails, a feature that is incompatible with the assumption that every TCR sequence occurs at the same frequency in the repertoire; the "standard" method therefore produces inaccurate estimates. On the other hand, non-parametric estimators are robust in a wide range of situations but, per se, provide no information about the clonal size distribution. This paper redeploys Poisson abundance models from Ecology to overcome the limitations of these inferential procedures. These models assume that each TCR variant is sampled according to a Poisson distribution with its own sampling rate, which in turn varies across variants according to an Exponential, Gamma, or Lognormal distribution, or an appropriate mixture of Exponential distributions. With these models, one can estimate the clonal size distribution in addition to the TCR diversity of the repertoire. A procedure is also proposed to evaluate the robustness of diversity estimates with respect to the most abundant sampled TCR sequences. For illustration, previously published data on mice with limited TCR diversity are analyzed. Two of the presented models are more consistent with the data and give the most robust TCR diversity estimates. 
They suggest that clonal sizes follow either a Lognormal distribution or an appropriate mixture of Exponential distributions. Under the ecological interpretation of these models, the T-cell repertoire would be divided into several T-cell niches, themselves created in a series of steps. Definitive conclusions, however, would require larger samples. It is shown here that samples 100-fold larger than those hitherto available would suffice to discriminate between candidate models. Such sample sizes are now affordable with massively parallel sequencing technology. Anticipating this, we provide the package PAM for the R software, which facilitates the analysis of T-cell repertoire data based on Poisson abundance models.