OBJECTIVES: The proportion of tuberculosis cases in a population that are clustered (i.e. share identical strains of Mycobacterium tuberculosis) reflects ongoing M. tuberculosis transmission. This proportion varies markedly between studies, but it is unclear how much of the variation reflects measurable differences in study design, setting and patient population. We aimed to assess the relative impact of these factors and to develop a tool to improve interpretation of the proportion clustered in an individual study. METHODS: We systematically reviewed all population-based tuberculosis clustering studies that used IS6110 RFLP as their main DNA fingerprinting technique. Meta-regression was used to estimate how much of the between-study variation in the proportion clustered could be explained by variables describing study design, setting and population. We compared expected clustering, based on study design and setting, with that observed. RESULTS: Forty-six studies were included. Just four factors related to study design and setting (study duration, sampling fraction, handling of low-band strains and tuberculosis incidence) explained 28% of the variation in the proportion clustered. Additionally including average patient age and proportion foreign-born explained 60% of the variation in clustering for industrialized countries. Comparison of expected and observed proportions showed that for some studies the expected proportion clustered differed strongly from that observed. CONCLUSIONS: We were able to account for much of the variation in the proportion clustered. Comparing expected with observed clustering allows a more valid comparison of studies and provides a tool for identifying outliers that warrant further investigation.
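The comparison of expected and observed clustering described in METHODS can be illustrated with a minimal sketch. It is not the authors' model: it assumes a simplified fixed-effect meta-regression on the logit scale, fitted by weighted least squares, with only two covariates (study duration and sampling fraction). The function names, the weighting scheme and all six "studies" below are hypothetical, with made-up values used purely for illustration.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def inv_logit(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_meta_regression(X, y, weights):
    """Weighted least squares fit; returns (coefficients, weighted R^2).

    The first coefficient is the intercept. R^2 here is the share of the
    weighted between-study variation in y explained by the covariates.
    """
    X1 = np.column_stack([np.ones(len(y)), X])
    sw = np.sqrt(weights)
    beta, *_ = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)
    fitted = X1 @ beta
    y_bar = np.average(y, weights=weights)
    ss_res = np.sum(weights * (y - fitted) ** 2)
    ss_tot = np.sum(weights * (y - y_bar) ** 2)
    return beta, 1.0 - ss_res / ss_tot

# Hypothetical studies (illustrative values only, not data from the review).
duration_years = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 7.0])
sampling_fraction = np.array([0.60, 0.70, 0.80, 0.75, 0.90, 0.85])
observed = np.array([0.20, 0.28, 0.35, 0.33, 0.46, 0.50])  # proportion clustered
n_patients = np.array([150, 300, 220, 400, 500, 350])

X = np.column_stack([duration_years, sampling_fraction])
y = logit(observed)
# Assumed weights: approximate inverse variance of a logit-transformed proportion.
weights = n_patients * observed * (1.0 - observed)

beta, r_squared = fit_meta_regression(X, y, weights)
X1 = np.column_stack([np.ones(len(y)), X])
expected = inv_logit(X1 @ beta)

print(f"variation explained: {r_squared:.0%}")
for obs, exp in zip(observed, expected):
    print(f"observed={obs:.2f}  expected={exp:.2f}  difference={obs - exp:+.2f}")
```

Studies with a large difference between observed and expected proportions would be the candidates for closer scrutiny. A fuller analysis would normally use a random-effects meta-regression and the additional covariates named in RESULTS.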