OBJECTIVES: To evaluate, in people with multiple sclerosis, two psychometric assumptions that must be satisfied for valid use of the Medical Outcomes Study 36-item Short Form Health Survey (SF-36): that the data are of high quality, and that it is legitimate to generate scores for eight scales and two summary measures using the standard algorithms. METHODS: SF-36 data from 438 people representing the full range of multiple sclerosis were examined (mean age 48; 70% women). Data quality (per cent missing data, and per cent computable scale and summary scores) was determined; six scaling criteria were tested to determine the legitimacy of generating the eight SF-36 scale scores using Likert's method of summed ratings; and two scaling criteria were tested to determine the appropriateness of the standard SF-36 algorithms for weighting scale scores to generate two summary measures. RESULTS: Data quality was excellent except in the most disabled subgroup, where missing responses reached a maximum of 16.5% and summary scores could be computed for only 72%. There was clear support for the generation of SF-36 scale scores. Item response distributions were symmetric, item mean scores and variances were equivalent, corrected item-total correlations were high (range 0.46-0.85) and similar, and definite scaling success rates exceeded 96%. Nevertheless, there were notable floor or ceiling effects in four of the eight scales. Assumptions for generating two SF-36 summary measures were only partially satisfied. Although principal components analysis suggested a two-component model, these components explained less than 60% of the total variance in SF-36 scales, and less than 75% of the variance in five of the eight scales. Moreover, scale-to-component correlations did not support the use of scale weights derived from United States population data. CONCLUSIONS: When using the SF-36 as a health measure in multiple sclerosis, summary scores should be reported with caution.
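
As an illustration of two of the scaling checks named above, the following minimal sketch computes corrected item-total correlations (each item correlated with the sum of the other items in its scale) and floor and ceiling effects (the proportion of respondents at the lowest and highest possible scale score). It is not the authors' analysis code: the function names, the three-item scale, and the `demo` responses are hypothetical and chosen only to make the example runnable.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(responses):
    """Correlate each item with the summed score of the remaining items.

    responses: one list of item scores per respondent, all for a single scale.
    Removing the item from the total avoids inflating the correlation.
    """
    n_items = len(responses[0])
    correlations = []
    for i in range(n_items):
        item = [r[i] for r in responses]
        rest = [sum(r) - r[i] for r in responses]
        correlations.append(pearson(item, rest))
    return correlations


def floor_ceiling(scale_scores, lowest, highest):
    """Return (floor, ceiling): proportions at the minimum and maximum score."""
    n = len(scale_scores)
    floor = sum(1 for s in scale_scores if s == lowest) / n
    ceiling = sum(1 for s in scale_scores if s == highest) / n
    return floor, ceiling


# Hypothetical data: five respondents, one three-item scale scored 1 to 5.
demo = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 5],
]

item_total = corrected_item_total(demo)
summed = [sum(r) for r in demo]  # Likert summed ratings per respondent
effects = floor_ceiling(summed, lowest=3, highest=15)
```

In this toy data set one of the five respondents scores the maximum of 15, so the ceiling proportion is one fifth; in real SF-36 data the same calculation would be repeated for each of the eight scales.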