Development and validation of a computerized South Asian Names and Group Recognition Algorithm (SANGRA) for use in British health-related studies


Nanchahal, K; Mangtani, P; Alston, M; dos Santos Silva, I; (2001) Development and validation of a computerized South Asian Names and Group Recognition Algorithm (SANGRA) for use in British health-related studies. Journal of public health medicine, 23 (4). pp. 278-85. ISSN 0957-4832

Full text not available from this repository.

Abstract

BACKGROUND: Studies on ethnic variations in health have played an important role in aetiological and health services research. Most routine datasets, however, do not include information on ethnicity. South Asians, one of the largest minority ethnic groups in Britain, have distinctive names that also allow differentiation of the main sub-groups with their important differences in health-related exposures and disease risks.

METHODS: A computerized name recognition algorithm (SANGRA) was developed incorporating directories of South Asian first names and surnames together with their religious and linguistic origin. SANGRA was validated using health-related data with self-ascribed information on ethnicity.

RESULTS: SANGRA was successful in recognizing South Asian origin in reference datasets, with sensitivity of 89-96 per cent, specificity of 94-98 per cent, positive predictive value (PPV) of 80-89 per cent and negative predictive value (NPV) of 98-99 per cent. Religious origin was correctly assigned in the majority of cases: sensitivity, specificity and PPV were 94 per cent, 91 per cent and 90 per cent for Hindus; 90 per cent, 99 per cent and 98 per cent for Muslims; and 76 per cent, 99 per cent and 94 per cent for Sikhs. SANGRA correctly identified 76 per cent Gujerati and 70 per cent Punjabi names, although only 62 per cent of Gujerati names were sufficiently distinct to be allocated to the Gujerati-only category and only 53 per cent Punjabi names were allocated to the Punjabi-only category. However, specificity and PPV were high for both languages (respectively 97 per cent and 93 per cent for Gujerati, and 99 per cent and 97 per cent for Punjabi).

CONCLUSIONS: SANGRA provides a practical and valid method of ascertaining South Asian origin by name and, to a lesser degree of accuracy, of differentiating between the main religious and linguistic subgroups living in Britain. This algorithm will be useful in health-related studies where information on self-ascribed ethnicity is not available or is of a limited nature.
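
The four validation measures quoted in the abstract (sensitivity, specificity, PPV and NPV) follow from a 2x2 comparison of the algorithm's classification against self-ascribed ethnicity. The sketch below shows only the standard definitions; it is not the authors' code. The names `ConfusionCounts` and `validation_metrics` are hypothetical, and the counts are invented for illustration and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing name-based classification with self-ascribed ethnicity."""
    true_pos: int   # classified South Asian, self-ascribed South Asian
    false_pos: int  # classified South Asian, self-ascribed other
    false_neg: int  # classified other, self-ascribed South Asian
    true_neg: int   # classified other, self-ascribed other


def validation_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the standard 2x2 validation measures as proportions."""
    return {
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        "ppv": c.true_pos / (c.true_pos + c.false_pos),
        "npv": c.true_neg / (c.true_neg + c.false_neg),
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
counts = ConfusionCounts(true_pos=90, false_pos=20, false_neg=10, true_neg=880)
metrics = validation_metrics(counts)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

PPV and NPV depend on how common South Asian origin is in the dataset being classified, which is one reason these measures can span a range across reference datasets even when sensitivity and specificity are similar.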

Item Type: Article
Keywords: *Algorithms, Asia, Southeastern/ethnology, *Database Management Systems, Directories, Ethnic Groups/*classification/statistics & numerical data, Great Britain/epidemiology, *Health Status, Human, Language, *Names, Patient Admission, Patient Identification Systems, Religion, Software, Support, Non-U.S. Gov't
Faculty and Department: Faculty of Epidemiology and Population Health > Dept of Non-Communicable Disease Epidemiology
Faculty of Epidemiology and Population Health > Dept of Infectious Disease Epidemiology
Faculty of Public Health and Policy > Dept of Social and Environmental Health Research
Research Centre: Centre for Global Non-Communicable Diseases (NCDs)
PubMed ID: 11873889
Web of Science ID: 173321300005
URI: http://researchonline.lshtm.ac.uk/id/eprint/16630
