Diagnostic accuracy of convolutional neural networks in classifying hepatic steatosis from B-mode ultrasound images: a systematic review with meta-analysis and novel validation in a community setting in Telangana, India
Background: Ultrasound is a widely available, inexpensive, and non-invasive modality for evaluating hepatic steatosis (HS). However, the scarcity of radiological expertise limits its utility. Convolutional Neural Networks (CNNs) have potential for automated classification of HS using B-mode ultrasound images. We aimed to assess their diagnostic accuracy and generalisability across diverse study settings and populations.
Methods: We systematically reviewed two biomedical databases up to Dec 12, 2023, to identify studies that applied CNNs to the classification of HS using B-mode ultrasound images as input (PROSPERO: CRD42024501483). We supplemented this review with a novel analysis of the community-based Andhra Pradesh Children and Parents’ Study (APCAPS) in India to address the overrepresentation of hospital samples and the lack of data on South Asian populations, who exhibit a distinct central adiposity phenotype that could influence CNN performance. We quantitatively synthesised diagnostic accuracy metrics for eligible studies using random-effects meta-analyses.
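As a rough illustration of the random-effects pooling named in the Methods, the sketch below applies a DerSimonian-Laird estimator to logit-transformed AUC values. This is an assumption made for illustration only: the abstract does not state which estimator, transformation, or software was used, and the function name `pool_random_effects` and the input values are hypothetical, not data from the included studies.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def pool_random_effects(estimates, std_errors):
    """Pool AUC-like estimates in (0, 1) on the logit scale.

    Uses the DerSimonian-Laird moment estimator for the between-study
    variance (tau^2). Returns (pooled, ci_low, ci_high, tau2).
    """
    # Transform to the logit scale; delta-method variance is (se / (p(1-p)))^2.
    y = [logit(p) for p in estimates]
    v = [(se / (p * (1 - p))) ** 2 for p, se in zip(estimates, std_errors)]

    # Fixed-effect (inverse-variance) weights and Cochran's Q.
    w = [1 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))

    # DerSimonian-Laird between-study variance, truncated at zero.
    k = len(y)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights, pooled estimate, and Wald 95% interval.
    w_re = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1 / sum(w_re))
    low, high = mu - 1.96 * se_mu, mu + 1.96 * se_mu

    # Back-transform to the AUC scale.
    return inv_logit(mu), inv_logit(low), inv_logit(high), tau2


# Hypothetical per-dataset AUCs and standard errors (NOT from the review).
example_aucs = [0.95, 0.88, 0.91, 0.97, 0.85]
example_ses = [0.02, 0.03, 0.025, 0.015, 0.04]
pooled, ci_low, ci_high, tau2 = pool_random_effects(example_aucs, example_ses)
print(f"pooled AUC {pooled:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f}), tau^2 {tau2:.3f}")
```

Pooling on the logit scale keeps the back-transformed interval inside (0, 1), which is one plausible reason a pooled AUC can have an asymmetric confidence interval.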
Findings: Our search returned 289 studies, of which 17 were eligible. All but one of the 17 studies were based in hospital or clinical outpatient settings with curated cases and controls. Studies were conducted exclusively in East Asian, European, or North American populations. Studies employed varying gold standards: seven studies (41.18%) used liver biopsy, three (17.65%) used MRI proton density fat fraction, and seven (41.18%) used clinician-evaluated ultrasound-based HS grades. The APCAPS sample included 219 participants with radiologist-assigned HS grades. Across the range of study settings and populations, CNNs demonstrated good diagnostic accuracy. Meta-analysis of studies with low risk of bias, reporting on five unique datasets, showed a pooled area under the receiver operating characteristic curve of 0.93 (95% CI 0.73–0.98) for detecting HS of any severity and 0.86 (95% CI 0.77–0.92) for detecting moderate-to-severe HS.
Interpretation: CNNs have good diagnostic accuracy and generalisability for HS classification, suggesting potential for real-world application.
Funding: Medical Research Council, UK (MR/T038292/1, MR/V001221/1).
Item Type | Article |
---|---|
Elements ID | 348216 |
Official URL | https://doi.org/10.1016/j.lansea.2025.100644 |
Date Deposited | 08 Aug 2025 08:53 |