INTRODUCTION: Scale-up of HIV self-testing (HIVST) will play a key role in meeting the United Nations' 90-90-90 targets. Early implementation studies have used delayed re-reading of used HIVST devices to validate the performance of self-test kits and to estimate HIV positivity among self-testers. We investigated the stability of results on used devices under controlled conditions to assess the potential of delayed re-reading as a quality assurance approach for HIVST scale-up.

METHODS: We conducted 444 OraQuick® HIV-1/2 rapid antibody tests using commercial plasma from two HIV-positive donors and HIV-negative plasma (high-reactive n = 148, weak-reactive n = 148 and non-reactive n = 148) and incubated the used devices for six months under four conditions (combinations of high and low temperature and humidity). Devices were re-read daily for one week, weekly for the following month and monthly thereafter by independent readers unaware of the previous results. We used multistage transition models to investigate rates of change in device results and to compare these rates between storage conditions.

RESULTS AND DISCUSSION: There was a high incidence of device instability. Forty-three (29%) of 148 initially non-reactive results became false weak-reactive results. These changes were observed across all incubation conditions, with the earliest on Day 4 (n = 9 kits). No initially HIV-reactive result changed to non-reactive. There were no significant associations between storage conditions and the hazard of result transition. We observed substantial statistical agreement between independent re-readers over time (agreement range: 0.74 to 0.96).

CONCLUSIONS: Delayed re-reading of used OraQuick® HIV-1/2 rapid antibody tests is not currently a valid methodological approach to quality assurance and monitoring. We observed a high incidence (29%) of true non-reactive tests changing to false weak-reactive, so its use may overestimate true HIV positivity.
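
The headline proportion can be checked from the counts reported above. The following is a minimal illustrative sketch, not part of the study's analysis: the Wilson score interval is an assumption introduced here to show the sampling uncertainty around 43 of 148, and the function and variable names are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    Illustrative only: the abstract does not state which interval
    method, if any, the authors used. z = 1.96 gives an approximate
    95% interval.
    """
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the abstract.
TOTAL_DEVICES = 444   # all tests conducted
GROUP_SIZE = 148      # devices per initial result group
CHANGED = 43          # non-reactive devices that later read weak-reactive

# Three equal groups account for every device.
groups_cover_all = (3 * GROUP_SIZE == TOTAL_DEVICES)

proportion = CHANGED / GROUP_SIZE
lower, upper = wilson_interval(CHANGED, GROUP_SIZE)

print(f"Groups account for all devices: {groups_cover_all}")
print(f"Proportion changed: {proportion:.1%}")
print(f"Approximate 95% Wilson interval: {lower:.1%} to {upper:.1%}")
```

The point estimate rounds to the 29% reported in the abstract. The interval is shown only to indicate the precision that a sample of 148 devices supports.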