Accuracy of 11 Wearable, Nearable, and Airable Consumer Sleep Trackers: A Prospective and Multicenter Validation Study

Nov 2, 2023
JMIR mHealth and uHealth 2023



Consumer sleep trackers (CSTs) have gained significant popularity because they enable individuals to conveniently monitor and analyze their sleep. However, few studies have comprehensively validated the performance of widely used CSTs. Our study therefore investigated popular CSTs, which are based on various biosignals and algorithms, by assessing their agreement with polysomnography (PSG).


This study aims to validate the accuracy of various types of CSTs through a comparison with PSG. Additionally, by including widely used CSTs and conducting a multicenter study with a large sample size, it seeks to provide comprehensive insights into the performance and applicability of these CSTs for sleep monitoring.


The study analyzed 11 commercially available CSTs: five Wearables (Google Pixel Watch, Galaxy Watch 5, Fitbit Sense 2, Apple Watch 8, and Oura Ring 3), three Nearables (Withings Sleep Tracking Mat, Google Nest Hub 2, and Amazon Halo Rise), and three Airables (SleepRoutine, SleepScore, and Pillow). The 11 CSTs were divided into two groups, ensuring maximum inclusion while avoiding interference between the CSTs within each group. The CSTs in each group (8 per group) were compared with polysomnography.


The study enrolled 75 participants from a tertiary hospital and a primary sleep-specialized clinic in Korea. Across the two centers, we collected a total of 3890 hours of sleep sessions from the 11 CSTs, along with 543 hours of PSG recordings. Each CST recorded an average of 353 hours of sleep. We analyzed a total of 349,114 epochs from the 11 CSTs against PSG; epoch-by-epoch agreement in sleep stage classification varied substantially between devices. Specifically, the highest macro F1 score was 0.69, while the lowest was 0.26.
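The macro F1 score reported above averages the per-stage F1 scores with equal weight, so a rarely occurring stage counts as much as a common one. The following is a minimal sketch of that calculation for epoch-by-epoch comparison, not the study's actual analysis code. The four stage labels, the function name `macro_f1`, and the choice to score a stage with no reference or predicted epochs as 0.0 are all assumptions made for illustration.

```python
from collections import Counter

# Assumed 4-class staging scheme; the abstract does not list the stage labels.
STAGES = ("wake", "light", "deep", "rem")


def macro_f1(reference, predicted, labels=STAGES):
    """Unweighted mean of per-stage F1 over epoch-aligned label sequences.

    reference: PSG-scored stage per epoch (hypothetical input format)
    predicted: CST-reported stage per epoch, aligned to the same epochs
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must cover the same epochs")

    tp, fp, fn = Counter(), Counter(), Counter()
    for ref, pred in zip(reference, predicted):
        if ref == pred:
            tp[ref] += 1   # epoch correctly assigned to this stage
        else:
            fp[pred] += 1  # tracker reported a stage PSG did not score
            fn[ref] += 1   # tracker missed the stage PSG scored

    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # Assumption: a stage absent from both sequences scores 0.0.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because every stage contributes equally, a tracker that labels most epochs as the dominant stage can reach a high raw accuracy while still scoring a low macro F1.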


Among the 11 CSTs examined, some showed substantial agreement with PSG, indicating their potential application in sleep monitoring, while others were only partially consistent with PSG. This study offers insights into the strengths of each CST within the three different classes.


Lee T
Cho Y
Cha KS
Jung J
Cho J
Kim H
Kim D
Hong J
Lee D
Keum M