Comparative Analysis of 11 Consumer Sleep Trackers with Polysomnography

Oct 20, 2023
World Sleep 2023


Introduction:
Consumer sleep trackers (CSTs) have emerged as affordable alternatives to polysomnography (PSG) for monitoring sleep patterns, avoiding the time and expense that PSG requires. However, few studies have comprehensively validated the performance of different CSTs, particularly in the same environment with the same participants. This study aims to fill this research gap by validating the accuracy of various types of CSTs against PSG. Additionally, by including widely used CSTs and conducting a multicenter study with a large sample size, the research seeks to provide comprehensive insights into the performance and suitability of these CSTs in different scenarios.

Materials and Methods:
We recruited 75 participants from a tertiary hospital (Seoul National University Bundang Hospital, SNUBH) and a primary sleep-specialized clinic (Clionic Lifecare Clinic, CLC) in Korea. The study analyzed 11 commercially available CSTs: five wearables (Google Pixel Watch, Galaxy Watch 5, Fitbit Sense 2, Apple Watch 8, and Oura Ring 3), three nearables (Withings Sleep Tracking Mat, Google Nest Hub 2, and Amazon Halo Rise), and three airables (SleepRoutine, SleepScore, and Pillow). Participants were randomly divided into two groups, and each participant used eight of the 11 CSTs simultaneously while undergoing PSG. CSTs that could interfere with each other were not used at the same time. The results of each CST were compared against those of PSG.

Results:
For 4-stage sleep estimation, the airable SleepRoutine achieved the highest macro F1 score (0.6863), followed by the nearable Amazon Halo Rise (0.6242). Wearable CSTs, such as the Google Pixel Watch, performed well in deep stage estimation.
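For readers unfamiliar with the metric, the macro F1 score is the unweighted mean of the per-class F1 scores, so each sleep stage counts equally regardless of how often it occurs. The following is a minimal sketch, not the study's evaluation code. It assumes epoch-by-epoch comparison of CST and PSG labels and a four-stage scheme of wake, light, deep, and REM; the function name, the stage labels, and the sample epochs are illustrative and do not come from the study.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # Assumption: a stage that never occurs and is never predicted scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical stage labels and epochs, made up for illustration only.
STAGES = ["wake", "light", "deep", "rem"]
psg = ["wake", "light", "light", "deep", "deep", "rem", "rem", "light"]
cst = ["wake", "light", "deep", "deep", "deep", "rem", "light", "light"]

score = macro_f1(psg, cst, STAGES)
print(f"macro F1 = {score:.4f}")
```

Because the average is unweighted, a device that handles the frequent light stage well but misses deep or REM sleep is penalized more than it would be under plain accuracy.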
For sleep measure estimation (sleep efficiency, sleep latency, and REM latency), the wearable Galaxy Watch 5 and the airable SleepRoutine showed smaller biases, while the SleepRoutine and the wearable Oura Ring 3 had no proportional bias. The SleepRoutine provided accurate and consistent sleep measure estimates with minimal bias.
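The abstract does not state how bias and proportional bias were computed. One common approach, assumed here, is a Bland-Altman style analysis: bias is the mean difference between the CST and PSG values, and proportional bias is indicated by a non-zero slope when the differences are regressed on the pairwise means. The sketch below illustrates that assumption only; the function name and the sleep efficiency values are made up.

```python
from statistics import mean


def bias_and_slope(psg_values, cst_values):
    """Return (mean bias, slope of difference regressed on pairwise mean)."""
    diffs = [c - p for p, c in zip(psg_values, cst_values)]
    means = [(c + p) / 2 for p, c in zip(psg_values, cst_values)]
    bias = mean(diffs)
    m_bar = mean(means)
    # Ordinary least squares slope; requires the pairwise means to vary.
    sxx = sum((m - m_bar) ** 2 for m in means)
    sxy = sum((m - m_bar) * (d - bias) for m, d in zip(means, diffs))
    return bias, sxy / sxx


# Hypothetical sleep efficiency values in percent, for illustration only.
psg_se = [70.0, 80.0, 90.0, 95.0]
cst_se = [75.0, 85.0, 95.0, 100.0]

bias, slope = bias_and_slope(psg_se, cst_se)
print(f"bias = {bias:.2f}, slope = {slope:.2f}")
```

In this made-up example the device overestimates by the same amount at every level, so it has a constant bias but no proportional bias. A full analysis would also test whether the slope differs significantly from zero.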
In subgroup analyses based on gender, BMI, and sleep efficiency, the SleepRoutine consistently outperformed other devices across all subgroups. Differences in macro F1 scores between the tertiary hospital SNUBH and the primary clinic CLC were observed for certain devices, potentially due to variations in recruitment methods or population characteristics.

Conclusions:
Our research revealed significant variation in the accuracy of sleep stage estimation among the 11 commercially available CSTs compared to PSG. Additionally, we identified significant performance differences among the CSTs for specific sleep measures and subgroups. These findings highlight the need for comprehensive validation and for transparency in the AI, data, and algorithms used by CSTs to ensure reliable sleep analysis for consumers in diverse situations. Overall, this research provides valuable insights for consumers seeking trustworthy CSTs and emphasizes the importance of future studies that prioritize international comparisons and include diverse populations to enhance the generalizability of the findings.


T. Lee
Y. Cho
K.S. Cha
J. Jung
J. Cho
H. Kim
D. Kim
J. Hong
D. Lee
M. Keum

