Enhancing Robustness of a Sound-Based AI Model for Automated Sleep Staging: Validating on Unseen Open Dataset

Oct 20, 2023
World Sleep 2023


Introduction:
Sound-based AI models for sleep staging have gained considerable attention as a potential solution for convenient sleep monitoring at home. For widespread adoption, such a model must perform accurately across a diverse range of individuals. However, this requires a comprehensive training dataset covering varied individuals, and diverse sleep sound data are difficult to acquire. In this study, we addressed this challenge by introducing a training algorithm that improves the generalization capability of the AI model. To assess the model's robustness, specifically its generalization performance, we tested a model trained on data from Asian individuals on an open dataset composed of European individuals.

Materials and Methods:
We trained the AI model with an additional objective, called consistency loss, to enhance the robustness of the model. This objective aimed to ensure consistent sleep stage predictions regardless of the data characteristics. By training the model with this objective, we achieved improved performance on unseen data.
We trained the model on 2,574 labeled pairs of polysomnography (PSG) and audio recordings from SNUBH. To evaluate how model performance varies across ethnic groups, we used two test sets: the SNUBH dataset (N = 454), comprising Asian individuals, and the PSG-Audio dataset (N = 282), comprising European individuals (Korompili et al., 2021).
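
The abstract does not specify the form of the consistency loss, so the following is only a minimal sketch of one common choice: the Jensen-Shannon divergence between the model's stage predictions for two differently perturbed versions of the same audio segment, added to the usual cross-entropy term. The function names, the `weight` parameter, and the use of four stage classes in the usage note are assumptions made for illustration, not details taken from the study.

```python
import numpy as np

EPS = 1e-12  # guards against log(0)


def softmax(logits):
    """Convert raw scores of shape (batch, classes) into probabilities."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def cross_entropy(logits, labels):
    """Mean supervised loss against the PSG-derived stage labels."""
    probs = softmax(logits)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + EPS)))


def consistency_loss(logits_a, logits_b):
    """Jensen-Shannon divergence between predictions for two views.

    Assumption: the two views are the same segment under different
    perturbations (for example, different noise or recording conditions).
    The value is 0 when both predictions agree exactly.
    """
    p, q = softmax(logits_a), softmax(logits_b)
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * (np.log(p + EPS) - np.log(m + EPS)), axis=-1)
    kl_qm = np.sum(q * (np.log(q + EPS) - np.log(m + EPS)), axis=-1)
    return float(np.mean(0.5 * (kl_pm + kl_qm)))


def total_loss(logits_a, logits_b, labels, weight=1.0):
    """Supervised loss plus the weighted consistency term (hypothetical weighting)."""
    return cross_entropy(logits_a, labels) + weight * consistency_loss(logits_a, logits_b)
```

For example, with hypothetical logits over four stage classes, `consistency_loss(a, a)` is zero, while two views that disagree on the predicted stage yield a positive penalty that pushes the model toward the same prediction for both.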

Results:
The baseline model achieved 70.35% accuracy on the SNUBH dataset and 64.96% on the PSG-Audio dataset, a gap of 5.39 percentage points. The proposed model achieved 71.22% accuracy on SNUBH and 69.33% on PSG-Audio, narrowing the gap between the two datasets to 1.89 percentage points. The weighted F1-score was also higher for the proposed model (71.42% for SNUBH; 73.30% for PSG-Audio) than for the baseline model (70.58% for SNUBH; 69.79% for PSG-Audio) on both datasets. In subgroup analysis, a positive correlation between weighted F1-score and apnea-hypopnea index (AHI) severity was observed in both models (p < 0.01), and it was slightly weaker in the proposed model (r = 0.37) than in the baseline model (r = 0.40). Neither model showed a significant difference by gender.
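
The cross-dataset gaps reported above are the differences between the SNUBH and PSG-Audio accuracies. As a quick arithmetic check, the short sketch below recomputes them from the reported figures; the dictionary layout and the helper name `accuracy_gap` are illustrative choices only.

```python
# Accuracies (%) as reported in the abstract.
accuracy = {
    "baseline": {"SNUBH": 70.35, "PSG-Audio": 64.96},
    "proposed": {"SNUBH": 71.22, "PSG-Audio": 69.33},
}


def accuracy_gap(scores):
    """Drop in accuracy (percentage points) from the SNUBH test set to PSG-Audio."""
    return round(scores["SNUBH"] - scores["PSG-Audio"], 2)


baseline_gap = accuracy_gap(accuracy["baseline"])    # 5.39
proposed_gap = accuracy_gap(accuracy["proposed"])    # 1.89
gap_reduction = round(baseline_gap - proposed_gap, 2)  # 3.5

print(f"baseline gap: {baseline_gap} points")
print(f"proposed gap: {proposed_gap} points")
print(f"reduction:    {gap_reduction} points")
```

Reporting the gap in percentage points rather than as a relative percentage avoids ambiguity about whether 5.39% refers to an absolute or a proportional drop.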

Conclusions:
Our study validated the performance and generalization capability of the model trained with the proposed objective. The model became more robust, maintaining its performance advantage even on unseen public data. Our findings highlight the model's improved accuracy across a wide range of individuals, which may contribute to public healthcare by improving people's understanding of sleep and its impact on health and well-being.


J. Kim
K.S. Cha
E. Cho
D. Kim
D. Lee
K.-Y. Jung
I.-Y. Yoon
C. Kushida

