Prediction of Sleep Stages Using Smartphone Audio Recordings in Home Environments

Jan 6, 2023
JMIR 2023



The growing public interest in and awareness of the importance of sleep is driving demand for sleep monitoring at home. In addition to various commercially available wearable and nearable devices, sound-based sleep staging via deep learning is emerging as a promising alternative, owing to its convenience and potential accuracy. However, sound-based sleep staging has so far been studied only with in-laboratory sound data. Real-world sleep environments (homes) contain abundant background noise, in contrast to quiet, controlled environments such as laboratories. Sound-based sleep staging at home has not been investigated, although it is essential for practical daily use. The main challenge is the lack of home audio data annotated with sleep stages, together with the high expected cost of acquiring enough such data to train a large-scale neural network.


This study aims to develop and validate a deep learning method for sound-based sleep staging using audio recordings obtained from various uncontrolled home environments.


To overcome the lack of home data with known sleep stages, we adopted advanced training techniques and combined home data with hospital data. Model training consisted of 3 components: (1) conventional supervised learning using 812 pairs of hospital polysomnography (PSG) and audio recordings, plus 2 newly adopted components, namely (2) transfer learning from hospital to home sounds by adding 829 smartphone audio recordings made at home, and (3) consistency training using augmented hospital sound data. Augmented data were created by adding 8255 home noise recordings to hospital audio recordings. In addition, an independent test set was built by collecting 45 pairs of overnight PSG and smartphone audio recordings at home to evaluate the trained model.
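The noise augmentation and consistency-training components described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the mixing rule (scaling a home-noise clip to a target signal-to-noise ratio), the use of KL divergence between predictions on clean and noise-augmented audio, and the names `add_noise_at_snr`, `consistency_loss`, and `toy_model` are all hypothetical. `toy_model` merely stands in for the sleep-staging network and maps signal energy to 3 logits (wake, REM, non-REM).

```python
"""Noise augmentation and a consistency loss (illustrative sketch)."""
import math
import random


def mean_power(signal):
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in signal) / len(signal)


def add_noise_at_snr(clean, noise, snr_db):
    """Mix a home-noise clip into a clean recording at a target SNR (dB).

    Assumed mixing rule: the noise clip is tiled or truncated to the length
    of the clean signal, then scaled so that
    mean_power(clean) / mean_power(scaled noise) == 10 ** (snr_db / 10).
    """
    tiled = [noise[i % len(noise)] for i in range(len(clean))]
    p_noise = mean_power(tiled)
    if p_noise == 0:
        raise ValueError("noise clip is silent")
    scale = math.sqrt(mean_power(clean) / (p_noise * 10 ** (snr_db / 10)))
    return [c + scale * z for c, z in zip(clean, tiled)]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def toy_model(signal):
    """Hypothetical stand-in for the sleep-staging network.

    Returns 3 logits (wake, REM, non-REM) computed from signal energy only.
    """
    energy = mean_power(signal)
    return [energy, 0.5 * energy, 1.0 - energy]


def consistency_loss(model, clean, noise_bank, snr_db, rng):
    """Divergence between predictions on clean and noise-augmented audio."""
    noise = rng.choice(noise_bank)
    augmented = add_noise_at_snr(clean, noise, snr_db)
    p_clean = softmax(model(clean))      # target distribution
    p_aug = softmax(model(augmented))    # prediction under home noise
    return kl_divergence(p_clean, p_aug)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic placeholders for a hospital recording and home-noise clips.
    clean = [math.sin(0.05 * i) for i in range(2000)]
    noise_bank = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(3)]
    loss = consistency_loss(toy_model, clean, noise_bank, snr_db=10.0, rng=rng)
    print(f"consistency loss: {loss:.6f}")
```

In an actual training loop this term would be added to the supervised loss on labeled hospital data, and the prediction on clean audio is commonly treated as a fixed target so that only the noisy branch is pulled toward it; that detail, like the SNR value, is an assumption omitted from this sketch.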


The model achieved an accuracy of 76.2% on our test set, with per-class accuracies (sensitivities) of 63.4% for wake, 64.9% for rapid eye movement (REM), and 83.6% for non-REM sleep. The macro F1-score and mean per-class sensitivity were 0.714 and 0.706, respectively. Performance was robust across subgroups defined by age, gender, BMI, and sleep apnea severity (accuracy 73.4%-79.4%). In the ablation study, we evaluated the contribution of each component. Supervised learning alone achieved an accuracy of 69.2% on home sound data; adding consistency training increased accuracy by a larger margin (+4.3 percentage points) than adding transfer learning (+0.1 percentage points). The best performance was obtained when both transfer learning and consistency training were adopted (+7.0 percentage points).
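The reported metrics can all be derived from a 3-class confusion matrix. The per-class figures are consistent with being sensitivities (recall): their mean, (63.4 + 64.9 + 83.6)/3 ≈ 70.6%, matches the reported mean per-class sensitivity of 0.706. The sketch below uses a hypothetical confusion matrix purely for illustration; it is not the study's data, and the function names are illustrative.

```python
"""Sleep-staging metrics from a 3-class confusion matrix (illustrative sketch)."""

STAGES = ("wake", "REM", "NREM")


def per_class_sensitivity(cm):
    """Recall per class; cm[i][j] counts epochs of true stage i predicted as j."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]


def per_class_precision(cm):
    """Precision per class: correct predictions of j over all predictions of j."""
    k = len(cm)
    out = []
    for j in range(k):
        predicted = sum(cm[i][j] for i in range(k))
        out.append(cm[j][j] / predicted if predicted else 0.0)
    return out


def accuracy(cm):
    """Fraction of all epochs on the diagonal (overall accuracy)."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def macro_f1(cm):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for p, r in zip(per_class_precision(cm), per_class_sensitivity(cm)):
        scores.append(2 * p * r / (p + r) if p + r else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical counts, NOT study data: rows = true stage, columns = predicted.
    cm = [
        [60, 10, 30],
        [5, 65, 30],
        [10, 20, 270],
    ]
    sens = per_class_sensitivity(cm)
    for stage, s in zip(STAGES, sens):
        print(f"{stage}: sensitivity {s:.3f}")
    print(f"accuracy {accuracy(cm):.3f}")
    print(f"mean per-class sensitivity {sum(sens) / len(sens):.3f}")
    print(f"macro F1 {macro_f1(cm):.3f}")
```

Because non-REM epochs usually make up most of a night, overall accuracy can exceed the mean per-class sensitivity, which is why class-balanced measures such as the macro F1-score are reported alongside it.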


This study shows that sound-based sleep staging is feasible for home use. By adopting 2 advanced techniques (transfer learning and consistency training), the deep learning model robustly predicts sleep stages from sounds recorded in various uncontrolled home environments, using only smartphones and no special equipment.


Hai Hong Tran
Jung Kyung Hong
Hyeryung Jang
Jinhwan Jung
Jongmok Kim
Joonki Hong
Minji Lee
Jeong-Whun Kim
Clete A Kushida
Dongheon Lee