SnoreFormer: Home snoring detection with deep neural networks

Oct 20, 2023
World Sleep 2023


Introduction:
Snoring is a common problem that is closely tied to sleep-related breathing disorders such as obstructive sleep apnea (OSA), as well as to hypertension and cardiovascular disease. Polysomnography (PSG) is the traditional gold standard for diagnosing snoring; it employs multiple sensors to measure variables such as airflow pressure and snoring sound. However, conducting PSG at home is challenging because of the complexity of the setup and the number of sensors required. Our main contribution is SnoreFormer, a method that uses easily accessible sound data recorded on a smartphone to diagnose snoring. SnoreFormer uses deep neural networks to learn the statistical relationship between sound and snoring. We validated the model in both a clinical environment and a home environment using only sound data recorded by a smartphone.

Materials and Methods:
Our model, SnoreFormer, detects the presence of snoring from 20 minutes of sound data and produces a result for each 30-second epoch. We first transformed the raw sound data into Mel spectrograms, a widely used feature representation in audio processing that approximates human perception of sound. Each 20-minute recording was thus converted into a sequence of 40 Mel spectrograms, one per 30-second epoch. The preprocessed data were then fed into a Transformer, a state-of-the-art model architecture designed for sequence prediction tasks. The Transformer employs self-attention mechanisms that capture the sequential and temporal dynamics of the snoring sound, enabling more accurate detection.
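The preprocessing arithmetic described above can be illustrated with a minimal sketch. It is not the authors' code: the 16 kHz sample rate, the 64 Mel bands, and the helper names (`epoch_bounds`, `hz_to_mel`, `mel_band_centers`) are assumptions made for illustration, and the Mel formula is the common HTK-style variant, which the abstract does not specify.

```python
import math

EPOCH_SECONDS = 30          # epoch length stated in the abstract
WINDOW_SECONDS = 20 * 60    # 20-minute input window stated in the abstract


def epoch_bounds(num_samples, sample_rate, epoch_seconds=EPOCH_SECONDS):
    """Return (start, end) sample indices for each complete epoch."""
    samples_per_epoch = sample_rate * epoch_seconds
    n_epochs = num_samples // samples_per_epoch
    return [(i * samples_per_epoch, (i + 1) * samples_per_epoch)
            for i in range(n_epochs)]


def hz_to_mel(freq_hz):
    """HTK-style Mel scale (assumed variant): roughly linear below 1 kHz."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_mels, f_min, f_max):
    """Center frequencies (Hz) of n_mels bands spaced evenly on the Mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]


if __name__ == "__main__":
    sample_rate = 16_000  # assumed; the abstract does not give a sample rate
    bounds = epoch_bounds(sample_rate * WINDOW_SECONDS, sample_rate)
    print(len(bounds))    # number of 30-second epochs in 20 minutes
    centers = mel_band_centers(64, 0.0, sample_rate / 2)  # 64 bands is assumed
    print(round(centers[0], 1), round(centers[-1], 1))
```

Because the bands are evenly spaced in Mel rather than in Hz, low frequencies get finer resolution than high frequencies, which is the sense in which the representation approximates human hearing.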
We used three distinct audio datasets: (1) audio recorded by a single microphone chip during polysomnography in a clinical environment (Hospital PSG dataset, N = 1154), (2) audio recorded by a smartphone during polysomnography in a clinical environment (Hospital smartphone dataset, N = 327), and (3) audio recorded by a smartphone during polysomnography in a home environment (Home smartphone dataset, N = 109). The home environment dataset was not used for training; it was used only to test the model's performance.
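The self-attention step mentioned in the methods can be sketched in a few lines. This is a deliberately simplified illustration, not the SnoreFormer architecture: it uses a single head and identity query, key, and value projections (a real Transformer learns these), and the tiny two-dimensional "epoch embeddings" are hypothetical values chosen only to show the mechanics.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(epochs):
    """Single-head scaled dot-product self-attention.

    `epochs` is a list of equal-length feature vectors, one per epoch.
    Simplifying assumption: queries, keys, and values are the inputs
    themselves (no learned projections).
    Returns (outputs, weights); weights[i][j] is how strongly epoch i
    attends to epoch j.
    """
    d = len(epochs[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for q in epochs:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in epochs]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w_j * v[i] for w_j, v in zip(w, epochs))
                        for i in range(d)])
    return outputs, weights


if __name__ == "__main__":
    # Hypothetical embeddings for three epochs.
    toy_epochs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    out, w = self_attention(toy_epochs)
    for row in w:
        print([round(x, 3) for x in row])
```

Each output vector is a weighted mix of all epochs in the sequence, so the representation of one 30-second epoch can draw on context from the other epochs in the 20-minute window.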

Results:
The SnoreFormer model was tasked with identifying the presence or absence of a snoring event within each 30-second epoch. The model achieved an accuracy of 82.9% in a clinical environment (sensitivity: 81.6%, specificity: 83.3%) and 81.0% in a home environment (sensitivity: 73.1%, specificity: 84.0%). These results indicate that the model performs well even in a noisy home environment. The model also maintained comparable accuracy across demographic groups, achieving 81.5% accuracy for men and 85.1% for women, with no significant difference in accuracy across BMI and age categories.
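For readers unfamiliar with the reported metrics, the following minimal sketch shows how accuracy, sensitivity, and specificity are computed from per-epoch labels. The function name `epoch_metrics` and the ten example labels are hypothetical and are not data from the study.

```python
from dataclasses import dataclass


@dataclass
class EpochMetrics:
    accuracy: float
    sensitivity: float   # true positive rate: snoring epochs correctly flagged
    specificity: float   # true negative rate: non-snoring epochs correctly cleared


def epoch_metrics(y_true, y_pred):
    """Compute per-epoch metrics from binary labels (1 = snoring, 0 = none)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one snoring and one non-snoring epoch")
    return EpochMetrics(
        accuracy=(tp + tn) / (tp + tn + fp + fn),
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
    )


if __name__ == "__main__":
    # Hypothetical labels for ten epochs; not data from the study.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    print(epoch_metrics(y_true, y_pred))
```

Since non-snoring epochs usually outnumber snoring epochs, accuracy tends to sit closer to specificity than to sensitivity, which is consistent with the pattern in the home-environment figures.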

Conclusions:
The proposed model, SnoreFormer, detected snoring with accuracy above 80% in both a clinical setting and a home environment. This result shows the potential of sound-based models for diagnosing snoring, offering more accessible and feasible diagnostic tools for home use. Such a model could also improve individuals' understanding of their sleep, encouraging them to seek necessary medical treatment and potentially mitigating long-term consequences such as hypertension and cardiovascular disease.


K. Son
D. Kim
H. Park
S. Kim
S. Kim
D. Lee
M. Lee
S.H. Moon
I.-Y. Yoon
J.-W. Kim

