Enhancing both Sleep Stage Classification and Obstructive Sleep Apnea Event Detection tasks with a unified sound-based multi-task model

Oct 20, 2023
World Sleep 2023


Introduction:
Sleep stage classification and obstructive sleep apnea (OSA) event detection are two essential tasks in sleep analysis. Although the two tasks are strongly correlated, they have typically been handled by separate deep learning models. In this study, we propose a unified sound-based multi-task AI model that improves performance on both tasks by leveraging shared information while reducing computational cost.

Materials and Methods:
Our dataset comprises 2,048 recorded sleep sessions obtained from a university hospital, which were annotated using polysomnography (PSG). To facilitate analysis, we converted the audio data into mel-spectrograms representing 30-second epochs.
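The epoching step and the mel scale behind the spectrograms can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the function names, the choice to drop a trailing partial epoch, and the HTK-style mel formula are assumptions, since the abstract specifies only 30-second epochs and mel-spectrograms.

```python
import math

EPOCH_SECONDS = 30  # epoch length stated in the abstract


def split_into_epochs(samples, sample_rate):
    """Split a 1-D sequence of audio samples into whole 30-second epochs.

    A trailing partial epoch is dropped (an assumption; padding is another option).
    """
    epoch_len = EPOCH_SECONDS * sample_rate
    n_epochs = len(samples) // epoch_len
    return [samples[i * epoch_len:(i + 1) * epoch_len] for i in range(n_epochs)]


def hz_to_mel(freq_hz):
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels, f_min, f_max):
    """Edge frequencies (Hz) of n_mels bands equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]
```

In practice the band edges would define triangular filters applied to a short-time power spectrum of each epoch; an audio library would normally handle that step.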
The proposed deep neural network consisted of two components: a feature extractor and a transformer encoder. The feature extractor captured the features needed to classify breathing patterns and passed them to the transformer encoder, which modeled the relationships among consecutive epochs of respiratory sound. In the multi-task model, sleep stage classification and OSA event detection shared the same feature extractor, so both tasks learned from identical mel-spectrograms. To capture the patterns specific to each task, two separate transformer modules served as task-specific heads.
To train the multi-task model, we first trained two single-task teacher models, one per task. The multi-task model was then trained using both the PSG-derived labels and the features produced by the trained teacher models. This approach, known as knowledge distillation, allowed the multi-task model to learn from both teachers and benefit from their specialized features.
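One way to picture the training objective is a per-task label loss plus a feature-matching term against the corresponding teacher. The sketch below is an assumption-laden illustration: the abstract does not state the loss functions, so the cross-entropy and mean-squared-error terms, the weight `alpha`, and all names here are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability assigned to the PSG label."""
    return -math.log(softmax(logits)[label])


def feature_mse(student_feat, teacher_feat):
    """Mean squared error between student and teacher feature vectors."""
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / len(student_feat)


def multitask_distillation_loss(stage_out, osa_out, alpha=0.5):
    """Sum over both tasks of (label loss + alpha * feature-matching loss).

    Each *_out is a dict with keys 'logits', 'label', 'student_feat', 'teacher_feat'.
    alpha is a hypothetical weighting, not a value from the abstract.
    """
    total = 0.0
    for out in (stage_out, osa_out):
        total += cross_entropy(out["logits"], out["label"])
        total += alpha * feature_mse(out["student_feat"], out["teacher_feat"])
    return total
```

Each task head is compared with its own teacher, so the sleep stage head is pulled toward the sleep stage teacher's features and the OSA head toward the OSA teacher's.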

Results:
Our multi-task model outperformed the separately trained teacher models on both tasks. For the 3-class OSA detection task, it achieved a macro F1 score of 0.635, compared with the teacher model's 0.633. For the 4-class sleep stage classification task, the improvement was larger: a macro F1 score of 0.690, compared with the teacher model's 0.683. The model also showed high agreement with the PSG gold standard: 87.4% for OSA detection and 71.0% for sleep stage classification.
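For reference, macro F1 is the unweighted mean of the per-class F1 scores, so every class counts equally regardless of how many epochs it contains. The sketch below shows the computation on integer class labels. The `agreement` helper assumes that agreement means the fraction of epochs whose predicted label matches the PSG label, which the abstract does not define.

```python
def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes


def agreement(y_true, y_pred):
    """Fraction of epochs where prediction and reference match (assumed definition)."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
```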

Conclusions:
By employing a unified sound-based multi-task model, we improved performance on both OSA detection and sleep stage classification. We also reduced the number of model parameters by nearly 50%, roughly halving the cost of storing and running the two tasks and making our solution more accessible for public healthcare. Further studies can explore the potential of this approach for other sleep-related tasks and datasets.
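A short calculation shows how a shared feature extractor could yield a saving close to 50%. It assumes the baseline is two single-task models that each carry their own extractor and head, and the parameter counts in the usage lines are hypothetical, since the abstract reports none.

```python
def parameter_reduction(extractor_params, head_params):
    """Fractional parameter saving from sharing one feature extractor.

    Baseline: two single-task models, each with its own extractor and head.
    Unified:  one shared extractor plus two task-specific heads.
    """
    separate = 2 * (extractor_params + head_params)
    unified = extractor_params + 2 * head_params
    return 1.0 - unified / separate


# Hypothetical sizes: a 10M-parameter extractor and 0.5M-parameter heads.
saving = parameter_reduction(10_000_000, 500_000)
print(f"{saving:.1%}")
```

Under these assumptions the saving approaches 50% as the extractor grows relative to the heads, and it can never exceed 50%.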


V.L. Le
J. Kim
E. Cho
D. Lee
J. Hong
D. Kim
M. Lee
S.H. Moon
C. Kushida
J.-W. Kim

