Official Implementation of Stethoscope-guided Supervised Contrastive Learning for Cross-domain Adaptation on Respiratory Sound Classification.
See you in ICASSP 2024!
June-Woo Kim,
Sangmin Bae,
Won-Yang Cho,
Byungjo Lee,
Ho-Young Jung
- We addressed the domain inconsistency challenge by introducing domain adversarial training techniques.
- We introduced a novel stethoscope-guided supervised contrastive learning (SG-SCL) approach for cross-domain adaptation with various pretrained architectures.
- The proposed method forces the model to reduce the distribution shift between different stethoscope classes while maintaining equivalence in the same class.
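The idea in the last two bullets can be illustrated with a generic supervised contrastive loss in which the grouping label is passed in as a parameter. This is a minimal plain-Python sketch, not the repository's loss: the function name `supcon_loss`, the temperature value, and the choice of a stethoscope id as the grouping label are all assumptions made for illustration.

```python
import math

def supcon_loss(embeddings, group_labels, temperature=0.1):
    """Generic supervised contrastive loss over one batch (illustrative only).

    embeddings: list of equal-length float vectors, one per sample.
    group_labels: one label per sample; a hypothetical stethoscope id here.
    """
    # L2-normalize so that dot products are cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        normed.append([x / norm for x in vec])

    n = len(normed)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and group_labels[j] == group_labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled similarity of the anchor to every other sample.
        logits = {j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
                  for j in range(n) if j != i}
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

# Toy batch: two tight clusters in 2-D (made-up values).
batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supcon_loss(batch, ["dev_a", "dev_a", "dev_b", "dev_b"])
mixed = supcon_loss(batch, ["dev_a", "dev_b", "dev_a", "dev_b"])
# aligned is smaller than mixed: the loss is low when samples that share
# a label already sit close together in the embedding space.
```

How positives are actually chosen in SG-SCL, and how this term is combined with the cross-entropy loss, should be checked against the training code.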
Install the necessary packages with:
$ pip install torch torchvision torchaudio
$ pip install -r requirements.txt
For reproducibility, we used torch=2.0.7 and torchaudio=2.0.
Download the ICBHI dataset files from the official page.
$ wget https://bhichallenge.med.auth.gr/sites/default/files/ICBHI_final_database/ICBHI_final_database.zip
All *.wav and *.txt files should be saved in data/icbhi_dataset/audio_test_data.
Note that the ICBHI dataset consists of a total of 6,898 respiratory cycles, of which 1,864 contain crackles, 886 contain wheezes, and 506 contain both crackles and wheezes, in 920 annotated audio samples from 126 subjects.
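The figures above imply the size of the remaining class. The short sketch below derives it, assuming that every cycle not labeled crackle, wheeze, or both is a normal cycle; the variable names are illustrative.

```python
TOTAL_CYCLES = 6898
abnormal = {"crackle": 1864, "wheeze": 886, "both": 506}

# Assumption: every remaining cycle is a normal cycle.
counts = {"normal": TOTAL_CYCLES - sum(abnormal.values()), **abnormal}
# counts["normal"] is 3642

# Share of each class in percent, useful for judging class imbalance.
shares = {name: round(100 * n / TOTAL_CYCLES, 1) for name, n in counts.items()}

# Two-class view (normal / abnormal), as used when --n_cls is 2.
binary_counts = {"normal": counts["normal"], "abnormal": sum(abnormal.values())}
```

Under that assumption, roughly half of the cycles are normal, so the four-class task is noticeably imbalanced.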
To train the model, run the shell files in scripts/.
- scripts/icbhi_ce.sh: Cross-Entropy loss with AST model.
- scripts/icbhi_dat_device.sh: Cross-Entropy loss with Domain Adaptation (DANN) in terms of Device (stethoscope) with AST model.
- scripts/icbhi_sg_scl.sh: Cross-Entropy loss with SG-SCL (Stethoscope-guided Supervised Contrastive Learning) with AST model.
Important arguments for different data settings.
- --dataset: dataset to use; other lung sound or heart sound datasets can be implemented.
- --class_split: "lungsound" or "diagnosis" classification.
- --n_cls: set the number of classes to 4 or 2 (normal / abnormal) for lungsound classification.
- --test_fold: "official" denotes the 60/40% train/test split, and "0"~"4" denote the 80/20% split.
- --domain_adaptation: use the DAT proposed in this paper.
- --domain_adaptation2: use the SCL proposed in this paper.
- --meta_mode: meta information for cross-domain; choices=['none', 'age', 'sex', 'loc', 'dev', 'label']. The default is dev.
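To show how these data-setting flags fit together, here is a small argparse sketch. It is not the repository's parser: the argument types, the default values other than dev for --meta_mode, and the treatment of --domain_adaptation and --domain_adaptation2 as boolean switches are assumptions.

```python
import argparse

def build_parser():
    # Illustrative parser mirroring the flags listed above.
    parser = argparse.ArgumentParser(description="Illustrative data-setting flags")
    parser.add_argument("--dataset", type=str, default="icbhi")  # default assumed
    parser.add_argument("--class_split", type=str, default="lungsound",
                        choices=["lungsound", "diagnosis"])
    parser.add_argument("--n_cls", type=int, default=4, choices=[2, 4])
    parser.add_argument("--test_fold", type=str, default="official",
                        choices=["official", "0", "1", "2", "3", "4"])
    # Assumed to be on/off switches.
    parser.add_argument("--domain_adaptation", action="store_true")
    parser.add_argument("--domain_adaptation2", action="store_true")
    parser.add_argument("--meta_mode", type=str, default="dev",
                        choices=["none", "age", "sex", "loc", "dev", "label"])
    return parser

# Example: an SG-SCL style run on the official split with four classes.
args = build_parser().parse_args(
    ["--domain_adaptation2", "--n_cls", "4", "--test_fold", "official"]
)
```

The shell files in scripts/ remain the reference for the exact flag combinations used in the experiments.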
Important arguments for models.
- --model: network architecture, see models.
- --from_sl_official: load ImageNet pretrained checkpoint.
- --audioset_pretrained: load AudioSet pretrained checkpoint; supported only for AST and SSAST.
Important arguments for evaluation.
- --eval: switch mode to evaluation without any training.
- --pretrained: load a pretrained checkpoint; requires the pretrained_ckpt argument.
- --pretrained_ckpt: path to the pretrained checkpoint.
The pretrained model checkpoints will be saved at save/[EXP_NAME]/best.pth.
The proposed Stethoscope-Guided Supervised Contrastive Learning achieves a Score of 61.71%, an improvement of 2.16 percentage points over the baseline.
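The Score is not defined in this README. The sketch below assumes it is the ICBHI challenge Score, the mean of sensitivity and specificity, where specificity is the fraction of normal cycles predicted as normal and sensitivity is the fraction of abnormal cycles assigned to their exact class. The function name and the integer encoding with 0 as the normal class are illustrative assumptions.

```python
def icbhi_score(y_true, y_pred, normal_label=0):
    """Return (sensitivity, specificity, score) for cycle-level predictions."""
    normal_total = sum(1 for t in y_true if t == normal_label)
    abnormal_total = len(y_true) - normal_total
    # Normal cycles predicted as normal.
    normal_hits = sum(1 for t, p in zip(y_true, y_pred)
                      if t == normal_label and p == t)
    # Abnormal cycles predicted as their exact abnormal class.
    abnormal_hits = sum(1 for t, p in zip(y_true, y_pred)
                        if t != normal_label and p == t)
    specificity = normal_hits / normal_total if normal_total else 0.0
    sensitivity = abnormal_hits / abnormal_total if abnormal_total else 0.0
    return sensitivity, specificity, (sensitivity + specificity) / 2

# Toy example (made-up labels): 0 normal, 1 crackle, 2 wheeze, 3 both.
y_true = [0, 0, 0, 0, 1, 2, 3, 1]
y_pred = [0, 0, 0, 1, 1, 2, 0, 2]
sensitivity, specificity, score = icbhi_score(y_true, y_pred)
# sensitivity 0.5, specificity 0.75, score 0.625
```

Because the Score averages the two rates, a model that predicts only the majority normal class gets a specificity of 1 and a sensitivity of 0, for a Score of 0.5.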
To get the t-SNE results, run the shell files in scripts.
- scripts/get_tsne.sh: get the t-SNE from pretrained weights. You must pass the pretrained weights with the arguments below:
  - --eval
  - --pretrained
  - --pretrained_ckpt: load pretrained weights. Use the path to your own model weights (e.g., /home/junewoo/stethoscope-guided_supervised_contrastive_learning/save/da2/icbhi_ast_ce_dev_sg_scl_bs8_lr5e-5_ep50_seed${s}_best_param/best.pth).
If you find this repo useful for your research, please consider citing our paper:
@inproceedings{kim2024stethoscope,
title={Stethoscope-Guided Supervised Contrastive Learning for Cross-Domain Adaptation on Respiratory Sound Classification},
author={Kim, June-Woo and Bae, Sangmin and Cho, Won-Yang and Lee, Byungjo and Jung, Ho-Young},
booktitle={ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={1431--1435},
year={2024},
organization={IEEE}
}