Official Implementation of: Transformers or Convolutions? Benchmarking Models for Interactive Segmentation of Neurofibromas in Whole-Body MRI
Authors: Georgii Kolokolnikov, Marie-Lena Schmalhofer, Lennart Well, Inka Ristow, and René Werner
University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
The paper has been submitted to the short-paper track of the Artificial Intelligence in Medicine (AIME) 2025 conference.
The repository benchmarks various convolution- and transformer-based interactive segmentation models on whole-body MRI data of Neurofibromatosis Type 1 (NF1) patients.
We introduce a unified evaluation pipeline that enables automated assessment of interactive segmentation models under two interaction scenarios:
- Lesion-wise Interaction (sketched in code below):
  - A subset of the 20 largest lesions is selected.
  - Each lesion is refined separately using model predictions.
  - Interactions continue until a threshold per-lesion Dice Similarity Coefficient (DSC) is reached or the maximum number of interactions per lesion is exhausted.
- Global Scan-wise Interaction:
  - The entire scan is refined iteratively.
  - Each correction targets the largest segmentation error.
  - The model is evaluated based on its ability to improve segmentation accuracy globally.
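For orientation, here is a minimal sketch of the lesion-wise loop described above. It is illustrative only: the `simulate_click` heuristic and the `model.predict(image, pred, click)` interface are assumptions for this sketch, not the actual API of the evaluation pipeline.

```python
import numpy as np

def dice(pred, gt):
    """Dice Similarity Coefficient between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom > 0 else 1.0

def simulate_click(pred, gt):
    """Place a click on a mis-segmented voxel (positive if missed, negative if spurious)."""
    errors = np.argwhere(pred != gt)
    if errors.size == 0:
        return None
    voxel = tuple(errors[len(errors) // 2])
    return voxel, bool(gt[voxel])  # (coordinates, is_positive_click)

def lesion_wise_refinement(model, image, lesion_masks, dsc_threshold=0.8, max_clicks=3):
    """Refine each pre-selected lesion until the DSC threshold or click budget is reached."""
    refined = []
    for gt_lesion in lesion_masks:  # e.g. the 20 largest lesions
        pred = np.zeros_like(gt_lesion)
        for _ in range(max_clicks):
            click = simulate_click(pred, gt_lesion)
            if click is None:
                break
            pred = model.predict(image, pred, click)  # hypothetical model interface
            if dice(pred, gt_lesion) >= dsc_threshold:
                break
        refined.append(pred)
    return refined
```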
Below is an illustration of the lesion-wise interaction scenario:
This repository evaluates and benchmarks the following interactive segmentation models:
- DINs (Deep Interactive Networks) - CNN-based model incorporating user feedback.
- SW-FastEdit (Sliding Window FastEdit) - CNN-based model leveraging a sliding window strategy.
- SimpleClick + STCN (STILL UNDER DEVELOPMENT) - Transformer-based interactive segmentation model operating in 2D, combined with a segmentation propagation model (STCN).
- SAM2 (Segment Anything Model 2) - Transformer-based model extending 2D segmentation into videos. We applied its frame-wise processing to slices of 3D whole-body MRI along the anterior-posterior axis.
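As a rough illustration of this frame-wise use of a video model (not the repository's actual SAM2 integration), a 3D volume can be exposed as an ordered sequence of 2D slices; the slicing axis below is an assumption:

```python
import numpy as np

def volume_as_frames(volume: np.ndarray, axis: int = 1):
    """Yield the 2D slices of a 3D volume in order, mimicking the frames of a video.

    A video-oriented model such as SAM2 can then propagate a prompt given on one
    slice to the neighbouring slices, analogous to mask propagation across frames.
    """
    for index in range(volume.shape[axis]):
        yield index, np.take(volume, index, axis=axis)
```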
The figure below presents examples of neurofibroma segmentation predicted by different fine-tuned models in the lesion-wise interaction scenario (3 interactions per 20 largest lesions) on two cases:
- Left three images → High tumor burden case
- Right three images → Low tumor burden case
Color coding for segmentation performance:
- True Positives → Yellow
- False Positives → Red
- False Negatives → Green
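A minimal sketch of how such a color-coded error map can be computed from a prediction and a ground-truth mask (illustrative only; the repository's figures are produced with results/illustration_generation.ipynb):

```python
import numpy as np

def error_overlay(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Return an RGB overlay: true positives yellow, false positives red, false negatives green."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    overlay = np.zeros(pred.shape + (3,), dtype=np.uint8)
    overlay[pred & gt] = (255, 255, 0)   # true positives  -> yellow
    overlay[pred & ~gt] = (255, 0, 0)    # false positives -> red
    overlay[~pred & gt] = (0, 255, 0)    # false negatives -> green
    return overlay
```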
git clone https://github.com/IPMI-ICNS-UKE/NFInteractiveSegmentationBenchmarking.git
cd NFInteractiveSegmentationBenchmarking
Overview of the repository structure:
.
├── data
│ ├── processed
│ │ └── README.MD
│ ├── raw
│ │ └── README.MD
│ └── splits
│ ├── fold_1
│ │ ├── train_set.txt
│ │ └── val_set.txt
│ └── ...
├── data_scripts
│ ├── convert_3d_mri_to_2d_slices.py
│ ├── split_into_train_val_sets.py
│ └── checking_data.ipynb
├── environment_tf.yml
├── environment_torch.yml
├── evaluation
│ ├── experiment_launchers
│ ├── results
│ ├── README.MD
│ └── ...
├── experiments
├── launchers
│ ├── finetune_dins.sh
│ ├── finetune_sam2.sh
│ ├── finetune_simpleclick.sh
│ ├── finetune_stcn.sh
│ ├── finetune_sw_fastedit.sh
│ └── ...
├── model_code
│ ├── DINs_Neurofibroma
│ ├── iSegFormer_Neurofibroma
│ ├── SimpleClick_Neurofibroma
│ ├── sam2_Neurofibroma
│ └── SW_FastEdit_Neurofibroma
├── model_weights
│ ├── convert_tf_to_onnx.py
│ ├── export_sam2_model_weights.py
│ └── README.MD
├── model_weights_finetuned
│ ├── test_onnxruntime.py
│ └── README.MD
├── results
│ ├── result_analysis.ipynb
│ └── illustration_generation.ipynb
├── README.md
└── ...
There are two environments:
- PyTorch-based (for evaluation pipeline and training of most models, including SAM2, SimpleClick, and SW-FastEdit)
- TensorFlow-based (for training DINs and exporting its model weights)
conda env create -f environment_torch.yml
conda activate nf_iseg_benchmark_torch
Then install SAM2:
cd model_code/sam2_Neurofibroma
pip install -e .
pip install -e ".[dev]"
Install additional packages:
pip install git+https://github.com/cheind/py-thin-plate-spline
conda env create -f environment_tf.yml
conda activate nf_iseg_benchmark_tf
pip install git+https://github.com/IDSIA/[email protected]
The repository was developed for and tested with 3D whole-body MRI scans in .nii.gz format. It may also work with other 3D data in .nii.gz format, although some adjustments could be required.
- Place the original dataset in data/raw/, following the structure:
  - imagesTr/
  - labelsTr_instance/
  - imagesTs/
  - labelsTs_instance/
  Refer to data/raw/README.md for further details.
- Since transformer-based models were originally designed for 2D images, the 3D images need to be sliced using:
  python data_scripts/convert_3d_mri_to_2d_slices.py
- To split data into training and validation sets:
  python data_scripts/split_into_train_val_sets.py
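To illustrate what the slicing step does, here is a hedged sketch; the actual logic lives in data_scripts/convert_3d_mri_to_2d_slices.py, and the output naming, directory layout, and slicing axis below are assumptions:

```python
import os
import nibabel as nib
import numpy as np

def slice_volume_to_2d(nifti_path: str, output_dir: str, axis: int = 1) -> None:
    """Save each 2D slice of a 3D .nii.gz volume as a separate .npy file (illustrative only)."""
    os.makedirs(output_dir, exist_ok=True)
    volume = nib.load(nifti_path).get_fdata()
    case_id = os.path.basename(nifti_path).replace(".nii.gz", "")
    for index in range(volume.shape[axis]):
        slice_2d = np.take(volume, index, axis=axis)
        np.save(os.path.join(output_dir, f"{case_id}_{index:04d}.npy"), slice_2d)
```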
Pre-trained models should be downloaded and placed in model_weights/, following the structure in model_weights/README.md.
Download links:
- DINs: DINs Repository
- SW-FastEdit: SW-FastEdit Repository
- SimpleClick + STCN:
- SAM2 (SAM2.1 Hiera Base Plus): SAM2 Repository
Once pre-trained weights and data are set up, models can be trained/fine-tuned using the bash scripts in launchers/.
Training/Fine-Tuning Scripts:
bash launchers/finetune_dins.sh
bash launchers/finetune_sam2.sh
bash launchers/finetune_sw_fastedit.sh
The progress, log files, and artifacts generated during training are written to experiments/.
For the DINs model, use the following script to export it from TensorFlow to ONNX (requires the TensorFlow environment nf_iseg_benchmark_tf):
python model_weights/convert_tf_to_onnx.py
For the SAM2 model, export the model weights from the checkpoint:
python model_weights/export_sam2_model_weights.py
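The export script above handles this step; purely for orientation, extracting model weights from a training checkpoint typically amounts to something like the following sketch (the "model" key is an assumption about the checkpoint layout, not a guarantee):

```python
import torch

def export_model_weights(checkpoint_path: str, output_path: str) -> None:
    """Strip optimizer state etc. from a training checkpoint, keeping only the model weights."""
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    state_dict = checkpoint.get("model", checkpoint)  # assumed key; fall back to the raw dict
    torch.save(state_dict, output_path)
```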
Once a model is trained, the final model weights should be placed in model_weights_finetuned/, according to model_weights_finetuned/README.md.
To evaluate trained models, navigate to evaluation/experiment_launchers/
and execute the respective bash scripts inside, for example:
cd evaluation/experiment_launchers
bash launch_SAM2_lesion_wise_corrective.sh
The evaluation results (metrics, predictions) are saved in evaluation/results/.
Results can be analyzed using the Jupyter notebooks in results/:
cd results
jupyter notebook result_analysis.ipynb
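If you prefer to aggregate the metrics outside the notebook, a hedged sketch is shown below; the CSV file name and column names are assumptions, so check the actual output layout in evaluation/results/ first:

```python
import pandas as pd

# Hypothetical file and column names -- adjust to the actual output in evaluation/results/.
metrics = pd.read_csv("evaluation/results/metrics.csv")
summary = (
    metrics
    .groupby(["model", "interaction_scenario"])["dice"]
    .agg(["mean", "std"])
    .round(3)
)
print(summary)
```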
Data: Highly anisotropic T2-weighted fat-suppressed coronal WB-MRI (3 T) with a voxel spacing of 0.625 mm × 0.625 mm × 7.8 mm in NIfTI format, acquired with a Siemens Magnetom scanner (Siemens Healthineers, Erlangen, Germany).
Hardware:
- Machine 1: 64-bit Ubuntu 22.04.5 LTS with an AMD Ryzen Threadripper Pro 3975WX CPU and an NVIDIA RTX A6000 GPU
- Machine 2: 64-bit Ubuntu 22.04.4 LTS with an AMD Ryzen 9 7950X3D CPU and an NVIDIA GeForce RTX 4090 GPU
- Ensure all paths are correctly configured according to your local setup.
- This repository was tested with 3D whole-body MRI (.nii.gz) images. Other 3D medical imaging data may require spatial adjustments.
- Any modifications to training configurations can be made in the bash scripts inside launchers/.
- Any modifications to evaluation configurations can be made in the bash scripts inside evaluation/experiment_launchers/.
We would like to thank all the authors of the following repositories and respective research papers that were instrumental in our work:
- DINs: Deep Interactive Networks for Neurofibroma Segmentation in Neurofibromatosis Type 1 on Whole-Body MRI; Zhang et al. (2022), IEEE Journal of Biomedical and Health Informatics
- Sliding Window FastEdit: A Framework for Lesion Annotation in Whole-Body PET Images; Hadlich et al. (2024), IEEE ISBI
- SimpleClick: Interactive Image Segmentation with Simple Vision Transformers; Liu et al. (2023), ICCV
- Exploring Cycle Consistency Learning in Interactive Volume Segmentation; Liu et al. (2023), arXiv preprint
- SAM 2: Segment Anything in Images and Videos; Ravi et al. (2024), arXiv preprint
For questions, feedback, or collaboration inquiries, please contact:
For technical issues or feature requests, please open an issue in this repository’s Issues section.