Overview
The Segment Anything Model (SAM) is a state-of-the-art, promptable deep learning model that excels at 2D natural image segmentation. However, medical imaging, and brain tumour segmentation in particular, presents unique challenges due to the complexity of tumour boundaries and the multidimensionality of the data. This project aims to enhance SAM's capability for brain tumour segmentation through Parameter-Efficient Fine-Tuning (PEFT) on the BraTS Intracranial Meningioma 2023 dataset. In addition, we introduce two framework modifications to augment SAM's performance: U-SAM and SAM-LoRA.
Benchmarking: Vanilla and PEFT-SAM
Our project began by establishing a strong foundation with the baseline model, Vanilla SAM, which delivered commendable results for tumour segmentation: at 84.1% accuracy, it already identified tumour regions effectively.

Many of our design decisions were informed by existing efforts in the Medical SAM community, such as using bounding-box prompts and slicing the 3D MRI volumes into 2D slices for SAM to process. We also performed preprocessing steps standard in medical imaging, such as Z-score normalisation and rescaling, to ensure standardised intensities across samples, and especially across the four scan types per sample, as seen above. These four scan types capture the same anatomical regions through different MRI protocols, each offering a unique tissue contrast crucial for the segmentation process. This comprehensive imaging approach allows the model to extract detailed features from multiple perspectives, improving tumour delineation and overall segmentation accuracy.
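As a rough illustration of these preprocessing steps, the sketch below covers Z-score normalisation over brain voxels, rescaling to [0, 1], slicing a volume into axial 2D slices, and deriving a bounding-box prompt from a ground-truth mask. It is a minimal sketch rather than our exact pipeline: the function names, the 5-pixel box margin, and the (H, W, D) volume layout are illustrative assumptions.

```python
import numpy as np
from typing import Optional


def zscore_normalise(volume: np.ndarray) -> np.ndarray:
    """Z-score normalise a 3D MRI volume using its non-zero (brain) voxels."""
    brain = volume[volume > 0]
    mean, std = brain.mean(), brain.std()
    return np.where(volume > 0, (volume - mean) / (std + 1e-8), 0.0)


def rescale_to_unit(volume: np.ndarray) -> np.ndarray:
    """Rescale intensities to [0, 1] so slices can be converted to 8-bit images for SAM."""
    vmin, vmax = volume.min(), volume.max()
    return (volume - vmin) / (vmax - vmin + 1e-8)


def to_axial_slices(volume: np.ndarray) -> list:
    """Split a (H, W, D) volume into D axial 2D slices."""
    return [volume[:, :, i] for i in range(volume.shape[-1])]


def box_prompt_from_mask(mask_slice: np.ndarray, margin: int = 5) -> Optional[np.ndarray]:
    """Derive a SAM box prompt [x_min, y_min, x_max, y_max] from a 2D tumour mask."""
    ys, xs = np.nonzero(mask_slice)
    if xs.size == 0:
        return None  # no tumour in this slice
    x0 = max(xs.min() - margin, 0)
    y0 = max(ys.min() - margin, 0)
    x1 = min(xs.max() + margin, mask_slice.shape[1] - 1)
    y1 = min(ys.max() + margin, mask_slice.shape[0] - 1)
    return np.array([x0, y0, x1, y1])
```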
One of the notable preprocessing steps was converting the dataset's original four-class labels into a binary tumour segmentation, in which each pixel of the model's prediction is classified as either tumour or not tumour. We chose this approach because the medical nuances of multi-class tumour segmentation were outside the scope of this project. Simplifying the classification let us streamline the segmentation process with SAM, focusing exclusively on tumour presence versus absence, which aligned with our project goals.
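A minimal sketch of this label conversion, assuming background voxels are labelled 0 and every tumour sub-region carries a non-zero label:

```python
import numpy as np


def binarise_labels(seg: np.ndarray) -> np.ndarray:
    """Collapse the multi-class BraTS segmentation into a binary tumour mask.

    Any non-zero label (i.e. any tumour sub-region) becomes 1; background stays 0.
    """
    return (seg > 0).astype(np.uint8)
```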
For our PEFT-SAM (Parameter-Efficient Fine-Tuning) approach, PEFT refers to training only the mask decoder while freezing all other SAM modules, a strategy common in the related work we reviewed. To train PEFT-SAM on the BraTS data, we used a weighted combination of Mean Squared Error and Cross-Entropy loss functions, along with the Adam optimizer. Additionally, we applied Optuna hyperparameter tuning to optimise the number of epochs, samples per epoch, and other parameters, alongside an exponentially decaying learning rate schedule. This training exposed SAM to the intricacies of the brain MRIs and the complexity of the tumours, ultimately improving accuracy to 87.7%, a notable increase of 3.6 percentage points over Vanilla SAM. The graph below shows the progression of our PEFT-SAM training across 125 epochs, illustrating the relationship between Training Loss (blue line), Validation Loss (orange bars), and Validation Dice (green markers), which measures performance on the validation set:

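For concreteness, the sketch below shows what this PEFT setup could look like in PyTorch with the official segment_anything package: freeze the image and prompt encoders, train only the mask decoder with Adam under an exponentially decaying schedule, and combine a cross-entropy term with Mean Squared Error. The checkpoint, SAM variant, decay factor, and loss weighting are assumed values, and binary cross-entropy stands in for the cross-entropy term since our task is binary; this is an illustration, not our exact training code.

```python
import torch
from torch import nn
from segment_anything import sam_model_registry  # official SAM package

# Load a pre-trained SAM checkpoint (variant and path are assumed here).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")

# PEFT: freeze the image encoder and prompt encoder; only the mask decoder stays trainable.
for param in sam.image_encoder.parameters():
    param.requires_grad = False
for param in sam.prompt_encoder.parameters():
    param.requires_grad = False

# Adam on the mask decoder only, with an exponentially decaying learning rate
# (initial LR and decay factor are assumed values, not our tuned hyperparameters).
optimizer = torch.optim.Adam(sam.mask_decoder.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.97)

bce = nn.BCEWithLogitsLoss()  # cross-entropy term (binary form, since the task is binary)
mse = nn.MSELoss()            # mean-squared-error term


def combined_loss(mask_logits: torch.Tensor, target: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """Weighted combination of the two loss terms; alpha is an assumed weighting."""
    probs = torch.sigmoid(mask_logits)
    return alpha * bce(mask_logits, target.float()) + (1.0 - alpha) * mse(probs, target.float())
```

In a training loop, each 2D slice and its box prompt would be passed through SAM's forward pass, with combined_loss applied to the predicted mask logits before stepping the optimizer and scheduler.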
The grid below visualises the progression from Vanilla SAM to our best version of PEFT-SAM across six sample cases. Each animation shows the axial slice with the largest cross-sectional area of the tumour, highlighting how our training has impacted SAM's predictions. Green indicates true positives, red marks false positives, and blue shows false negatives. While further fine-tuning of PEFT-SAM could have continued to reduce false negatives and false positives, our focus shifted toward framework modifications for this project. It is evident that, while the predictions are not yet at the clinical-grade accuracy needed, our PEFT-SAM training substantially reduced the false positives that were a significant issue in Vanilla SAM's predictions.






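The colour coding above can be reproduced from a binary prediction and the corresponding ground-truth slice with a few lines of NumPy. A minimal sketch, with an illustrative function name:

```python
import numpy as np


def colour_overlay(pred: np.ndarray, truth: np.ndarray) -> np.ndarray:
    """Build an RGB overlay of a binary prediction against the ground truth:
    green = true positive, red = false positive, blue = false negative."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    overlay = np.zeros((*pred.shape, 3), dtype=np.uint8)
    overlay[pred & truth] = (0, 255, 0)    # true positives
    overlay[pred & ~truth] = (255, 0, 0)   # false positives
    overlay[~pred & truth] = (0, 0, 255)   # false negatives
    return overlay
```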
A comparison of the final 3D masks generated by our PEFT model and Vanilla SAM is shown below. The green regions represent the ground truth masks while model predictions are shown in red.


PEFT-SAM builds upon the strong foundation laid by Vanilla SAM, focusing on the key challenge of tumour segmentation and refining the model's interaction with prompts. This approach provided a flexible yet precise method, setting the stage for more complex architectural modifications with our proposed models, U-SAM and SAM-LoRA. PEFT-SAM, along with Vanilla SAM, serves as the baseline for comparison against these modified models.
✧ U-SAM ✧
A hybrid model that integrates SAM with U-Net's superior 3D spatial capabilities for improved tumour boundary precision.
✧ SAM-LoRA ✧
An investigation into how applying LoRA to SAM's image encoder affects segmentation accuracy.
✦ Deliverables ✦
- U-SAM: Cassandra Wallace
- SAM-LoRA: Tapera Chikumbu