S4M: Boosting Semi-Supervised Instance Segmentation with SAM

¹KAIST, ²Korea University, ³Samsung Electro-Mechanics
*Equal Contribution, †Corresponding Author
arXiv 2025

Abstract

Semi-supervised instance segmentation is challenging because limited labeled data makes it difficult to accurately localize distinct object instances. Current teacher-student frameworks remain limited in performance because the pseudo-labels they produce are unreliable, again a consequence of scarce labeled data. While the Segment Anything Model (SAM) offers robust segmentation at various granularities, applying SAM directly introduces challenges such as class-agnostic predictions and potential over-segmentation. To address these challenges, we carefully integrate SAM into the semi-supervised instance segmentation framework, developing a novel distillation method that captures SAM's precise localization capabilities without compromising semantic recognition. Furthermore, we incorporate pseudo-label refinement as well as a specialized data augmentation that uses the refined pseudo-labels, resulting in superior performance. Our approach achieves state-of-the-art performance, and we provide comprehensive experiments and ablation studies to validate its effectiveness.

Motivation

We begin by identifying the limitations of existing semi-supervised approaches through visual inspection of the pseudo-labels produced by teacher networks. These pseudo-labels tend to classify object categories correctly but often fail at precise localization, commonly merging multiple instances into a single mask. Motivated by this observation, we leverage the Segment Anything Model (SAM) carefully: rather than naively adopting its outputs, we selectively identify what to learn from it and what not to learn. This enables us to address both under- and over-segmentation within the semi-supervised instance segmentation framework.

Overall Framework

S4M leverages SAM's knowledge through three components. First, we improve the teacher network through structural distillation, which transfers SAM's inherent spatial understanding. Second, as the student learns from unlabeled images, we refine its pseudo-labels using SAM's strong segmentation capability. Third, we further enhance training with an instance-aware augmentation, ARP (Augmentation with Refined Pseudo-labels), which leverages the refined pseudo-labels.
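This page does not spell out how structural distillation is implemented. As a minimal sketch, assuming SAM's spatial understanding is transferred by matching pairwise feature affinities (a common structure-preserving distillation objective, not necessarily the authors' exact loss), the teacher can be penalized whenever the cosine-similarity pattern among its spatial features differs from the pattern among SAM's encoder features. The names `pairwise_affinity` and `structural_distillation_loss` are hypothetical, and both feature maps are assumed to have already been resized to a common spatial resolution.

```python
import numpy as np


def pairwise_affinity(features: np.ndarray) -> np.ndarray:
    """Cosine similarity between every pair of spatial positions.

    features: array of shape (C, H, W); returns an (H*W, H*W) matrix.
    """
    channels = features.shape[0]
    flat = features.reshape(channels, -1).T  # (H*W, C): one feature vector per location
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    flat = flat / np.maximum(norms, 1e-8)  # L2-normalise; guard against zero vectors
    return flat @ flat.T


def structural_distillation_loss(teacher_feats: np.ndarray, sam_feats: np.ndarray) -> float:
    """Hypothetical loss: mean squared difference between the two affinity matrices.

    Only the spatial sizes must match; the channel counts may differ.
    """
    if teacher_feats.shape[1:] != sam_feats.shape[1:]:
        raise ValueError("resize the feature maps to a common (H, W) first")
    diff = pairwise_affinity(teacher_feats) - pairwise_affinity(sam_feats)
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sam = rng.normal(size=(256, 8, 8))      # stand-in for SAM encoder features
    teacher = rng.normal(size=(128, 8, 8))  # stand-in for teacher features
    print("unrelated features:", structural_distillation_loss(teacher, sam))
    # Rescaling does not change cosine similarities, so this loss is ~0.
    print("same structure:", structural_distillation_loss(2.0 * sam, sam))
```

Because the loss compares H·W × H·W affinity matrices rather than raw features, the teacher and SAM may use different channel widths without a projection layer; the price is memory quadratic in the number of spatial positions, so such a loss is usually computed on a downsampled feature map. In actual training the same computation would be written in an autodiff framework so gradients reach the teacher; NumPy is used here only to keep the sketch light.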

Figure panels: Structural Distillation, Pseudo-label Refinement, Augmentation with Refined Pseudo-labels
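
How pseudo-label refinement pairs SAM masks with teacher predictions is not detailed on this page. The sketch below assumes one simple rule: for each teacher pseudo-label, take the class-agnostic SAM mask that best overlaps it (by IoU), keep the teacher's class id, and fall back to the teacher's own mask when no SAM mask overlaps enough. This mirrors the division of labor described above (SAM for localization, the teacher for semantics), but the matching rule, the `iou_thresh` value, and the function names are illustrative assumptions, not the authors' procedure.

```python
import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks with the same shape."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


def refine_pseudo_labels(teacher_masks, teacher_classes, sam_masks, iou_thresh=0.5):
    """Hypothetical refinement: mask shape from SAM, class label from the teacher.

    teacher_masks:   list of (H, W) boolean masks predicted by the teacher
    teacher_classes: list of class ids, one per teacher mask
    sam_masks:       list of (H, W) boolean, class-agnostic SAM masks
    Returns a list of (mask, class_id) pairs, one per teacher mask.
    """
    refined = []
    for t_mask, cls in zip(teacher_masks, teacher_classes):
        best_iou, best_mask = 0.0, None
        for s_mask in sam_masks:
            iou = mask_iou(t_mask, s_mask)
            if iou > best_iou:
                best_iou, best_mask = iou, s_mask
        # Trust SAM's boundary only when it agrees well with the teacher; a
        # low-IoU candidate is likely an object part or a different object.
        if best_mask is not None and best_iou >= iou_thresh:
            refined.append((best_mask, cls))
        else:
            refined.append((t_mask, cls))
    return refined


if __name__ == "__main__":
    teacher = np.zeros((6, 6), dtype=bool)
    teacher[1:5, 1:5] = True      # coarse 4x4 teacher mask, class 7
    sam_object = np.zeros((6, 6), dtype=bool)
    sam_object[1:5, 1:4] = True   # tighter 4x3 SAM mask (IoU 12/16 = 0.75)
    sam_part = np.zeros((6, 6), dtype=bool)
    sam_part[1, 1] = True         # tiny SAM part mask (IoU 1/16)
    refined = refine_pseudo_labels([teacher], [7], [sam_object, sam_part])
    mask, cls = refined[0]
    print(cls, mask.sum())        # 7 12: class from the teacher, area from SAM
```

The threshold guards against adopting a SAM mask that covers only part of the object (over-segmentation). This one-to-one matching cannot, on its own, split a teacher mask that merges several instances; that case would need an additional step.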

Quantitative Results

Cityscapes

Average Precision (AP) on Cityscapes under different label ratios, compared with state-of-the-art methods.

COCO

Average Precision (AP) on COCO under different label ratios, compared with state-of-the-art methods.

Qualitative Results

Cityscapes
Qualitative comparison with the baseline semi-supervised method GuidedDistillation (Berrada et al. 2024) on Cityscapes using 10% labeled data.
Predictions from the supervised teacher (top) and our semi-supervised student (bottom) across different labeled data settings.
COCO
Qualitative comparison with the baseline semi-supervised method GuidedDistillation (Berrada et al. 2024) on COCO using 2% labeled data.
Predictions from the supervised teacher (top) and our semi-supervised student (bottom) across different labeled data settings.

BibTeX

@misc{yoon2025s4mboostingsemisupervisedinstance,
      title={S$^4$M: Boosting Semi-Supervised Instance Segmentation with SAM},
      author={Heeji Yoon and Heeseong Shin and Eunbeen Hong and Hyunwook Choi and Hansang Cho and Daun Jeong and Seungryong Kim},
      year={2025},
      eprint={2504.05301},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.05301},
}