🍎APPLE: Attribute-Preserving Pseudo-Labeling for Diffusion-Based Face Swapping

arXiv 2026

Jiwon Kang¹ Yeji Choi¹ JoungBin Lee¹ Wooseok Jang¹ Jinhyeok Choi¹
Taekeun Kang² Yongjae Park² Myungin Kim² Seungryong Kim¹
¹KAIST AI ²SAMSUNG

TL;DR

We propose a diffusion-based teacher-student framework for face swapping in which a teacher model generates high-quality pseudo data that are then used to train a student model. This significantly improves attribute preservation over existing diffusion-based methods while maintaining strong identity transfer.



Teaser

Teaser. Our model, APPLE (Attribute-Preserving Pseudo-Labeling), successfully transfers the identity of a source (top left) onto a target (bottom left) while accurately preserving target attributes (e.g., pose, expression, skin tone, lighting) across ethnicities, genders, and input variations.

Overview

Face swapping aims to transfer the identity from a source image to a target face while preserving attributes like pose, lighting, and makeup. However, existing diffusion-based methods, which often rely on inpainting, struggle to preserve these attributes faithfully because masking the face removes crucial visual cues.

To address this, we propose APPLE, a teacher-student framework that generates high-quality pseudo-labels (triplets) to provide explicit supervision.
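As a rough illustration of how pseudo-labels become supervision, the sketch below assumes each triplet is (source, target, teacher output), that the frozen teacher is exposed as a plain callable `teacher_swap(source, target)`, and that the student is a noise predictor `eps_model(x_t, t, source, target)` trained with the usual epsilon-prediction diffusion objective. All of these names, the `PseudoTriplet` container, and the toy stand-ins are hypothetical; this is not the authors' code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

import numpy as np

Image = np.ndarray


@dataclass
class PseudoTriplet:
    source: Image   # provides the identity
    target: Image   # provides pose, expression, lighting, makeup, ...
    swapped: Image  # teacher output, used as the student's pseudo ground truth


def build_pseudo_triplets(
    pairs: List[Tuple[Image, Image]],
    teacher_swap: Callable[[Image, Image], Image],
) -> List[PseudoTriplet]:
    """Label every (source, target) pair with the frozen teacher's swap result."""
    return [PseudoTriplet(src, tgt, teacher_swap(src, tgt)) for src, tgt in pairs]


def student_denoising_loss(
    triplet: PseudoTriplet,
    eps_model: Callable[[Image, int, Image, Image], Image],
    alpha_bar: np.ndarray,
    rng: np.random.Generator,
) -> float:
    """Epsilon-prediction loss with the teacher's swap as the clean image x0.

    Training noise is fresh N(0, I); inverted noise is not used here.
    """
    t = int(rng.integers(len(alpha_bar)))
    eps = rng.standard_normal(triplet.swapped.shape)
    x_t = np.sqrt(alpha_bar[t]) * triplet.swapped + np.sqrt(1.0 - alpha_bar[t]) * eps
    eps_hat = eps_model(x_t, t, triplet.source, triplet.target)
    return float(np.mean((eps_hat - eps) ** 2))


# Toy usage with hypothetical stand-ins for the teacher and the student.
rng = np.random.default_rng(0)
pairs = [(rng.random((8, 8, 3)), rng.random((8, 8, 3))) for _ in range(4)]
triplets = build_pseudo_triplets(pairs, lambda s, t: 0.5 * (s + t))
alpha_bar = np.linspace(0.9999, 0.02, 1000)
loss = student_denoising_loss(triplets[0], lambda x, t, s, tg: np.zeros_like(x), alpha_bar, rng)
```

The key design point the sketch tries to capture is that the student never sees the teacher's inversion noise: it only sees the teacher's outputs as clean targets, noised with ordinary Gaussian noise.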

Overall Framework

Overall architecture of the proposed method. APPLE is a diffusion-based teacher-student framework focused on improving attribute preservation. (a) To improve preservation of target attributes, we train the teacher with conditional deblurring rather than the conditional inpainting widely used in existing work. (b) When constructing pseudo-triplets, we use attribute-aware inversion, which further improves attribute preservation at inference time. Note that the inverted noise cannot be used during training because it is not Gaussian. (c) The student is trained on pseudo-triplets generated by the teacher. Thanks to this pseudo-supervision, the student even surpasses the teacher, achieving state-of-the-art attribute preservation while maintaining high identity similarity.
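Step (b) can be pictured with deterministic DDIM inversion. This is an assumption-laden illustration: `eps_model(x, t, attr_cond, id_cond)` is a hypothetical conditional noise predictor, and "attribute-aware" is modeled simply as inverting the target with the attribute condition supplied and the identity condition left as `None`. Because the toy predictor ignores `x`, inversion followed by DDIM sampling reconstructs the input exactly, which makes the update rule easy to check.

```python
from typing import Callable, List, Optional

import numpy as np

# Hypothetical signature: eps_model(x, t, attr_cond, id_cond) -> predicted noise.
EpsModel = Callable[[np.ndarray, int, Optional[np.ndarray], Optional[np.ndarray]], np.ndarray]


def ddim_step(x, eps, ab_from, ab_to):
    """Move x from noise level ab_from to ab_to along the deterministic DDIM path."""
    x0_pred = (x - np.sqrt(1.0 - ab_from) * eps) / np.sqrt(ab_from)
    return np.sqrt(ab_to) * x0_pred + np.sqrt(1.0 - ab_to) * eps


def ddim_invert(x0, eps_model: EpsModel, alpha_bar, steps: List[int], attr_cond, id_cond=None):
    """Clean image -> noise. Attribute-aware (assumed): id_cond stays None."""
    x = x0
    for t_cur, t_next in zip(steps[:-1], steps[1:]):
        eps = eps_model(x, t_cur, attr_cond, id_cond)
        x = ddim_step(x, eps, alpha_bar[t_cur], alpha_bar[t_next])
    return x


def ddim_sample(xT, eps_model: EpsModel, alpha_bar, steps: List[int], attr_cond, id_cond=None):
    """Noise -> image, walking the same timesteps in reverse."""
    x = xT
    rev = steps[::-1]
    for t_cur, t_prev in zip(rev[:-1], rev[1:]):
        eps = eps_model(x, t_cur, attr_cond, id_cond)
        x = ddim_step(x, eps, alpha_bar[t_cur], alpha_bar[t_prev])
    return x


# Toy check: a predictor that ignores x makes inversion exactly invertible.
rng = np.random.default_rng(0)
alpha_bar = np.linspace(0.9999, 0.02, 1000)
steps = list(range(0, 1000, 50))
target = rng.random((8, 8, 3))
attr = rng.standard_normal((8, 8, 3))
toy_eps = lambda x, t, a, i: 0.1 * a
x_T = ddim_invert(target, toy_eps, alpha_bar, steps, attr_cond=attr)
recon = ddim_sample(x_T, toy_eps, alpha_bar, steps, attr_cond=attr)
```

Under this reading, the teacher would then sample from `x_T` with the source identity supplied as `id_cond`. Since `x_T` is not drawn from N(0, I), it is only used when generating pseudo-triplets, not during training.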

Comparison of conditioning methods: Inpainting vs. Deblurring

Compared to the conditional inpainting widely used in existing work, the proposed conditional deblurring strategy substantially improves preservation of target attributes (e.g., pose, lighting).
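The difference between the two conditioning styles can be shown on a single image. In this sketch the square face mask, the box blur (a stand-in for whatever degradation the authors actually use), and the kernel size `k=9` are all assumptions. Inpainting zeroes the face region, whereas deblurring keeps a low-pass copy of it, so coarse cues such as head pose, skin tone, and lighting remain in the condition.

```python
import numpy as np


def box_blur(img: np.ndarray, k: int = 9) -> np.ndarray:
    """Separable k x k box blur with edge padding (assumed degradation)."""
    if k % 2 == 0:
        raise ValueError("use an odd kernel size so the output keeps the input shape")
    pad = k // 2
    kernel = np.ones(k) / k
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    smooth = lambda v: np.convolve(v, kernel, mode="valid")
    rows = np.apply_along_axis(smooth, 0, padded)  # blur along height
    return np.apply_along_axis(smooth, 1, rows)    # blur along width


def inpainting_condition(target: np.ndarray, face_mask: np.ndarray) -> np.ndarray:
    """Face region is erased, so pose and lighting cues inside it are lost."""
    return target * (1.0 - face_mask[..., None])


def deblurring_condition(target: np.ndarray, face_mask: np.ndarray, k: int = 9) -> np.ndarray:
    """Face region is replaced by a blurred copy; low-frequency cues remain."""
    m = face_mask[..., None]
    return m * box_blur(target, k) + (1.0 - m) * target


# Toy usage on a random 32x32 RGB "target" with a hypothetical square face mask.
rng = np.random.default_rng(0)
target = rng.random((32, 32, 3))
mask = np.zeros((32, 32))
mask[8:24, 8:24] = 1.0
cond_inpaint = inpainting_condition(target, mask)
cond_deblur = deblurring_condition(target, mask)
```

Outside the mask both conditions are identical to the target; inside it, only the deblurring condition still carries information about the original face.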

Comparison of conditioning configurations for inversion

(Top) PCA visualization of target-inverted noise and random Gaussian noise. When the attribute condition is used, the inverted noise encodes more semantic information than in the other configurations. (Bottom) Face-swapping results obtained from each inverted noise. Noise inverted with the attribute condition only (attribute-aware inversion) preserves makeup best without introducing artifacts.
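A PCA visualization of this kind can be approximated with a few lines of NumPy. This sketch assumes a latent of shape `(H, W, C)` with `C >= 3` (e.g., a 4-channel latent) and fits PCA separately for each latent over the channel vectors at every spatial location; the exact PCA setup behind the figure is not specified here, and `pca_rgb` is a hypothetical helper. Pure Gaussian noise projects to speckle, whereas an inverted noise that still carries target structure would show spatially coherent regions.

```python
import numpy as np


def pca_rgb(latent: np.ndarray) -> np.ndarray:
    """Project an (H, W, C) latent, C >= 3, onto its top-3 principal components as RGB."""
    h, w, c = latent.shape
    feats = latent.reshape(-1, c)
    feats = feats - feats.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(feats, full_matrices=False)
    proj = feats @ vt[:3].T
    # Min-max normalize each component to [0, 1] for display.
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    proj = (proj - lo) / np.maximum(hi - lo, 1e-8)
    return proj.reshape(h, w, 3)


# Toy usage: random Gaussian noise shaped like a 4-channel 64x64 latent.
rng = np.random.default_rng(0)
noise_rgb = pca_rgb(rng.standard_normal((64, 64, 4)))
```

Fitting PCA jointly over all compared latents, instead of per latent, would make colors comparable across panels; which choice the figure uses is not stated.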

Quantitative Results

Qualitative Results

Ablation Study

The inpainting method fails to preserve the target image’s attributes. Applying deblurring (4th column) and then attribute-aware inversion (5th column) progressively improves preservation of the target attributes. The student (6th column) achieves the best overall quality, including attribute preservation.

Citation

@article{kang2026apple,
  title={Attribute-Preserving Pseudo-Labeling for Diffusion-Based Face Swapping},
  author={Kang, Jiwon and Choi, Yeji and Lee, JoungBin and Jang, Wooseok and Choi, Jinhyeok and Kang, Taekeun and Park, Yongjae and Kim, Myungin and Kim, Seungryong},
  journal={arXiv preprint},
  year={2026}
}