🍎APPLE: Attribute-Preserving Pseudo-Labeling for Diffusion-Based Face Swapping

arXiv 2026

Jiwon Kang¹ Yeji Choi¹ JoungBin Lee¹ Wooseok Jang¹ Jinhyeok Choi¹
Taekeun Kang² Yongjae Park² Myungin Kim² Seungryong Kim¹

¹KAIST AI ²SAMSUNG


TL;DR

We propose a diffusion-based teacher-student framework for face swapping in which a teacher generates high-quality pseudo-labels that are then used to train a student model. This significantly improves attribute preservation over existing diffusion-based methods while maintaining strong identity transfer.

Teaser. Our model, APPLE (Attribute-Preserving Pseudo-Labeling), successfully transfers the identity of a source (top left) onto a target (bottom left) while accurately preserving target attributes (e.g. pose, expression, skin tone, lighting) across ethnicity, input variations, and gender.

Overview

Face swapping aims to transfer the identity from a source image to a target face while preserving attributes like pose, lighting, and makeup. However, existing diffusion-based methods, which often rely on inpainting, struggle to preserve these attributes faithfully because masking the face removes crucial visual cues.

To address this, we propose APPLE, a teacher-student framework that generates high-quality pseudo-labels (triplets) to provide explicit supervision.

A teacher model is trained to produce attribute-preserving results without ground-truth data by:
- Reformulating face swapping as a conditional deblurring task to retain low-frequency attributes (e.g., skin tone, lighting).
- Introducing an attribute-aware inversion scheme to capture fine-grained details (e.g., makeup, accessories).
A student model is then trained on the high-quality pseudo-labels (triplets) produced by the teacher model, learning face swapping as a direct editing task to further enhance attribute preservation.
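
The teacher-to-student data flow described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `PseudoTriplet`, `build_pseudo_triplets`, and `dummy_teacher` are hypothetical names, the teacher is replaced by a trivial placeholder function, and how the three images of a triplet are assigned to input and supervision roles during student training is not specified here, so the sketch only stores them.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

import numpy as np

Image = np.ndarray  # H x W x 3 float array in [0, 1]


@dataclass
class PseudoTriplet:
    """One pseudo-labeled sample (hypothetical container)."""
    source: Image   # provides the identity
    target: Image   # provides the attributes (pose, lighting, makeup, ...)
    swapped: Image  # teacher output for this (source, target) pair


def build_pseudo_triplets(
    teacher: Callable[[Image, Image], Image],
    pairs: Iterable[Tuple[Image, Image]],
) -> List[PseudoTriplet]:
    """Run a frozen teacher on every (source, target) pair."""
    return [PseudoTriplet(s, t, teacher(s, t)) for s, t in pairs]


def dummy_teacher(source: Image, target: Image) -> Image:
    # Placeholder only: a real teacher would be a diffusion model
    # run with conditional deblurring and attribute-aware inversion.
    return 0.5 * source + 0.5 * target


if __name__ == "__main__":
    src = np.zeros((4, 4, 3))
    tgt = np.ones((4, 4, 3))
    triplets = build_pseudo_triplets(dummy_teacher, [(src, tgt)])
    print(len(triplets), triplets[0].swapped.shape)
```

Keeping the teacher behind a plain callable makes the point that the student only ever sees the stored triplets, never the teacher's internals.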

Overall Framework

Overall architecture of the proposed method. APPLE is a diffusion-based teacher-student framework focused on improving attribute preservation. (a) To improve target-attribute preservation, we train the teacher with conditional deblurring rather than the conditional inpainting widely used in existing works. (b) When constructing pseudo-triplets, we apply attribute-aware inversion, which further improves attribute preservation at inference time. Note that the inverted noise cannot be used during training because it is non-Gaussian. (c) The student is trained on the pseudo-triplets generated by the teacher. Thanks to this pseudo-supervision, the student even surpasses the teacher, achieving state-of-the-art attribute preservation while maintaining high identity similarity.

Comparison of conditioning methods: Inpainting vs. Deblurring

Compared to the conditional inpainting widely used in existing works, the proposed conditional deblurring strategy substantially improves preservation of target attributes (e.g., pose, lighting).
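
To make the difference concrete, the sketch below builds both kinds of conditioning image from a synthetic target. It is an illustration under stated assumptions: the function names, the blur strength `sigma=4.0`, the square face mask, and the choice to blur only the face region are all hypothetical and not taken from the paper. The synthetic target has a constant "skin tone" plus a checkerboard standing in for fine detail.

```python
import numpy as np


def gaussian_kernel1d(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of an H x W x C image with reflect padding."""
    k = gaussian_kernel1d(sigma)
    r = len(k) // 2
    out = img.astype(np.float64)
    for axis in (0, 1):
        pad = [(0, 0)] * out.ndim
        pad[axis] = (r, r)
        padded = np.pad(out, pad, mode="reflect")
        out = np.apply_along_axis(
            lambda v: np.convolve(v, k, mode="valid"), axis, padded
        )
    return out


def inpainting_condition(target: np.ndarray, face_mask: np.ndarray) -> np.ndarray:
    """Zero out the face region: every cue inside the mask is lost."""
    return target * (1.0 - face_mask[..., None])


def deblurring_condition(
    target: np.ndarray, face_mask: np.ndarray, sigma: float = 4.0
) -> np.ndarray:
    """Blur the face region (an assumption) and keep the context sharp."""
    m = face_mask[..., None]
    return m * gaussian_blur(target, sigma) + (1.0 - m) * target


# Synthetic target: constant tone (low frequency) + checkerboard (high frequency).
tone = np.array([0.8, 0.6, 0.5])
i, j = np.indices((64, 64))
checker = np.where((i + j) % 2 == 0, 1.0, -1.0)
target = tone + 0.1 * checker[..., None]

mask = np.zeros((64, 64))
mask[16:48, 16:48] = 1.0  # hypothetical face box
face = mask.astype(bool)

inpaint = inpainting_condition(target, mask)
deblur = deblurring_condition(target, mask)

print("mean face color, inpainting:", inpaint[face].mean(axis=0))
print("mean face color, deblurring:", deblur[face].mean(axis=0))
```

In this toy setting the masked condition carries no color information inside the face region, while the blurred condition keeps the average tone and discards only the fine checkerboard detail, which mirrors the low-frequency attributes (skin tone, lighting) that deblurring is meant to retain.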

Comparison of conditioning configuration for inversion

(Top) PCA visualization of target-inverted noise and random Gaussian noise. When the attribute condition is used, the inverted noise encodes more semantic information than the alternatives. (Bottom) Face-swapping results for each type of inverted noise. Noise inverted with the attribute condition only (attribute-aware inversion) best preserves makeup without introducing artifacts.
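
A PCA visualization of this kind can be sketched as below. This is only one plausible way to produce such a plot, not the procedure used for the figure: `pca_to_rgb` and `explained_variance_ratio` are hypothetical helpers, the 4-channel latent shape is an assumption, and the "structured" latent is a synthetic stand-in for inverted noise rather than the output of any actual inversion.

```python
import numpy as np


def _centered_pixels(latent: np.ndarray) -> np.ndarray:
    """Reshape a C x H x W latent to (H*W, C) and center each channel."""
    c = latent.shape[0]
    x = latent.reshape(c, -1).T
    return x - x.mean(axis=0, keepdims=True)


def pca_to_rgb(latent: np.ndarray) -> np.ndarray:
    """Project a C x H x W latent (C >= 3) to an H x W x 3 image via PCA."""
    _, h, w = latent.shape
    x = _centered_pixels(latent)
    # Rows of vt are the principal axes of the centered per-pixel vectors.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    proj = x @ vt[:3].T
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    proj = (proj - lo) / np.maximum(hi - lo, 1e-12)  # rescale to [0, 1]
    return proj.reshape(h, w, 3)


def explained_variance_ratio(latent: np.ndarray) -> np.ndarray:
    """Fraction of channel variance captured by each principal component."""
    s = np.linalg.svd(_centered_pixels(latent), compute_uv=False)
    return s ** 2 / np.sum(s ** 2)


rng = np.random.default_rng(0)

# Pure Gaussian noise: no spatial structure shared across channels.
noise = rng.standard_normal((4, 32, 32))

# Synthetic "structured" latent: all channels share one spatial pattern.
ramp = np.tile(np.linspace(-1.0, 1.0, 32), (32, 1))
weights = np.array([1.0, -0.5, 0.8, 0.3])
structured = weights[:, None, None] * ramp + 0.1 * rng.standard_normal((4, 32, 32))

rgb = pca_to_rgb(structured)
ratio_noise = explained_variance_ratio(noise)
ratio_structured = explained_variance_ratio(structured)
print("first-component ratio, noise:     ", ratio_noise[0])
print("first-component ratio, structured:", ratio_structured[0])
```

For pure Gaussian noise the variance is spread roughly evenly across components, whereas a latent with shared spatial structure concentrates it in the leading component; visible structure in the PCA image is the qualitative signal that the figure relies on.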

Quantitative Results

Qualitative Results

Ablation Study

The inpainting method fails to preserve the target image’s attributes. Applying deblurring (4th column) and then attribute-aware inversion (5th column) maintains the target attributes progressively better. Overall quality, including attribute preservation, is highest for the student (6th column).

Citation

@article{kang2026apple,
  title={Attribute-Preserving Pseudo-Labeling for Diffusion-Based Face Swapping},
  author={Kang, Jiwon and Choi, Yeji and Lee, JoungBin and Jang, Wooseok and Choi, Jinhyeok and Kang, Taekeun and Park, Yongjae and Kim, Myungin and Kim, Seungryong},
  journal={arXiv preprint},
  year={2026}
}