Overview
Abstract
Person re-identification (Re-ID) is challenged by variations in human pose and camera viewpoint, which significantly alter an individual's appearance across images. Existing datasets often lack diversity and scale in these pose and viewpoint variations, hindering the generalization of Re-ID models to new camera networks. Previous methods have attempted to address this with data augmentation; however, they rely on poses already present in the dataset and therefore fail to effectively reduce its pose bias. In this paper, we propose Diff-ID, a novel approach that augments the training data with sparse, underrepresented poses from the original distribution. By leveraging the knowledge of pre-trained large-scale generative models such as Stable Diffusion, we generate realistic images with diverse human poses and camera viewpoints. Our objective is to create a training dataset that enables existing Re-ID models to learn features that are debiased with respect to pose variations. Qualitative results demonstrate that our method addresses pose bias and enhances the generalizability of Re-ID models more effectively than other approaches. The performance gains achieved by training Re-ID models on our offline-augmented dataset highlight the potential of our framework to improve the scalability and generalizability of person Re-ID models.
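To make the idea concrete, the sketch below shows one way to generate a pedestrian image in an underrepresented target pose with a pose-conditioned Stable Diffusion pipeline (ControlNet via the diffusers library). This is only an illustrative assumption, not the authors' exact Diff-ID pipeline; the model IDs, prompt, and pose-skeleton file are placeholders.

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Load an OpenPose-conditioned ControlNet and attach it to Stable Diffusion.
# (Model IDs are public checkpoints used here purely for illustration.)
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# A rendered pose skeleton for a pose that is rare in the source Re-ID dataset
# (e.g., a back view or a crouching pose). Placeholder path.
pose_map = load_image("rare_pose_skeleton.png")

# Generate a pedestrian in the target pose; the prompt encodes identity-preserving
# attributes (clothing, accessories) that would come from the original image.
augmented = pipe(
    prompt="a pedestrian wearing a red jacket and jeans, full body, surveillance-camera view",
    image=pose_map,
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
augmented.save("augmented_sample.png")

Repeating this over many underrepresented pose skeletons yields an offline-augmented dataset on which a standard Re-ID model can then be trained.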
Main Architecture
The Effect of Viewpoint and Human Pose Augmentation
Qualitative Results
GAN-based Models
Quantitative comparison on standard Re-ID benchmarks.
Note that the Re-ID Experts in the first row group are not directly comparable, as our primary focus is on dataset generation.
For augmentation-based methods, we train the same Re-ID model on the datasets generated by each method to ensure a fair comparison.
*: The authors did not provide a pre-trained model.
Citation
@misc{kim2024diffid,
      title={Pose-Diversified Augmentation with Diffusion Model for Person Re-Identification},
      author={Inès Hyeonsu Kim and JoungBin Lee and Soowon Son and Woojeong Jin and Kyusun Cho and Junyoung Seo and Min-Seop Kwak and Seokju Cho and JeongYeol Baek and Byeongwon Lee and Seungryong Kim},
      year={2024},
      eprint={2406.16042},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Acknowledgements
The website template was borrowed from Michaël Gharbi.