arXiv 2026
Optical flow models trained on high-quality data degrade severely when confronted with real-world corruptions such as blur, noise, and compression artifacts. We formulate Degradation-Aware Optical Flow, a new task targeting accurate dense correspondence estimation from corrupted video. Our key insight is that intermediate representations of image restoration diffusion models are inherently corruption-aware. We lift such a model to attend to multiple frames through full spatio-temporal attention and show that the resulting features exhibit zero-shot correspondence capabilities. Based on this finding, we present DA-Flow, a hybrid architecture that fuses diffusion features with convolutional features within an iterative refinement framework. DA-Flow substantially outperforms existing optical flow methods under severe degradation across multiple benchmarks.
We investigate whether pretrained image restoration diffusion models can provide degradation-aware features for geometric correspondence. By lifting the DiT4SR model with full spatio-temporal attention over multi-frame inputs, we analyze the zero-shot correspondence quality of diffusion features under severe degradation.
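The lifting step can be illustrated with a minimal numpy sketch: instead of each frame's tokens attending only within that frame, tokens from all frames are flattened into a single sequence so attention spans frames. This is an illustrative single-head toy, not the DiT4SR implementation; the function name and weight matrices are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def lifted_attention(frame_tokens, Wq, Wk, Wv):
    """Full spatio-temporal attention (toy, single head).

    frame_tokens: (T, N, C) -- T frames, N spatial tokens each.
    Flattening time into the token axis lets every token attend
    to tokens of *all* frames, i.e. cross-frame attention,
    rather than per-frame self-attention.
    """
    T, N, C = frame_tokens.shape
    x = frame_tokens.reshape(T * N, C)          # merge time into tokens
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(C))        # (T*N, T*N) cross-frame map
    return (attn @ v).reshape(T, N, C)          # restore per-frame layout
```

The (T·N, T·N) attention map is where zero-shot correspondence can be read off: entries linking tokens of frame 0 to tokens of frame 1 score cross-frame similarity.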
Comparison of layer-wise average EPE (end-point error) over timesteps. Across nearly all feature indices, the lifted features exhibit substantially lower EPE. This demonstrates that lifting enhances correspondence capability by enabling the model to attend to cross-frame information.
Comparison of zero-shot geometric correspondence between Baseline and Lifting features. (Left) Top-10 layers ranked by timestep-averaged EPE (lower is better). Lifting consistently achieves lower EPE across all ranks. (Right) EPE over denoising steps for the top-4 layers of each method. Baseline features are highly sensitive to the denoising step, while Lifting features remain stable across steps.
Overall Architecture of DA-Flow. DA-Flow retains the standard correlation operator and iterative update operator from RAFT, while incorporating the lifted diffusion model alongside a conventional CNN feature encoder. Given a pair of degraded input frames, the lifted diffusion model extracts query and key features from full spatio-temporal MM-Attention layers. Features from the top-\(L\) layers with strongest correspondence are aggregated via DPT-based upsampling with three separate heads (query, key, context), then concatenated with CNN encoder features to form hybrid representations for cost volume construction and iterative flow refinement.
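The fusion step described above can be sketched in a few lines of numpy: per-pixel diffusion and CNN features are concatenated into hybrid representations, from which a RAFT-style all-pairs correlation volume is built. This is a simplified single-level sketch under assumed channel layouts; the function name and normalization are illustrative, not taken from DA-Flow.

```python
import numpy as np

def hybrid_cost_volume(diff1, diff2, cnn1, cnn2):
    """Build an all-pairs correlation volume from hybrid features.

    diff1, diff2: (H, W, Cd) diffusion features for frames 1 and 2.
    cnn1,  cnn2:  (H, W, Cc) CNN encoder features for the same frames.
    Returns a (H, W, H, W) volume: corr[i, j, k, l] scores how well
    pixel (i, j) of frame 1 matches pixel (k, l) of frame 2.
    """
    f1 = np.concatenate([diff1, cnn1], axis=-1)   # hybrid features, frame 1
    f2 = np.concatenate([diff2, cnn2], axis=-1)   # hybrid features, frame 2
    H, W, C = f1.shape
    # All-pairs dot products, scaled by sqrt(C) as in RAFT.
    corr = (f1.reshape(H * W, C) @ f2.reshape(H * W, C).T) / np.sqrt(C)
    return corr.reshape(H, W, H, W)
```

In the full model this volume is then sampled by the iterative update operator to refine the flow estimate; the sketch only covers the construction.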
Coming soon.