POLAR: A Portrait OLAT Dataset and Generative Framework for Illumination-Aware Face Modeling

Zhuo Chen1,2†, Chengqun Yang1†, Zhuo Su2*, Zheng Lv2, Jingnan Gao1, Xiaoyuan Zhang2, Xiaokang Yang1, Yichao Yan1*
1MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University, 2PICO.
( † denotes equal contribution, * denotes corresponding author. )

POLAR captures high-resolution OLAT facial data with diverse subjects and expressions, from which we synthesize large-scale HDR-relit portraits. POLARNet further learns to generate per-light OLAT responses from a single portrait, enabling scalable and physically consistent relighting under arbitrary HDR environments.

Abstract


Face relighting aims to synthesize realistic portraits under novel illumination while preserving identity and geometry. However, progress remains constrained by the limited availability of large-scale, physically consistent illumination data.

To address this, we introduce POLAR, a large-scale and physically calibrated One-Light-at-a-Time (OLAT) dataset containing over 200 subjects captured under 156 lighting directions, multiple views, and diverse expressions. Building upon POLAR, we develop POLARNet, a flow-based generative model that predicts per-light OLAT responses from a single portrait, capturing fine-grained, direction-aware illumination effects while preserving facial identity.

Unlike diffusion-based or background-conditioned methods that rely on statistical or contextual cues, our formulation models illumination as a continuous, physically interpretable transformation between lighting states, enabling scalable and controllable relighting. Together, POLAR and POLARNet form a unified illumination learning framework that links real data, generative synthesis, and physically grounded relighting, establishing a self-sustaining "chicken-and-egg" cycle for scalable and reproducible portrait illumination.

Method


Given a uniformly lit portrait, the encoder–decoder pair \( (\mathbf{E},\mathbf{D}) \) maps both the input and its target OLAT image into latent space. Latent Bridge Matching learns a continuous, direction-conditioned transport between these endpoints, supervised by the velocity field loss \( \mathcal{L}_{\mathrm{LBM}} \). A conditional U-Net predicts the latent drift \( v_{\theta}(z_t, t, c_{\text{dir}}) \), conditioned on the encoded light direction. During inference, a single forward step transports the latent \( z_u \) toward the illumination-specific latent \( z_l \), enabling efficient generation of per-light OLAT responses for all calibrated directions. These synthesized OLATs can be linearly composed to render realistic relighting under arbitrary HDR environments.
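To make the pipeline concrete, the PyTorch-style sketch below illustrates how a velocity-matching training step, the one-step latent transport at inference, and the linear OLAT composition could be implemented. The module names (encoder, decoder, velocity_unet, embed_direction), the linear-bridge parameterization, and the MSE form of \( \mathcal{L}_{\mathrm{LBM}} \) are illustrative assumptions for exposition, not the released implementation.

```python
# Hypothetical sketch of Latent Bridge Matching for per-light OLAT generation.
# All module names and hyperparameters are assumptions, not the authors' code.
import torch
import torch.nn.functional as F

def lbm_training_step(encoder, velocity_unet, embed_direction,
                      uniform_img, olat_img, light_dir):
    """One velocity-matching step between a uniformly lit portrait and its OLAT target."""
    z_u = encoder(uniform_img)           # latent of the uniformly lit input
    z_l = encoder(olat_img)              # latent of the target OLAT image
    t = torch.rand(z_u.shape[0], device=z_u.device).view(-1, 1, 1, 1)
    z_t = (1 - t) * z_u + t * z_l        # point on the linear bridge between endpoints
    target_velocity = z_l - z_u          # constant drift of the linear bridge
    c_dir = embed_direction(light_dir)   # encoded light direction as conditioning
    pred_velocity = velocity_unet(z_t, t.flatten(), c_dir)
    return F.mse_loss(pred_velocity, target_velocity)  # assumed form of L_LBM

@torch.no_grad()
def single_step_olat(encoder, decoder, velocity_unet, embed_direction,
                     uniform_img, light_dir):
    """Single-step transport z_u -> z_l, then decode the per-light OLAT image."""
    z_u = encoder(uniform_img)
    t0 = torch.zeros(z_u.shape[0], device=z_u.device)
    v = velocity_unet(z_u, t0, embed_direction(light_dir))
    z_l = z_u + v                        # one Euler step over the full interval
    return decoder(z_l)

def hdr_relight(olat_stack, weights):
    """Linearly compose N OLAT renders with per-direction HDR weights."""
    # olat_stack: (N, 3, H, W) linear-radiance OLAT images for the calibrated lights;
    # weights: (N, 3) RGB weights, assumed precomputed by integrating the HDR map
    # over each light's solid angle.
    return torch.einsum('nchw,nc->chw', olat_stack, weights)
```

Because illumination is linear in light intensity, summing the generated OLAT responses weighted by the HDR environment sampled at the calibrated directions yields the relit portrait in a single pass.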