
RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance

🌈 Abstract

The paper proposes a training-free approach for flexibly personalizing rectified flow models using anchored classifier guidance. It broadens the applicability of classifier guidance by reformulating it as a fixed-point problem that can leverage off-the-shelf image discriminators instead of relying on a specially trained noise-aware classifier. To stabilize this fixed-point solution, the paper introduces anchored classifier guidance, which constrains the target flow trajectory to stay close to a reference trajectory and comes with a theoretical convergence guarantee. The method is implemented on a practical class of piecewise rectified flows and achieves favorable results on personalization tasks involving human faces, live subjects, certain objects, and multiple subjects.

🙋 Q&A

[01] Classifier Guidance for Rectified Flow

1. What is the key observation that allows bypassing the need for a noise-aware classifier? If the rectified flow trajectory is approximated as ideally straight, the clean endpoint can be predicted from any intermediate point in a single step. Classifier guidance can therefore be reformulated as a simple fixed-point problem involving only the trajectory endpoints, so the classifier only has to score clean images and no noise-aware classifier is required.

2. What is the limitation of the initial fixed-point solution derived from this observation? The initial fixed-point iteration is not guaranteed to converge: even a small perturbation of the starting point can cause the target flow trajectory to diverge significantly over successive updates, which undermines the controllability of the rectified flow.

3. How does the paper address this limitation? To improve stability, the paper proposes a new "anchored classifier guidance" that constrains the target flow trajectory to stay close to a predetermined reference trajectory. This provides a better convergence guarantee and some degree of interpretability.
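The key observation can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's code: it assumes the common convention z_t = (1 - t)·z_0 + t·z_1 with z_0 the noise and z_1 the image (the paper's notation may differ), and the helpers `predict_endpoint` and `guidance_at_endpoint` are hypothetical. On a straight trajectory the velocity is constant, so the clean endpoint follows from any intermediate point in one step, and the guidance gradient (here a toy quadratic log-likelihood standing in for an off-the-shelf discriminator) is evaluated only on that clean endpoint.

```python
from typing import Callable, List

Vector = List[float]
VelocityFn = Callable[[Vector, float], Vector]
GradFn = Callable[[Vector], Vector]


def predict_endpoint(z_t: Vector, t: float, velocity: VelocityFn) -> Vector:
    """Predict z_1 from z_t in one step: z_1 = z_t + (1 - t) * v(z_t, t).

    Exact only when the trajectory is straight, i.e. the velocity is constant
    along it (assumed convention: z_t = (1 - t) * z_0 + t * z_1).
    """
    v = velocity(z_t, t)
    return [zi + (1.0 - t) * vi for zi, vi in zip(z_t, v)]


def guidance_at_endpoint(z_t: Vector, t: float, velocity: VelocityFn,
                         grad_log_p: GradFn) -> Vector:
    """Evaluate a classifier gradient on the predicted clean endpoint.

    grad_log_p is a hypothetical stand-in for the gradient of an
    off-the-shelf discriminator score; it never receives a noisy input.
    """
    return grad_log_p(predict_endpoint(z_t, t, velocity))


if __name__ == "__main__":
    z0, z1 = [1.0, -2.0], [3.0, 0.5]
    target = [2.0, 2.0]

    def straight_velocity(z: Vector, t: float) -> Vector:
        # Toy field of a perfectly straight flow: constant z_1 - z_0.
        return [b - a for a, b in zip(z0, z1)]

    def toy_grad_log_p(x: Vector) -> Vector:
        # Gradient of the toy score -0.5 * ||x - target||^2.
        return [ti - xi for ti, xi in zip(target, x)]

    for t in (0.0, 0.25, 0.5):
        z_t = [(1 - t) * a + t * b for a, b in zip(z0, z1)]
        print(t, predict_endpoint(z_t, t, straight_velocity),
              guidance_at_endpoint(z_t, t, straight_velocity, toy_grad_log_p))
```

Every printed row recovers the same endpoint [3.0, 0.5], which is exactly why a classifier that only understands clean images is enough once the trajectory is straight.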

[02] Anchored Classifier Guidance

1. What is the key idea behind the anchored classifier guidance? The key idea is to keep the target flow trajectory straight and close to a reference trajectory by anchoring the target velocity to the reference velocity. This helps stabilize the process of solving for the target trajectory.

2. How does the anchored classifier guidance bypass the need for a noise-aware classifier? As with the initial fixed-point solution, the anchored classifier guidance replaces the classifier guidance terms at intermediate time steps with an expression involving only the trajectory endpoints, which allows off-the-shelf image discriminators to be used.

3. What theoretical property does the anchored classifier guidance exhibit? The paper shows that, provided the image discriminator is Lipschitz continuous and the guidance scale is chosen properly, the fixed-point iteration for solving the anchored classifier guidance converges at least linearly.
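The convergence claim rests on the standard contraction-mapping argument, which the sketch below illustrates. It is a generic toy, not the paper's exact update: it assumes the anchored problem takes the simplified form x = z_ref + s·g(x), where z_ref is the reference endpoint, s is the guidance scale, and g is a guidance term assumed to be L-Lipschitz (the function `anchored_fixed_point` and the toy g are hypothetical). Under those assumptions the update map is (s·L)-Lipschitz, so picking s < 1/L makes it a contraction and each iteration shrinks the step size by at least the factor s·L, which is linear convergence; with s·L > 1 the toy iteration diverges instead.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
GradFn = Callable[[Vector], Vector]


def anchored_fixed_point(z_ref: Vector, grad: GradFn, scale: float,
                         iters: int = 100, tol: float = 1e-10
                         ) -> Tuple[Vector, List[float]]:
    """Solve x = z_ref + scale * grad(x) by plain fixed-point iteration.

    Starts from the anchor z_ref. If grad is L-Lipschitz and scale * L < 1,
    the map is a contraction, so the returned step sizes ||x_{k+1} - x_k||
    shrink at least geometrically with ratio scale * L.
    """
    x = list(z_ref)
    steps: List[float] = []
    for _ in range(iters):
        x_new = [r + scale * g for r, g in zip(z_ref, grad(x))]
        step = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_new, x)))
        steps.append(step)
        x = x_new
        if step < tol:
            break
    return x, steps


if __name__ == "__main__":
    lipschitz, target = 2.0, [1.0]

    def toy_grad(x: Vector) -> Vector:
        # L-Lipschitz toy guidance pulling x toward `target` (L = lipschitz).
        return [lipschitz * (t - xi) for t, xi in zip(target, x)]

    # scale * L = 0.5 < 1: contraction, converges to (0 + 0.5 * 1) / 1.5 = 1/3.
    x, steps = anchored_fixed_point([0.0], toy_grad, scale=0.25)
    print(x, steps[:4])
    # scale * L = 1.5 > 1: not a contraction, the steps grow.
    _, steps = anchored_fixed_point([0.0], toy_grad, scale=0.75, iters=4)
    print(steps)
```

With the first setting the step sizes halve at every iteration, matching the rate s·L = 0.5; with the second they grow by a factor of 1.5, which is why the guidance scale has to be chosen with the Lipschitz constant in mind.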

[03] Practical Algorithm

1. How does the paper extend the analysis to handle practical rectified flow models? The paper relaxes the assumption of an ideally straight rectified flow trajectory, and instead adopts a piecewise linear approximation, where the flow trajectory is assumed straight within each time window.

2. How does the paper address the issue of disconnected reference trajectory segments after updates? The paper reinitializes the reference trajectory at every iteration, starting from predictions of the updated target starting points, so that the segments stay connected.

3. What are the key steps in the iterative procedure for solving the target flow trajectory under anchored classifier guidance? The key steps are: 1) predict the updated target starting points by extrapolating from previous updates; 2) solve the derived fixed-point problem to obtain the new target trajectory, anchored to the reinitialized reference trajectory.
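The two practical ingredients, piecewise-straight sampling and reinitializing the reference trajectory from an extrapolated start, can be sketched as follows. This is a hedged illustration rather than the paper's implementation: `piecewise_sample` assumes [0, 1] is split into equal time windows with a straight path inside each one, so a single velocity evaluation carries the state across a window, and `extrapolate_start` assumes simple linear extrapolation from the last two iterates, since the summary only says the starting points are extrapolated from previous updates. The toy velocity field and both function names are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]
VelocityFn = Callable[[Vector, float], Vector]


def piecewise_sample(z0: Vector, velocity: VelocityFn,
                     num_windows: int) -> List[Vector]:
    """Integrate t from 0 to 1 assuming a straight path inside each window.

    The unit interval is split into `num_windows` equal windows; inside
    window k the state moves along v(z_{t_k}, t_k) for the whole window.
    Returns the states at all window boundaries (num_windows + 1 entries).
    """
    if num_windows < 1:
        raise ValueError("num_windows must be positive")
    width = 1.0 / num_windows
    z = list(z0)
    states = [z]
    for k in range(num_windows):
        v = velocity(z, k * width)
        z = [zi + width * vi for zi, vi in zip(z, v)]
        states.append(z)
    return states


def extrapolate_start(history: Sequence[Vector]) -> Vector:
    """Predict the next target starting point from previous iterates.

    Hypothetical linear rule x_next = x_k + (x_k - x_{k-1}); with a single
    iterate in the history, that iterate is returned unchanged.
    """
    if not history:
        raise ValueError("history must contain at least one iterate")
    if len(history) == 1:
        return list(history[-1])
    last, prev = history[-1], history[-2]
    return [2.0 * a - b for a, b in zip(last, prev)]


if __name__ == "__main__":
    def toy_velocity(z: Vector, t: float) -> Vector:
        # Toy field standing in for a pretrained piecewise rectified flow.
        return [1.0 - zi + t for zi in z]

    history = [[0.0, 0.0], [0.2, -0.1]]
    start = extrapolate_start(history)  # predicted next starting point
    # Reinitialize the reference trajectory from the predicted start.
    reference = piecewise_sample(start, toy_velocity, num_windows=4)
    print(start, reference[-1])
```

In an outer loop, one would append each updated target starting point to the history, call `extrapolate_start`, rebuild the reference trajectory with `piecewise_sample` from the prediction, and then solve the anchored fixed-point problem against that fresh, connected reference.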

[04] Applications

1. What types of personalization tasks does the proposed method cover? The method applies flexibly to a range of personalized image generation tasks, including human faces, live subjects (e.g., cats and dogs), certain objects (e.g., cans and vases), and even multiple subjects.

2. How does the method leverage off-the-shelf image discriminators for these tasks? For face-centric personalization, the method uses a face recognition model (ArcFace) as the discriminator. For subject-driven generation, it employs an open-vocabulary object detector (OWL-ViT) and a self-supervised backbone (DINOv2) to extract visual features.

3. How does the method handle the multi-subject scenario? The method extends to the multi-subject case by incorporating a bipartite matching step to associate the generated subjects with the reference subjects, before computing the classifier guidance signal.
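The multi-subject matching step can be illustrated as a small assignment problem. In this sketch, which is an assumption rather than the paper's code, each generated and reference subject is represented by a feature vector (for instance an embedding from a face or self-supervised backbone), pairs are scored by cosine similarity (the summary does not specify the score), and the hypothetical `match_subjects` finds the one-to-one assignment with the highest total similarity by exhaustive search over permutations. That is exact but only practical for a handful of subjects; for larger sets, a Hungarian-algorithm solver such as SciPy's `scipy.optimize.linear_sum_assignment` (with `maximize=True`) solves the same problem in polynomial time.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("zero-length feature vector")
    return dot / norm


def match_subjects(generated: List[Vector], references: List[Vector]
                   ) -> Tuple[List[Tuple[int, int]], float]:
    """Exact max-similarity bipartite matching by brute force.

    Assumes equally many generated and reference subjects. Returns the
    (generated_index, reference_index) pairs and their total similarity.
    """
    if len(generated) != len(references):
        raise ValueError("expected equally many generated and reference subjects")
    sim = [[cosine_similarity(g, r) for r in references] for g in generated]
    best_perm, best_score = None, -math.inf
    # Try every one-to-one assignment and keep the highest total similarity.
    for perm in itertools.permutations(range(len(references))):
        score = sum(sim[i][j] for i, j in enumerate(perm))
        if score > best_score:
            best_perm, best_score = perm, score
    return list(enumerate(best_perm)), best_score


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real ones would come from a pretrained backbone.
    generated = [[0.0, 1.0], [1.0, 0.0]]
    references = [[1.0, 0.1], [0.1, 1.0]]
    pairs, total = match_subjects(generated, references)
    print(pairs, round(total, 4))
```

The matched pairs would then decide which reference features each generated subject is compared against when the guidance signal is computed.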


Shared by Daniel Chen
© 2024 NewMotor Inc.