Latent Space Editing in Transformer-based Flow Matching

University of Amsterdam, Alibaba, UC Merced
AAAI 2024
Also in ICML 2023 Workshop, New Frontiers in Learning, Control, and Dynamical Systems

Abstract

This paper addresses image editing with generative models. Flow Matching is an emerging generative modeling technique that offers simple and efficient training. Concurrently, the transformer-based U-ViT has been proposed as a replacement for the commonly used UNet, offering better scalability and performance in generative modeling. Flow Matching with a transformer backbone therefore offers the potential for scalable, high-quality generative modeling, but its latent structure and editing ability are as yet unknown. We adopt this setting and explore how to edit images through latent space manipulation. We introduce an editing space, which we call $u$-space, that can be manipulated in a controllable, accumulative, and composable manner. Additionally, we propose a tailored sampling solution that enables sampling with the more efficient adaptive step-size ODE solvers. Lastly, we put forth a straightforward yet powerful method for fine-grained and nuanced editing using text prompts. Our framework is simple, efficient, and highly effective at editing images while preserving the essence of the original content.
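
As background for the sampling claim in the abstract: a Flow Matching model generates a sample by integrating an ODE dx/dt = v(x, t) from noise at t = 0 to data at t = 1, and an adaptive step-size solver picks each step from a local error estimate instead of using a fixed grid. The following is a minimal sketch of that idea only. It does not reproduce the paper's $u$-space editing or its tailored sampling solution. The 1-D Gaussian target (`MU`, `S`), the closed-form `velocity` that stands in for a trained U-ViT, and the helper `sample_adaptive` are all hypothetical names chosen for illustration.

```python
import math

# Hypothetical 1-D "data" distribution N(MU, S^2); the noise source is N(0, 1).
MU, S = 2.0, 0.5


def velocity(x, t):
    """Closed-form marginal velocity for the linear path x_t = (1 - t) * x0 + t * x1,
    with x0 ~ N(0, 1) and x1 ~ N(MU, S^2) drawn independently.
    In a real model this would be the prediction of a trained network."""
    var_t = (1.0 - t) ** 2 + (t * S) ** 2
    return MU + (t * S * S - (1.0 - t)) / var_t * (x - t * MU)


def sample_adaptive(x0, rtol=1e-6, atol=1e-8):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1.

    Uses an embedded Euler/Heun pair: the gap between the first-order and
    second-order updates is the local error estimate that drives the step size.
    Returns the final state and the number of accepted steps."""
    x, t, h, steps = x0, 0.0, 0.1, 0
    while t < 1.0:
        h = min(h, 1.0 - t)                    # never step past t = 1
        k1 = velocity(x, t)
        k2 = velocity(x + h * k1, t + h)
        x_euler = x + h * k1                   # first-order estimate
        x_heun = x + 0.5 * h * (k1 + k2)       # second-order estimate
        err = abs(x_heun - x_euler)
        tol = atol + rtol * max(abs(x), abs(x_heun))
        if err <= tol:                         # accept the step
            x, t, steps = x_heun, t + h, steps + 1
        if err > 0.0:                          # shrink or grow the next step
            h *= min(5.0, max(0.2, 0.9 * math.sqrt(tol / err)))
        else:
            h *= 5.0
    return x, steps


if __name__ == "__main__":
    x1, n = sample_adaptive(1.0)
    # For this toy setup the exact flow maps x0 to MU + S * x0.
    print(f"sample={x1:.6f}, exact={MU + S * 1.0:.6f}, accepted steps={n}")
```

Because the toy velocity field has an exact solution, the integrator can be checked against `MU + S * x0`. With a learned velocity field no such reference exists, and the tolerances `rtol` and `atol` trade sample quality against the number of network evaluations.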

@inproceedings{hulfm,
  title     = {Latent Space Editing in Transformer-based Flow Matching},
  author    = {Hu, Tao and Zhang, David W and Mettes, Pascal and Tang, Meng and Zhao, Deli and Snoek, Cees G.M.},
  year      = {2024},
  booktitle = {AAAI}
}
