Tao Hu

Computer Vision & Learning Group

Akademiestr. 7, Munich

Ludwig Maximilian University of Munich

I am a Postdoctoral Research Fellow with Björn Ommer in the Ommer-Lab (Stable Diffusion Lab), focused on exploring the scalability and generalization ability of diffusion models in the nxtaim project. I finished my PhD at VISLab, University of Amsterdam.

I am looking to supervise Bachelor, Master, and PhD students in Munich and globally. I am open to discussion and collaboration; if you're interested, feel free to send an email.

I focus on introducing inductive biases into neural networks to achieve data efficiency through few-shot learning, generative models, etc. I am convinced that generative modelling will be the future of discriminative modelling.

news

Aug 01, 2025	TREAD and ArtFlow are accepted by ICCV 2025. Congrats to the team!
Mar 11, 2025	MaskFlow on arxiv
Mar 07, 2025	Will give a talk at NxtAim Winter School about “Efficient Architecture for Representation”.
Mar 01, 2025	Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions accepted by CVPR 2025.
Jan 23, 2025	ToddlerDiffusion: Interactive Structured Image Generation with Cascaded Schrödinger Bridge accepted by ICLR 2025.

selected publications

TREAD: Token Routing for Efficient Architecture-agnostic Diffusion Training

Felix Krause, Timy Phan, Ming Gui, Stefan Andreas Baumann, Vincent Tao Hu, and Björn Ommer

In ICCV, 2025

PDF Code Website

Stochastic Interpolants for Revealing Stylistic Flows across the History of Art

Pingchuan Ma, Ming Gui, Johannes Schusterbauer, Xiaopei Yang, Olga Grebenkova, Vincent Tao Hu, and Björn Ommer

In ICCV, 2025

Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions

Stefan Andreas Baumann, Felix Krause, Michael Neumayr, Nick Stracke, Vincent Tao Hu, and Björn Ommer

In CVPR, 2025

Prompt Editing in T2I models

Bib PDF Code Website

@inproceedings{baumann2024attributecontrol,
  title = {{C}ontinuous, {S}ubject-{S}pecific {A}ttribute {C}ontrol in {T}2{I} {M}odels by {I}dentifying {S}emantic {D}irections},
  author = {Baumann, Stefan Andreas and Krause, Felix and Neumayr, Michael and Stracke, Nick and Hu, Vincent Tao and Ommer, Björn},
  year = {2025},
  note = {Prompt Editing in T2I models},
  booktitle = {CVPR},
  repostar = {CompVis/attribute-control}
}

MaskFlow: Discrete Flows for Flexible and Efficient Long Video Generation

Michael Fuest, Vincent Tao Hu, and Björn Ommer

In Arxiv, 2025

Bib PDF Code Website

@inproceedings{maskflow,
  title = {MaskFlow: Discrete Flows for Flexible and Efficient Long Video Generation},
  author = {Fuest, Michael and Hu, Vincent Tao and Ommer, Björn},
  year = {2025},
  booktitle = {Arxiv},
  preview1111 = {./mask.jpg},
  repostar = {CompVis/maskflow}
}

[MASK] is All You Need

Vincent Tao Hu and Björn Ommer

In Arxiv, 2024

Bib PDF Code Website

ToddlerDiffusion: Flash Interpretable Controllable Diffusion Model

Eslam Mohamed BAKR, Liangbing Zhao, Vincent Tao Hu, Matthieu Cord, Patrick Perez, and Mohamed Elhoseiny

In ICLR, 2025

Bib PDF Website

@inproceedings{ToddlerDiffusion,
  title = {ToddlerDiffusion: Flash Interpretable Controllable Diffusion Model},
  author = {BAKR, Eslam Mohamed and Zhao, Liangbing and Hu, Vincent Tao and Cord, Matthieu and Perez, Patrick and Elhoseiny, Mohamed},
  year = {2025},
  booktitle = {ICLR},
  repostar = {toddlerdiffusion/code}
}

DepthFM: Fast Monocular Depth Estimation with Flow Matching

Ming Gui, Johannes S. Fischer, Ulrich Prestel, Pingchuan Ma, Dmytro Kotovenko, Olga Grebenkova, Stefan A. Baumann, Vincent Tao Hu, and Björn Ommer

In AAAI, 2025

An exploration of flow matching for blazing fast and zero-shot depth estimation

Bib PDF Code Website Oral

@inproceedings{depthfm,
  title = {DepthFM: Fast Monocular Depth Estimation with Flow Matching},
  author = {Gui, Ming and Fischer, Johannes S. and Prestel, Ulrich and Ma, Pingchuan and Kotovenko, Dmytro and Grebenkova, Olga and Baumann, Stefan A. and Hu, Vincent Tao and Ommer, Björn},
  year = {2025},
  note = {An exploration of flow matching for blazing fast and zero-shot depth estimation},
  booktitle = {AAAI},
  highlight = {Oral},
  repostar = {CompVis/depth-fm}
}

Does VLM Classification Benefit from LLM Description Semantics?

Pingchuan Ma, Lennart Rietdorf, Dmytro Kotovenko, Vincent Tao Hu, and Björn Ommer

In AAAI, 2025

PDF Code Website Oral at Workshop

Distillation of Diffusion Features for Semantic Correspondence

Frank Fundel, Johannes Schusterbauer, Vincent Tao Hu, and Björn Ommer

In WACV, 2025

Bib PDF Code Website

@inproceedings{fundel2025distilldift,
  title = {Distillation of Diffusion Features for Semantic Correspondence},
  author = {Fundel, Frank and Schusterbauer, Johannes and Hu, Vincent Tao and Ommer, Björn},
  year = {2025},
  note = {},
  booktitle = {WACV},
  repostar = {CompVis/DistillDIFT}
}

Scaling Image Tokenizers with Grouped Spherical Quantization

Jiangtao Wang, Zhen Qin, Yifan Zhang, Vincent Tao Hu, Björn Ommer, Rania Briq, and Stefan Kesselheim

2024

Bib PDF Code

@article{wang2024scaling,
  title = {Scaling Image Tokenizers with Grouped Spherical Quantization},
  author = {Wang, Jiangtao and Qin, Zhen and Zhang, Yifan and Hu, Vincent Tao and Ommer, Bj{\"o}rn and Briq, Rania and Kesselheim, Stefan},
  journal = {Arxiv},
  year = {2024},
  note = {},
  repostar = {HelmholtzAI-FZJ/flex_gen}
}

Diffusion Models and Representation Learning: A Survey

Michael Fuest, Pingchuan Ma, Ming Gui, Johannes Fischer, Vincent Tao Hu, and Björn Ommer

In Arxiv, 2024

The interplay between diffusion models and representation learning

Bib PDF

@inproceedings{diffusion_rl_survey,
  title = {Diffusion Models and Representation Learning: A Survey},
  author = {Fuest, Michael and Ma, Pingchuan and Gui, Ming and Fischer, Johannes and Hu, Vincent Tao and Ommer, Björn},
  year = {2024},
  note = {The interplay between diffusion models and representation learning},
  booktitle = {Arxiv},
  repostar = {dongzhuoyao/Diffusion-Representation-Learning-Survey-Taxonomy}
}

ZigMa: A DiT-style Zigzag Mamba Diffusion Model

Vincent Tao Hu, Stefan Andreas Baumann, Ming Gui, Olga Grebenkova, Pingchuan Ma, Johannes Fischer, and Björn Ommer

In ECCV, 2024

A DiT-style Mamba-based diffusion model

Bib PDF Code Website

@inproceedings{hu2024zigmaa,
  title = {ZigMa: A DiT-style Zigzag Mamba Diffusion Model},
  author = {Hu, Vincent Tao and Baumann, Stefan Andreas and Gui, Ming and Grebenkova, Olga and Ma, Pingchuan and Fischer, Johannes and Ommer, Björn},
  year = {2024},
  note = {A DiT-style Mamba-based diffusion model},
  booktitle = {ECCV},
  repostar = {CompVis/zigma}
}

Boosting Latent Diffusion with Flow Matching

Johannes S. Fischer, Ming Gui, Pingchuan Ma, Nick Stracke, Stefan A. Baumann, Vincent Tao Hu, and Björn Ommer

In ECCV, 2024

flow matching for super-resolution

Bib PDF Code Website Oral

@inproceedings{fischer2023boosting,
  title = {Boosting Latent Diffusion with Flow Matching},
  author = {Fischer, Johannes S. and Gui, Ming and Ma, Pingchuan and Stracke, Nick and Baumann, Stefan A. and Hu, Vincent Tao and Ommer, Björn},
  year = {2024},
  note = {flow matching for super-resolution},
  booktitle = {ECCV},
  highlight = {Oral},
  repostar = {CompVis/fm-boosting}
}

Guided Flow Vision Transformer from Self-Supervised Diffusion Features

Vincent Tao Hu, Yunlu Chen, Mathilde Caron, Yuki M. Asano, Cees G.M. Snoek, and Björn Ommer

In Arxiv, 2024

Bib PDF Code Website

@inproceedings{sgfm,
  title = {Guided Flow Vision Transformer from Self-Supervised Diffusion Features},
  author = {Hu, Vincent Tao and Chen, Yunlu and Caron, Mathilde and Asano, Yuki M. and Snoek, Cees G.M. and Ommer, Björn},
  year = {2024},
  note = {},
  booktitle = {Arxiv},
  repostar = {dongzhuoyao/sgfm}
}

Motion Flow Matching for Human Motion Synthesis and Editing

Vincent Tao Hu, Wenzhe Yin, Pingchuan Ma, Yunlu Chen, Basura Fernando, Yuki M. Asano, Efstratios Gavves, Pascal Mettes, Björn Ommer, and Cees G.M. Snoek

In Arxiv, 2024

Bib PDF Code Website

@inproceedings{motionfm,
  title = {Motion Flow Matching for Human Motion Synthesis and Editing},
  author = {Hu, Vincent Tao and Yin, Wenzhe and Ma, Pingchuan and Chen, Yunlu and Fernando, Basura and Asano, Yuki M. and Gavves, Efstratios and Mettes, Pascal and Ommer, Björn and Snoek, Cees G.M.},
  year = {2024},
  note1 = {},
  booktitle = {Arxiv},
  repostar = {dongzhuoyao/motionfm}
}

Training Class-Imbalanced Diffusion Model Via Overlap Optimization

Divin Yan, Lu Qi, Vincent Tao Hu, Ming-Hsuan Yang, and Meng Tang

In Arxiv, 2024

Bib PDF

@inproceedings{constrastivediffusion,
  title = {Training Class-Imbalanced Diffusion Model Via Overlap Optimization},
  author = {Yan, Divin and Qi, Lu and Hu, Vincent Tao and Yang, Ming-Hsuan and Tang, Meng},
  year = {2024},
  note = {},
  booktitle = {arxiv},
}

Latent Space Editing in Transformer-based Flow Matching

Vincent Tao Hu, David W Zhang, Pascal Mettes, Meng Tang, Deli Zhao, and Cees G.M. Snoek

In AAAI 2024. Also appeared in the ICML 2023 Workshop on New Frontiers in Learning, Control, and Dynamical Systems, 2024

Bib PDF Code Poster Website

@inproceedings{hulfm,
  title = {Latent Space Editing in Transformer-based Flow Matching},
  author = {Hu, Vincent Tao and Zhang, David W and Mettes, Pascal and Tang, Meng and Zhao, Deli and Snoek, Cees G.M.},
  year = {2024},
  note = {},
  booktitle = {AAAI 2024. Also appeared in the ICML 2023 Workshop on New Frontiers in Learning, Control, and Dynamical Systems},
  repostar = {dongzhuoyao/uspace}
}

Generative Data Augmentation Improves Scribble-supervised Semantic Segmentation

Jacob Schnell, Jieke Wang, Lu Qi, Vincent Tao Hu, and Meng Tang

In SyntaGen CVPR workshop, 2024

Explores diffusion models for data augmentation in segmentation tasks.

Bib PDF

@inproceedings{hu_fsinr,
  title = {Generative Data Augmentation Improves Scribble-supervised Semantic Segmentation},
  author = {Schnell, Jacob and Wang, Jieke and Qi, Lu and Hu, Vincent Tao and Tang, Meng},
  year = {2024},
  note = {Explores diffusion models for data augmentation in segmentation tasks.},
  booktitle = {SyntaGen CVPR workshop},
  repostar = {mengtang-lab/scribblegen}
}

Flow Matching for Conditional Text Generation in a Few Sampling Steps

Vincent Tao Hu, Di Wu, Yuki M. Asano, Pascal Mettes, Basura Fernando, Björn Ommer, and Cees G.M. Snoek

In EACL, 2024

Flow Matching for text generation

Bib PDF Code Website

@inproceedings{huflowseq,
  title = {Flow Matching for Conditional Text Generation in a Few Sampling Steps},
  author = {Hu, Vincent Tao and Wu, Di and Asano, Yuki M. and Mettes, Pascal and Fernando, Basura and Ommer, Björn and Snoek, Cees G.M.},
  year = {2024},
  booktitle = {EACL},
  note = {Flow Matching for text generation},
  selectedddd = {true},
  repostar = {dongzhuoyao/flowseq}
}

Self-Guided Diffusion Models

Tao Hu*, David W Zhang*, Yuki M. Asano, Gertjan J. Burghouts, and Cees G.M. Snoek

In CVPR, 2023

A bridge between the community of self-supervised learning and diffusion models. Short version to appear in NeurIPS 2022 Workshop on Score-Based Methods and NeurIPS 2022 Workshop Self-Supervised Learning Theory and Practice.

Bib PDF Code Website

@inproceedings{hu2022selfguided,
  title = {Self-Guided Diffusion Models},
  author = {Hu*, Tao and Zhang*, David W and Asano, Yuki M. and Burghouts, Gertjan J. and Snoek, Cees G.M.},
  year = {2023},
  booktitle = {CVPR},
  note = {A bridge between the community of self-supervised learning and diffusion models. Short version to appear in NeurIPS 2022 Workshop on Score-Based Methods and NeurIPS 2022 Workshop Self-Supervised Learning Theory and Practice.},
  repostar = {dongzhuoyao/self-guided-diffusion-models}
}