Tao Hu

Ommer Lab

Computer Vision & Learning Group

Akademiestr. 7, Munich

Ludwig Maximilian University of Munich

I am a Postdoctoral Research Fellow working with Björn Ommer in the Ommer Lab (Stable Diffusion Lab), focused on exploring the scalability and generalization ability of diffusion models in the nxtaim project. I finished my PhD at VISLab, University of Amsterdam.

I am recruiting for Bachelor, Master and PhD supervision in Munich and globally. I am also open to discussion and collaboration; if you're interested, feel free to send an email.

I focus on introducing inductive biases into neural networks to achieve data efficiency through few-shot learning, generative models, etc. I am convinced that generative modelling will be the future of discriminative modelling.

Publication | GitHub | LinkedIn | Research Note | Chat with me

news

Jan 23, 2025 ToddlerDiffusion: Interactive Structured Image Generation with Cascaded Schrödinger Bridge accepted by ICLR 2025.
Dec 20, 2024 Co-organizing the 2nd Workshop on Efficient Large Vision Models at CVPR 2025.
Dec 10, 2024 [MASK] is All You Need on arXiv. Two papers, including DepthFM, accepted by AAAI 2025.
Dec 08, 2024 Distillation of Diffusion Features for Semantic Correspondence accepted by WACV 2025. Scaling Image Tokenizers with Grouped Spherical Quantization on arXiv.
Dec 06, 2024 NeurIPS 2024 Excellent Reviewer.

selected publications

  1. MaskFlow: Discrete Flows for Flexible and Efficient Long Video Generation
    Michael Fuest, Vincent Tao Hu, and Björn Ommer
    In arXiv, 2025
  2. [MASK] is All You Need
    Vincent Tao Hu and Björn Ommer
    In arXiv, 2024
  3. ToddlerDiffusion: Flash Interpretable Controllable Diffusion Model
    Eslam Mohamed BAKR, Liangbing Zhao, Vincent Tao Hu, Matthieu Cord, Patrick Perez, and Mohamed Elhoseiny
    In ICLR, 2025
  4. DepthFM: Fast Monocular Depth Estimation with Flow Matching
    Ming Gui, Johannes S. Fischer, Ulrich Prestel, Pingchuan Ma, Dmytro Kotovenko, Olga Grebenkova, Stefan A. Baumann, Vincent Tao Hu, and Björn Ommer
    In AAAI, 2025
    An exploration of flow matching for blazing-fast, zero-shot depth estimation.
  5. Does VLM Classification Benefit from LLM Description Semantics?
    Pingchuan Ma, Lennart Rietdorf, Dmytro Kotovenko, Vincent Tao Hu, and Björn Ommer
    In AAAI, 2025
  6. TREAD: Token Routing for Efficient Architecture-agnostic Diffusion Training
    Felix Krause, Timy Phan, Vincent Tao Hu, and Björn Ommer
    In arXiv, 2025
  7. Distillation of Diffusion Features for Semantic Correspondence
    Frank Fundel, Johannes Schusterbauer, Vincent Tao Hu, and Björn Ommer
    In WACV, 2025
  8. Scaling Image Tokenizers with Grouped Spherical Quantization
    Jiangtao Wang, Zhen Qin, Yifan Zhang, Vincent Tao Hu, Björn Ommer, Rania Briq, and Stefan Kesselheim
    In arXiv, 2024
  9. Diffusion Models and Representation Learning: A Survey
    Michael Fuest, Pingchuan Ma, Ming Gui, Johannes Fischer, Vincent Tao Hu, and Björn Ommer
    In arXiv, 2024
    The interplay between diffusion models and representation learning
  10. ZigMa: A DiT-style Zigzag Mamba Diffusion Model
    Vincent Tao Hu, Stefan Andreas Baumann, Ming Gui, Olga Grebenkova, Pingchuan Ma, Johannes Fischer, and Björn Ommer
    In ECCV, 2024
    A DiT-style, Mamba-based diffusion model.
  11. Boosting Latent Diffusion with Flow Matching
    Johannes S. Fischer, Ming Gui, Pingchuan Ma, Nick Stracke, Stefan A. Baumann, Vincent Tao Hu, and Björn Ommer
    In ECCV, 2024
    Flow matching for super-resolution.
  12. Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions
    Stefan Andreas Baumann, Felix Krause, Michael Neumayr, Nick Stracke, Vincent Tao Hu, and Björn Ommer
    In arXiv, 2024
    Prompt Editing in T2I models
  13. Guided Flow Vision Transformer from Self-Supervised Diffusion Features
    Vincent Tao Hu, Yunlu Chen, Mathilde Caron, Yuki M. Asano, Cees G.M. Snoek, and Björn Ommer
    In arXiv, 2024
  14. Motion Flow Matching for Human Motion Synthesis and Editing
    Vincent Tao Hu, Wenzhe Yin, Pingchuan Ma, Yunlu Chen, Basura Fernando, Yuki M. Asano, Efstratios Gavves, Pascal Mettes, Björn Ommer, and Cees G.M. Snoek
    In arXiv, 2024
  15. Training Class-Imbalanced Diffusion Model Via Overlap Optimization
    Divin Yan, Lu Qi, Vincent Tao Hu, Ming-Hsuan Yang, and Meng Tang
    In arXiv, 2024
  16. Latent Space Editing in Transformer-based Flow Matching
    Vincent Tao Hu, David W Zhang, Pascal Mettes, Meng Tang, Deli Zhao, and Cees G.M. Snoek
    In AAAI, 2024. Also appeared in the ICML 2023 Workshop on New Frontiers in Learning, Control, and Dynamical Systems
  17. Generative Data Augmentation Improves Scribble-supervised Semantic Segmentation
    Jacob Schnell, Jieke Wang, Lu Qi, Vincent Tao Hu, and Meng Tang
    In SyntaGen CVPR workshop, 2024
    Explores diffusion models for data augmentation in segmentation tasks.
  18. Flow Matching for Conditional Text Generation in a Few Sampling Steps
    Vincent Tao Hu, Di Wu, Yuki M. Asano, Pascal Mettes, Basura Fernando, Björn Ommer, and Cees G.M. Snoek
    In EACL, 2024
    Flow Matching for text generation
  19. Self-Guided Diffusion Models
    Tao Hu*, David W Zhang*, Yuki M. Asano, Gertjan J. Burghouts, and Cees G.M. Snoek
    In CVPR, 2023
    A bridge between the self-supervised learning and diffusion model communities. Short version to appear in the NeurIPS 2022 Workshop on Score-Based Methods and the NeurIPS 2022 Workshop on Self-Supervised Learning: Theory and Practice.