publications

publications by category in reverse chronological order. generated by jekyll-scholar.

2026

  1. Guiding Token-Sparse Diffusion Models
    Felix Krause, Stefan Andreas Baumann, Johannes Schusterbauer, and 4 more authors
    2026
  2. Purrception: Variational Flow Matching for Vector-Quantized Image Generation
    Răzvan-Andrei Matișan, Vincent Tao Hu, Grigory Bartosh, and 6 more authors
    In ICLR, 2026
  3. Diffusion Models and Representation Learning: A Survey
    Michael Fuest, Pingchuan Ma, Ming Gui, and 3 more authors
    In T-PAMI, 2026
    The interplay between diffusion models and representation learning

2025

  1. TREAD: Token Routing for Efficient Architecture-agnostic Diffusion Training
    Felix Krause, Timy Phan, Ming Gui, and 3 more authors
    In ICCV, 2025
  2. Stochastic Interpolants for Revealing Stylistic Flows across the History of Art
    Pingchuan Ma, Ming Gui, Johannes Schusterbauer, and 4 more authors
    In ICCV, 2025
  3. Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions
    Stefan Andreas Baumann, Felix Krause, Michael Neumayr, and 3 more authors
    In CVPR, 2025
    Prompt Editing in T2I models
  4. MaskFlow: Discrete Flows for Flexible and Efficient Long Video Generation
    Michael Fuest, Vincent Tao Hu, and Björn Ommer
    In arXiv, 2025
  5. ToddlerDiffusion: Flash Interpretable Controllable Diffusion Model
    Eslam Mohamed BAKR, Liangbing Zhao, Vincent Tao Hu, and 3 more authors
    In ICLR, 2025
  6. DepthFM: Fast Monocular Depth Estimation with Flow Matching
    Ming Gui, Johannes S. Fischer, Ulrich Prestel, and 6 more authors
    In AAAI, 2025
    An exploration of flow matching for blazing fast and zero-shot depth estimation
  7. Does VLM Classification Benefit from LLM Description Semantics?
    Pingchuan Ma, Lennart Rietdorf, Dmytro Kotovenko, and 2 more authors
    In AAAI, 2025
  8. Distillation of Diffusion Features for Semantic Correspondence
    Frank Fundel, Johannes Schusterbauer, Vincent Tao Hu, and 1 more author
    In WACV, 2025

2024

  1. [MASK] is All You Need
    Vincent Tao Hu and Björn Ommer
    In arXiv, 2024
  2. Scaling Image Tokenizers with Grouped Spherical Quantization
    Jiangtao Wang, Zhen Qin, Yifan Zhang, and 4 more authors
    2024
  3. ZigMa: A DiT-style Zigzag Mamba Diffusion Model
    Vincent Tao Hu, Stefan Andreas Baumann, Ming Gui, and 4 more authors
    In ECCV, 2024
    A DiT-style, Mamba-based diffusion model
  4. Boosting Latent Diffusion with Flow Matching
    Johannes S. Fischer, Ming Gui, Pingchuan Ma, and 4 more authors
    In ECCV, 2024
    Flow matching for super-resolution
  5. Guided Flow Vision Transformer from Self-Supervised Diffusion Features
    Vincent Tao Hu, Yunlu Chen, Mathilde Caron, and 3 more authors
    In arXiv, 2024
  6. Motion Flow Matching for Human Motion Synthesis and Editing
    Vincent Tao Hu, Wenzhe Yin, Pingchuan Ma, and 7 more authors
    In arXiv, 2024
  7. Training Class-Imbalanced Diffusion Model Via Overlap Optimization
    Divin Yan, Lu Qi, Vincent Tao Hu, and 2 more authors
    In arXiv, 2024
  8. Latent Space Editing in Transformer-based Flow Matching
    Vincent Tao Hu, David W Zhang, Pascal Mettes, and 3 more authors
    In AAAI, 2024. Also appeared at the ICML 2023 Workshop on New Frontiers in Learning, Control, and Dynamical Systems.
  9. Flow Matching for Conditional Text Generation in a Few Sampling Steps
    Vincent Tao Hu, Di Wu, Yuki M. Asano, and 4 more authors
    In EACL, 2024
    Flow Matching for text generation

2023

  1. Self-Guided Diffusion Models
    Tao Hu*, David W Zhang*, Yuki M. Asano, and 2 more authors
    In CVPR, 2023
    A bridge between the self-supervised learning and diffusion model communities. A short version appeared at the NeurIPS 2022 Workshop on Score-Based Methods and the NeurIPS 2022 Workshop on Self-Supervised Learning: Theory and Practice.

2021

  1. Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting
    Martine Toering, Ioannis Gatopoulos, Maarten Stol, and 1 more author
    In WACV, 2021
    Improves video representations by contrasting prototypical features.

2020

  1. Localizing the Common Action Among a Few Videos
    Pengwan Yang*, Tao Hu*, Pascal Mettes, and 1 more author
    In European Conference on Computer Vision (ECCV), 2020
    Localizes the temporal extent of an action in a long untrimmed video using attention.
  2. PointMixup: Augmentation for Point Clouds
    Yunlu Chen*, Tao Hu*, Efstratios Gavves, and 4 more authors
    In European Conference on Computer Vision (ECCV), 2020
    A simple MixUp-based augmentation method that boosts performance on point cloud tasks.

2019

  1. Attention-based Multi-Context Guiding for Few-Shot Semantic Segmentation
    Tao Hu, Pengwan Yang, Chiliang Zhang, and 3 more authors
    In AAAI, 2019
    Solves the few-shot segmentation problem by applying attention at multiple scales.
  2. SILCO: Show a Few Images, Localize the Common Object
    Tao Hu, Pascal Mettes, Jia-Hong Huang, and 1 more author
    In International Conference on Computer Vision (ICCV), 2019
    Designs a graph network and applies attention over it to solve common object localization.