ZigMa: A DiT-style Zigzag Mamba Diffusion Model

CompVis @ LMU Munich, MCML
ECCV 2024
Oral talk at the ICML 2024 Workshop on Long Context Foundation Models (LCFM)

We present ZigMa, a zigzag scanning scheme for Mamba-based diffusion models that accounts for both spatial continuity and parameter efficiency. We further adapt the scheme to video by separating spatial reasoning from temporal reasoning, which keeps parameter use efficient. The design incorporates more inductive bias for non-1D data and improves parameter efficiency in diffusion models.
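To illustrate what "spatial continuity" means for a scan order, here is a minimal sketch of a row-wise zigzag (boustrophedon) ordering over a grid of image patches, compared with a plain raster scan. It is an illustration under stated assumptions, not the ZigMa implementation: the function names `zigzag_order` and `max_step`, the 3x4 grid, and the choice of a row-wise pattern are all hypothetical, and the scan paths used in the paper may differ.

```python
import numpy as np


def zigzag_order(height: int, width: int) -> np.ndarray:
    """Flat patch indices visited in a row-wise zigzag order (assumed pattern).

    Even rows are read left to right, odd rows right to left, so the scan
    never jumps from the end of one row to the start of the next.
    """
    grid = np.arange(height * width).reshape(height, width)
    grid[1::2] = grid[1::2, ::-1].copy()  # reverse every odd row
    return grid.reshape(-1)


def max_step(order: np.ndarray, width: int) -> int:
    """Largest Manhattan distance between consecutive patches in a scan."""
    rows, cols = np.divmod(order, width)
    return int(np.max(np.abs(np.diff(rows)) + np.abs(np.diff(cols))))


height, width = 3, 4
order = zigzag_order(height, width)   # permutation applied before the 1D sequence model
inverse = np.argsort(order)           # permutation that restores the original layout

raster = np.arange(height * width)    # plain row-major scan for comparison
print("zigzag order:", order.tolist())
print("max step, zigzag:", max_step(order, width))
print("max step, raster:", max_step(raster, width))
```

In this sketch every pair of consecutive tokens in the zigzag order is a pair of adjacent patches, while the raster scan jumps across the full row width at each line break. Because the ordering is only a permutation of token indices, it adds no learnable parameters, and `inverse` maps the processed sequence back to the original patch layout.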

Want to learn more about ZigMa?

Check out our paper and code!

Acknowledgements

We would like to thank Timy Phan and Yunlu Chen for their extensive proofreading. This project has been supported by the German Federal Ministry for Economic Affairs and Climate Action within the project “NXT GEN AI METHODS – Generative Methoden für Perzeption, Prädiktion und Planung”, the bidt project KLIMA-MEMES, Bayer AG, and the German Research Foundation (DFG) project 421703927. The authors gratefully acknowledge the Gauss Center for Supercomputing for providing compute through the NIC on JUWELS at JSC, and the HPC resources supplied by the Erlangen National High Performance Computing Center (NHR@FAU, funded by DFG).

BibTeX


        @InProceedings{hu2024zigma,
            title     = {ZigMa: A DiT-style Zigzag Mamba Diffusion Model},
            author    = {Vincent Tao Hu and Stefan Andreas Baumann and Ming Gui and Olga Grebenkova and Pingchuan Ma and Johannes Fischer and Björn Ommer},
            booktitle = {ECCV},
            year      = {2024}
        }