We present ZigMa, a zigzag scanning scheme designed for both spatial continuity and parameter efficiency. We further adapt the scheme to video by separating the reasoning over the spatial and temporal dimensions, which keeps parameter utilization efficient. The design incorporates a stronger inductive bias for non-1D data and improves parameter efficiency in diffusion models.
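
To make the idea of a spatially continuous scan concrete, here is a minimal sketch of one zigzag (boustrophedon) traversal of a 2D token grid. It is an illustration only, not the ZigMa implementation: the helper names `zigzag_order` and `is_spatially_continuous` are hypothetical, and the actual scheme may use other or multiple scan paths.

```python
def zigzag_order(height, width):
    """Return row-major token indices of a height x width grid in zigzag order.

    Even rows are visited left to right and odd rows right to left, so each
    token is followed by one of its direct grid neighbours.
    (Hypothetical helper for illustration.)
    """
    order = []
    for row in range(height):
        cols = range(width) if row % 2 == 0 else range(width - 1, -1, -1)
        order.extend(row * width + col for col in cols)
    return order


def is_spatially_continuous(order, width):
    """Check that every pair of consecutive tokens is adjacent on the grid."""
    for a, b in zip(order, order[1:]):
        row_a, col_a = divmod(a, width)
        row_b, col_b = divmod(b, width)
        if abs(row_a - row_b) + abs(col_a - col_b) != 1:
            return False
    return True


order = zigzag_order(3, 4)
print(order)  # [0, 1, 2, 3, 7, 6, 5, 4, 8, 9, 10, 11]
print(is_spatially_continuous(order, 4))            # True
print(is_spatially_continuous(list(range(12)), 4))  # False: plain row-major scan jumps at row ends
```

A plain row-major flattening jumps from the end of one row to the start of the next, whereas the zigzag order keeps consecutive tokens adjacent, which is the spatial-continuity property described above.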

Want to learn more about ZigMa?

Check out our paper and code!

BibTeX


@InProceedings{hu2024zigma,
  title={ZigMa: A DiT-style Zigzag Mamba Diffusion Model},
  author={Vincent Tao Hu and Stefan Andreas Baumann and Ming Gui and Olga Grebenkova and Pingchuan Ma and Johannes Fischer and Björn Ommer},
  booktitle={arXiv},
  year={2024}
}