PPT: Optimal Transport for Machine Learning

An important paper: Marco Cuturi, Sinkhorn Distances: Lightspeed Computation of Optimal Transport, NeurIPS 2013

OTK: An Optimal Transport Kernel for Feature Aggregation and its Relationship to Attention, arXiv (June 2020)

code

Wasserstein metric

EMD (Earth Mover's Distance)

Picture from this paper.

To set the problem up you need to know the per-unit transport cost c, the supply amounts s, and the demand amounts d in advance.
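For reference, the underlying transportation linear program in that c/s/d notation (standard formulation; T_ij denotes the amount moved from supplier i to consumer j):

```latex
\min_{T \ge 0}\; \sum_{i=1}^{m}\sum_{j=1}^{n} c_{ij}\, T_{ij}
\quad \text{subject to} \quad
\sum_{j=1}^{n} T_{ij} = s_i \;\; (\forall i),
\qquad
\sum_{i=1}^{m} T_{ij} = d_j \;\; (\forall j)
```

with total supply matching total demand, $\sum_i s_i = \sum_j d_j$; the EMD is this optimal cost, usually normalized by the total moved mass.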

EMD for point clouds starts from this paper.

What are c, s, and d for a point cloud? In point-cloud EMD every point typically carries the same unit of mass, so c is just the pairwise distance between points and the supply/demand vectors are uniform; explicit supply and demand values are not a must in this optimal transport formulation. Check the PowerPoint here.

It is quite similar to the Chamfer distance. The Chamfer distance matches every point to its nearest neighbour, so several points may share the same match (not a bijection), whereas EMD with equal-sized clouds and uniform weights induces a bijection.
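A small numerical sketch of that difference (my own illustration, not from any of the papers above; it assumes two equal-sized clouds with uniform weights, in which case the optimal transport plan is a one-to-one assignment that the Hungarian algorithm can find):

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.optimize import linear_sum_assignment

def chamfer_distance(P, Q):
    """Average nearest-neighbour distance in both directions.
    Several points of P may map to the same point of Q (not a bijection)."""
    D = cdist(P, Q)                        # pairwise Euclidean distances
    return D.min(axis=1).mean() + D.min(axis=0).mean()

def emd_equal_size(P, Q):
    """EMD for equal-sized clouds with uniform mass: the optimal plan is a
    one-to-one assignment, found here with the Hungarian algorithm."""
    D = cdist(P, Q)
    rows, cols = linear_sum_assignment(D)  # bijection minimizing total cost
    return D[rows, cols].mean()

rng = np.random.default_rng(0)
P = rng.normal(size=(128, 3))
Q = rng.normal(size=(128, 3))
print("Chamfer:", chamfer_distance(P, Q))
print("EMD    :", emd_equal_size(P, Q))
```

For unequal sizes or non-uniform weights the bijection picture breaks down and a full transport plan is needed (e.g. via Sinkhorn, sketched further below).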

An implementation of an auction-algorithm-based O(n) approximation of OT can be found in this paper.

Applications

Point cloud metrics: see Haoqiang Fan et al., CVPR 2017.

Deep metric learning: Learning with Batch-wise Optimal Transport Loss for 3D Shape Recognition, CVPR 2019

From the abstract: the paper shows how to learn an importance-driven distance metric via optimal transport programming from batches of samples. The loss can automatically emphasize hard examples and leads to significant improvements in convergence. The authors propose a new batch-wise optimal transport loss and combine it with deep metric learning in an end-to-end manner, learning the distance metric and the deep feature representation jointly for recognition. Empirical results on visual retrieval and classification tasks with six benchmark datasets (MNIST, CIFAR10, SHREC13, SHREC14, ModelNet10, and ModelNet40) demonstrate the superiority of the proposed method: it accelerates convergence significantly while achieving state-of-the-art recognition performance. For example, in 3D shape recognition experiments the method reaches, within only 5 epochs, better recognition performance than mainstream 3D shape recognition approaches obtain after 200 epochs.

It can be viewed as an n-pairs extension of the contrastive loss or triplet loss.
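A toy sketch of one way a batch-wise, OT-weighted pair loss could look (my own simplification for illustration, not the paper's exact formulation; every name and constant here is made up): compute the pairwise distance matrix between two halves of a batch, turn it into an entropic transport plan with a few Sinkhorn iterations, and use the plan entries as weights on contrastive-style pair terms so that all n x n pairs contribute at once.

```python
import torch

def sinkhorn_plan(C, eps=0.1, n_iters=50):
    """Entropic OT plan between uniform marginals for a cost matrix C (n x m)."""
    C = C / (C.max() + 1e-8)                  # scale costs for numerical stability
    n, m = C.shape
    a = C.new_full((n,), 1.0 / n)             # uniform source marginal
    b = C.new_full((m,), 1.0 / m)             # uniform target marginal
    K = torch.exp(-C / eps)                   # Gibbs kernel
    u = torch.ones_like(a)
    for _ in range(n_iters):
        u = a / (K @ (b / (K.t() @ u)))       # alternate row/column scaling
    v = b / (K.t() @ u)
    return u[:, None] * K * v[None, :]

def batchwise_ot_pair_loss(emb_a, emb_b, same_class, margin=1.0):
    """Toy loss: every pair in the batch contributes, weighted by the OT plan.

    emb_a, emb_b : (n, d) embeddings of the two halves of a batch
    same_class   : (n, n) bool, True where the pair shares a label
    """
    D = torch.cdist(emb_a, emb_b)             # (n, n) pairwise distances
    T = sinkhorn_plan(D.detach())             # plan used as fixed pair weights
    pull = D                                  # attract same-class pairs
    push = torch.clamp(margin - D, min=0.0)   # hinge on different-class pairs
    return (T * torch.where(same_class, pull, push)).sum()

# Toy usage on random, L2-normalized embeddings.
emb = torch.randn(32, 128)
emb = emb / emb.norm(dim=1, keepdim=True)
labels = torch.randint(0, 4, (32,))
loss = batchwise_ot_pair_loss(emb[:16], emb[16:],
                              labels[:16, None] == labels[None, 16:])
print(loss.item())
```

This only shows the mechanics of coupling all batch pairs through a transport plan; the actual loss, cost design, and how gradients flow through the plan are specified in the paper.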

Few-shot learning: DeepEMD: Differentiable Earth Mover's Distance for Few-Shot Learning, CVPR 2020

Feature point matching in 3D reconstruction: SuperGlue: Learning Feature Matching with Graph Neural Networks, CVPR 2020 (oral)

  • Uses the Sinkhorn algorithm to solve an entropy-regularized approximation of optimal transport (see the sketch after this list).
  • The cost matrix is produced by an attention-based graph neural network; those costs are then fed into the optimal transport layer. Check Figure 3.
  • Also works directly on pure SIFT features; check Table 2.
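A minimal NumPy sketch of the Sinkhorn iteration for entropy-regularized optimal transport, in the spirit of the Cuturi paper above (a generic version; SuperGlue's actual layer works in log space and adds a learned "dustbin" row/column for unmatched keypoints):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iters=200):
    """Entropy-regularized OT (Cuturi-style Sinkhorn iteration).

    a : (n,) source weights, sums to 1
    b : (m,) target weights, sums to 1
    C : (n, m) cost matrix (scaled to [0, 1] here for numerical stability)
    Returns a transport plan T whose rows sum to a and columns sum to b.
    """
    K = np.exp(-C / eps)                     # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        u = a / (K @ (b / (K.T @ u)))        # alternate row/column scaling
    v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

# Toy example: soft matching between two small sets of 2-D "keypoints".
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(5, 2)), rng.normal(size=(6, 2))
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
C /= C.max()                                 # scale costs to [0, 1]
a, b = np.full(5, 1 / 5), np.full(6, 1 / 6)
T = sinkhorn(a, b, C)
print(T.round(3))
print(T.sum(axis=1), T.sum(axis=0))          # ~1/5 per row, ~1/6 per column
```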

Unsupervised learning: Self-labelling via simultaneous clustering and representation learning, ICLR 2020. It casts pseudo-label assignment as an optimal transport problem with an equal-cluster-size constraint and solves it with the Sinkhorn-Knopp algorithm.
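A rough sketch of that balanced assignment step (variable names and constants are mine, and the plain Sinkhorn-Knopp scaling below stands in for the paper's faster variant): given softmax predictions P over K clusters for N samples, find soft assignments Q whose rows carry mass 1/N per sample and whose columns receive mass 1/K per cluster, so every cluster ends up equally sized.

```python
import numpy as np

def balanced_pseudo_labels(P, lam=10.0, n_iters=100):
    """Sinkhorn-Knopp scaling of sharpened predictions P (N x K, rows sum to 1)
    onto assignments with uniform row marginals (1/N) and uniform column
    marginals (1/K), i.e. every cluster receives the same total mass."""
    N, K = P.shape
    Q = P ** lam                                   # sharpen; lam acts like 1/eps
    r = np.full(N, 1.0 / N)                        # mass carried by each sample
    c = np.full(K, 1.0 / K)                        # mass each cluster must receive
    u = np.ones(N)
    for _ in range(n_iters):
        u = r / (Q @ (c / (Q.T @ u)))
    v = c / (Q.T @ u)
    Q = u[:, None] * Q * v[None, :]
    return Q / Q.sum(axis=1, keepdims=True)        # per-sample label distributions

# Toy usage: pseudo-labels come out roughly equally spread across clusters.
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 10))
P = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = balanced_pseudo_labels(P).argmax(axis=1)
print(np.bincount(labels))                         # roughly 100 samples per cluster
```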