Paper Reading: In Defense of the Triplet Loss for Person Re-Identification

2017-07-02 14:04:20

This post is taken from: http://blog.csdn.net/shuzfan/article/details/70069822

Paper: https://arxiv.org/abs/1703.07737

Github: https://github.com/VisualComputingInstitute/triplet-reid

Introduction

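Re-ID is somewhat similar to image retrieval. Seen that way, the compact embeddings that Google's FaceNet learns with a triplet loss look well suited to this kind of large-scale, fast matching. However, much of the literature reports that a classification loss, or a classification loss combined with a verification loss, seems to work better for this task than the triplet loss. These approaches usually treat the CNN as a feature extractor and attach a separate metric learning model afterwards. But these two losses have clear drawbacks: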

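Classification loss: when the number of identities is large, the number of network parameters grows considerably, and many of those parameters are discarded once training is over.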

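Verification loss: it can only judge the similarity of two images at a time, so it is hard to apply to clustering and retrieval, because pair-by-pair comparison is too slow.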

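The triplet loss is still very attractive, though: it is end-to-end, simple and direct; it has a built-in clustering property; and it produces compact embeddings.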

So why does triplet training give poor results, or why is it hard to train?

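First, hard mining is a crucial step in triplet training. Without hard mining, training stalls and converges to a poor result; selecting triplets that are too hard makes training unstable and convergence difficult. Hard mining is also time-consuming, and there is no clear definition of what a "good hard" triplet is.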

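The paper makes two main contributions:

(1) It designs a new triplet loss and compares it with other variants.

(2) It experimentally compares training with and without a pre-trained model.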

Triplet Loss

This section introduces several variants of the triplet loss.

Large Margin Nearest Neighbor loss

An early form of the triplet loss (reference [1]). \(L_{pull}\) pulls samples of the same identity closer together; \(L_{push}\) pushes samples of different identities apart.

Because it targets nearest-neighbor classification, a single class may contain several clusters, and fixed cluster centers are hard to determine.

FaceNet Triplet Loss

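Google's face verification model FaceNet (reference [2]) does not require all samples of one identity to move toward a single point. It only requires that the distance between samples of the same identity be smaller, by a margin, than the distance to samples of other identities. This matches the idea of face verification very well.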
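The constraint can be written as a hinge on the two distances, \(\left[m + D_{a,p} - D_{a,n}\right]_+\), summed over triplets in which \(a\) and \(p\) share an identity and \(n\) does not. Below is a minimal Python sketch for a single triplet. The function names, the margin value 0.2 and the toy 2-D embeddings are assumptions made for illustration; they are not taken from the paper or its code.

```python
import math


def euclidean(u, v):
    """Non-squared Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: max(0, margin + D(a, p) - D(a, n))."""
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    return max(0.0, margin + d_ap - d_an)


# Hypothetical embeddings, chosen only so the distances are easy to read.
a, p = [0.0, 0.0], [0.3, 0.4]   # D(a, p) = 0.5
n_far = [3.0, 4.0]              # D(a, n_far) = 5.0
n_near = [0.0, 0.6]             # D(a, n_near) = 0.6

easy = triplet_loss(a, p, n_far)    # margin satisfied, so the loss is zero
hard = triplet_loss(a, p, n_near)   # margin violated, so the loss is positive
```

The "easy" triplet contributes nothing to the gradient, which is why the choice of triplets matters so much for this loss.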

Batch All Triplet Loss

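When training with the FaceNet triplet loss, the data is arranged in order as groups of three. With batch_size = 3B there are actually up to \(6B^2-4B\) valid triplet combinations, so using only B of them is wasteful.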
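The count \(6B^2-4B\) can be checked by brute force: each of the \(2B\) anchor and positive images can serve as an anchor, its partner is the only positive, and the remaining \(3B-2\) images are negatives. The sketch below enumerates this under one stated assumption: the only same-identity pairs in the batch are the B anchor-positive pairs (the helper `facenet_style_labels` is hypothetical and just builds such a label layout).

```python
from itertools import permutations


def facenet_style_labels(num_triplets):
    """Identity labels for a batch laid out as (anchor, positive, negative) groups.

    Assumption for this sketch: anchor and positive of group t share identity
    2t, and the negative has its own identity 2t + 1.
    """
    labels = []
    for t in range(num_triplets):
        labels += [2 * t, 2 * t, 2 * t + 1]
    return labels


def count_valid_triplets(labels):
    """Count ordered index triples (a, p, n) with label[a] == label[p] != label[n]."""
    return sum(
        1
        for a, p, n in permutations(range(len(labels)), 3)
        if labels[a] == labels[p] and labels[a] != labels[n]
    )


B = 4
n_valid = count_valid_triplets(facenet_style_labels(B))
n_formula = 6 * B ** 2 - 4 * B
```

For B = 4 both values come out to 80, compared with the 4 triplets the batch was originally built from.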

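So we can first change how a batch is organized: \(batch\ size = P\times K\), i.e. randomly pick P persons and, for each person, randomly pick K images. This gives \(PK(PK-K)(K-1)\) triplets in total, and the loss is computed over all of them.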
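In the paper's notation the Batch All loss is \(L_{BA} = \sum_{i=1}^{P}\sum_{a=1}^{K}\sum_{p\neq a}\sum_{j\neq i}\sum_{n=1}^{K}\left[m + D(f(x_a^i), f(x_p^i)) - D(f(x_a^i), f(x_n^j))\right]_+\). The following is a plain-Python sketch of that sum, not the authors' implementation: the function names, the margin 0.2 and the toy 1-D embeddings are assumptions for illustration.

```python
import math


def euclidean(u, v):
    """Non-squared Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def batch_all_losses(embeddings, labels, margin=0.2):
    """Hinge loss of every valid (anchor, positive, negative) triplet in the batch."""
    losses = []
    indices = range(len(labels))
    for a in indices:
        for p in indices:
            if p == a or labels[p] != labels[a]:
                continue  # positive must be another image of the same person
            d_ap = euclidean(embeddings[a], embeddings[p])
            for n in indices:
                if labels[n] == labels[a]:
                    continue  # negative must belong to a different person
                d_an = euclidean(embeddings[a], embeddings[n])
                losses.append(max(0.0, margin + d_ap - d_an))
    return losses


# Toy batch with P = 2 persons and K = 2 images each (1-D embeddings).
P, K = 2, 2
embeddings = [[0.0], [1.0], [1.5], [4.0]]
labels = [0, 0, 1, 1]

losses = batch_all_losses(embeddings, labels)
loss_ba = sum(losses)
```

Here `len(losses)` equals \(PK(PK-K)(K-1) = 8\), and only three of the eight triplets violate the margin. A real implementation would compute the pairwise distance matrix once on the GPU instead of looping.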

Batch Hard Triplet Loss

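The Batch All triplet loss appears to process a very large number of triplets at once, but there is an awkward catch: on very large datasets training becomes very time-consuming, and as training progresses many triplets become "useless" because they are already easily satisfied.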

What to do? Hard mining. But triplets that are too hard make training unstable, so what then? Mine moderately hard triplets.

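The authors define a "moderately hard" triplet loss, Batch Hard: for each anchor, only the hardest positive and the hardest negative within the batch are used. It is only "moderately" hard because the hardest samples are picked inside one small batch rather than over the whole dataset.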

In the paper's notation, \(x_j^i\) denotes the \(j\)-th image of the \(i\)-th person.
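With that notation, the Batch Hard loss from the paper reads \(L_{BH} = \sum_{i=1}^{P}\sum_{a=1}^{K}\left[m + \max_{p} D(f(x_a^i), f(x_p^i)) - \min_{j\neq i,\,n} D(f(x_a^i), f(x_n^j))\right]_+\). Here is a minimal plain-Python sketch of it, assuming every identity in the batch has at least one negative; the function names, the margin 0.2 and the toy data are illustrative assumptions, not the authors' code.

```python
import math


def euclidean(u, v):
    """Non-squared Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def batch_hard_loss(embeddings, labels, margin=0.2):
    """Sum over anchors of max(0, margin + hardest positive - hardest negative)."""
    indices = range(len(labels))
    total = 0.0
    for a in indices:
        # Hardest positive: the farthest image of the same person.
        # The anchor itself is included with distance 0, which never wins the max
        # unless the person has no other image.
        hardest_pos = max(
            euclidean(embeddings[a], embeddings[p])
            for p in indices if labels[p] == labels[a]
        )
        # Hardest negative: the closest image of any other person.
        hardest_neg = min(
            euclidean(embeddings[a], embeddings[n])
            for n in indices if labels[n] != labels[a]
        )
        total += max(0.0, margin + hardest_pos - hardest_neg)
    return total


# Toy batch with P = 2 persons and K = 2 images each (1-D embeddings).
embeddings = [[0.0], [1.0], [1.5], [4.0]]
labels = [0, 0, 1, 1]

loss_bh = batch_hard_loss(embeddings, labels)
```

Each anchor contributes exactly one triplet, so a batch of \(PK\) images yields \(PK\) terms instead of \(PK(PK-K)(K-1)\).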

Lifted Embedding Loss

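Reference [3] proposes a new loss for batches arranged as groups of three: every sample other than the anchor-positive pair is treated as a negative, and a smooth bound of the loss is optimized.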

The paper slightly adapts this loss to batches of the form \(batch\ size = P\times K\).
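For reference, the generalized lifted embedding loss as it appears in the paper can be written as follows. This is reconstructed from the paper rather than from this post, so the exact form is worth checking against the original; \(m\) is the margin, \(f\) the embedding network, and \(x_a^i\) the \(a\)-th image of the \(i\)-th person.

\[
L_{LG} = \sum_{i=1}^{P}\sum_{a=1}^{K}\left[\log\sum_{\substack{p=1\\ p\neq a}}^{K} e^{D(f(x_a^i),\,f(x_p^i))} + \log\sum_{\substack{j=1\\ j\neq i}}^{P}\sum_{n=1}^{K} e^{m - D(f(x_a^i),\,f(x_n^j))}\right]_+
\]

The two log-sum-exp terms act as smooth versions of the hardest-positive maximum and the hardest-negative minimum, so every positive and every negative of an anchor contributes instead of only the hardest one.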

Distance Measure

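Many related works use the squared Euclidean distance \(D(a,b) = \|a-b\|_2^2\) as the metric. Although the authors did not systematically compare other metrics, they found in their experiments that the non-squared Euclidean distance \(D(a,b) = \|a-b\|_2\) behaves more stably. Using the non-squared distance also makes the margin parameter easier to interpret.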

Soft-margin

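Many earlier triplet losses use a hinge, i.e. once the triplet constraint is satisfied the loss is simply 0. The authors found that for Re-ID it helps to keep pulling samples of the same identity closer together. They therefore use the following soft-margin function: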

\(s(x) = \ln(1+e^x)\)
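The soft-margin replaces the hinge \([m + x]_+\) with \(\ln(1+e^x)\), which decays toward zero for very negative \(x\) but never reaches it. A small sketch comparing the two is given below; the names `hinge` and `soft_margin` and the margin 0.2 are assumptions for illustration. The rewritten form `max(x, 0) + log1p(exp(-|x|))` is a standard way to avoid overflow in `exp` for large \(x\).

```python
import math


def hinge(x, margin=0.2):
    """Truncated loss: exactly zero once x <= -margin."""
    return max(0.0, margin + x)


def soft_margin(x):
    """Numerically stable softplus ln(1 + e^x).

    Uses the identity ln(1 + e^x) = max(x, 0) + ln(1 + e^(-|x|)),
    so exp() is only ever called on a non-positive argument.
    """
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# x stands for D(a, p) - D(a, n); negative values mean the triplet is ordered correctly.
x_satisfied = -5.0
hinge_value = hinge(x_satisfied)        # 0.0: no signal from this triplet
soft_value = soft_margin(x_satisfied)   # small but positive: still pulls a and p together
large_value = soft_margin(1000.0)       # no overflow, unlike a naive math.exp(1000.0)
```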

Experiments

Comparison of triplet loss variants

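(1) \(L_{tri}\) without hard mining usually gives a poor model. Adding simple offline hard mining (OHM) makes the results very unstable: sometimes very good, sometimes collapsing completely.

(2) The Batch Hard loss \(L_{BH}\) performs better overall than the Batch All loss \(L_{BA}\). The authors conjecture that late in training many triplets have zero loss, so averaging dilutes the little useful signal that remains. To test this conjecture they average only over the triplets with non-zero loss, denoted \(L_{BA\neq 0}\), and find that results do improve.

(3) In the authors' Re-ID experiments, Batch Hard + soft-margin works best, but there is no guarantee that this combination is still the best on other tasks; that needs more experimental validation.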
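The dilution effect behind \(L_{BA\neq 0}\) in point (2) is easy to see on made-up numbers. In the sketch below the per-triplet losses are invented for illustration, and the `eps` threshold for calling a triplet "active" is an assumption of this sketch, not a value from the paper.

```python
def mean_all(losses):
    """Average over every triplet, including those with zero loss."""
    return sum(losses) / len(losses)


def mean_nonzero(losses, eps=0.0):
    """Average only over active triplets (loss > eps); 0.0 if none are active."""
    active = [value for value in losses if value > eps]
    return sum(active) / len(active) if active else 0.0


# Hypothetical late-training batch: 8 satisfied triplets, 2 still violating the margin.
losses = [0.0] * 8 + [0.5, 1.5]

avg_all = mean_all(losses)         # 0.2: the two useful triplets are diluted
avg_active = mean_nonzero(losses)  # 1.0: same gradient direction, five times the scale
```

As more triplets become satisfied, `avg_all` keeps shrinking even if the remaining violations stay just as severe, while `avg_active` does not.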

To Pretrain or not to Pretrain?

TriNet denotes the network built on a pre-trained model; LuNet is a plain network designed by the authors themselves and trained from scratch.

Judging from the paper's results table, using a pre-trained model does give somewhat better results, but the network trained from scratch is not much worse.

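In particular, pre-trained models tend to be large and fixed in structure, so they are less flexible than a self-designed network. A pre-trained model also usually comes with its own fixed input size, and changing that input may well hurt rather than help, as the corresponding table in the paper shows.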

Tricks

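(1) There is no need to normalize the output features.

(2) When hard mining is used, the loss value alone often does not reflect how training is going. The authors recommend watching the number of active triplets in a batch, or the distances between all pairs.

(3) The initial margin should not be too large.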
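The monitoring advice in trick (2) can be sketched as a small per-batch diagnostic. The function below is a hypothetical helper, not part of the authors' repository: it counts the anchors whose Batch Hard triplet still violates the margin (with Batch Hard there is one triplet per anchor) and summarizes all pairwise embedding distances. The margin 0.2 and the toy data are assumptions.

```python
import math


def euclidean(u, v):
    """Non-squared Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def batch_diagnostics(embeddings, labels, margin=0.2):
    """Return (active_count, min_dist, mean_dist, max_dist) for one batch."""
    size = len(labels)
    # All unordered pairwise distances, regardless of identity.
    dists = [
        euclidean(embeddings[i], embeddings[j])
        for i in range(size) for j in range(i + 1, size)
    ]
    active = 0
    for a in range(size):
        hardest_pos = max(
            euclidean(embeddings[a], embeddings[p])
            for p in range(size) if labels[p] == labels[a]
        )
        hardest_neg = min(
            euclidean(embeddings[a], embeddings[n])
            for n in range(size) if labels[n] != labels[a]
        )
        if margin + hardest_pos - hardest_neg > 0.0:
            active += 1  # this anchor's hardest triplet still violates the margin
    return active, min(dists), sum(dists) / len(dists), max(dists)


# Toy batch with 2 persons and 2 images each (1-D embeddings).
embeddings = [[0.0], [1.0], [1.5], [4.0]]
labels = [0, 0, 1, 1]

active, d_min, d_mean, d_max = batch_diagnostics(embeddings, labels)
```

Logging these numbers per step shows whether the number of active triplets is going down and whether distances are collapsing toward zero, which a flat loss curve would hide.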

References

[1] K. Q. Weinberger and L. K. Saul. Distance Metric Learning for Large Margin Nearest Neighbor Classification. JMLR, 10:207–244, 2009

[2] F. Schroff, D. Kalenichenko, and J. Philbin. FaceNet: A Unified Embedding for Face Recognition and Clustering. In CVPR, 2015

[3] H. O. Song, Y. Xiang, S. Jegelka, and S. Savarese. Deep Metric Learning via Lifted Structured Feature Embedding. In CVPR, 2016
