Author: sssh (叫我松高魂 ~~)
Board: DataScience
Title: [Question] Kaggle-Humpback Whale Identification
Time: Tue Jul 23 02:01:44 2019
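Since I'm just starting to learn, I've been going through the Kernels people share on Kaggle and trying to follow along.
The one I've been looking at lately is Humpback Whale Identification.
Most of the kernels there use an SNN (Siamese Neural Network) as the model,
and they all take
http://bit.ly/2y0F1o7 as the pre-training model and modify it from there.
I've studied this model for a long time, but the head model part confuses me
(basically I can't really follow what he wrote @@).
The author argues that computing the loss from a distance, as the original SNN does, has several problems: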
1. A distance measure will consider two features with value zero as a perfect
match, while two features with large but slightly different values will be
seen as good, but not quite as good since they are not exactly equal. Still,
I feel there is more positive signal in the active features than in the
negative ones, especially with ReLU (Rectified Linear Unit) activation,
concept that is lost by the distance measure.
2. Also, a distance measure does not provide for features to be negatively
correlated. Consider a case where, if both images have feature X, they must
be the same whale, unless they also both have feature Y, in which case X is
not as clear.
3. At the same time, there is this implicit assumption that swapping the two
images must produce the same result: if A is the same whale as B, B must be
the same whale as A.
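To make points 1 and 3 concrete, here is a minimal sketch in plain Python. The feature values are made up for illustration, and L1 distance is only one possible choice of distance (the post does not say which one the original SNN uses). It shows a pair of inactive features scoring as a perfect match while a pair of strongly active, nearly equal features scores worse, and that the distance does not change when the two images are swapped.

```python
def l1_distance(x, y):
    """Sum of absolute element-wise differences between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))

# Hypothetical ReLU outputs (illustrative values, not from the kernel).
# Pair 1: both features are inactive in both images.
inactive_x, inactive_y = [0.0, 0.0], [0.0, 0.0]
# Pair 2: both features fire strongly in both images, with a small mismatch.
active_x, active_y = [5.0, 3.0], [4.5, 3.0]

d_inactive = l1_distance(inactive_x, inactive_y)  # "perfect" match on nothing
d_active = l1_distance(active_x, active_y)        # good match on real evidence

print(d_inactive, d_active)
# Point 3: swapping the two images gives the same distance.
print(l1_distance(active_x, active_y) == l1_distance(active_y, active_x))
```

The distance alone ranks the all-zero pair above the strongly active pair, even though the second pair arguably carries more evidence that the two images match.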
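For these three reasons, the author takes the feature vectors x and y output by the two branch models
and turns them into four vectors: x+y, |x-y|, xy, (x-y)^2 (all element-wise),
which then pass through two convolutional layers.
The author claims
that this lets the model "learn how to weigh between matching zeros and close non-zero values".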
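One way to read this head, sketched in plain Python rather than Keras so it runs without dependencies. Everything beyond the four combinations and "two layers" is an assumption: the names `pair_features` and `head_score`, the hidden width, the random weights, and the final sigmoid are hypothetical, and each convolution is modelled as one small set of weights shared across all feature positions.

```python
import math
import random

def pair_features(x, y):
    """Per feature, return the four combinations (x+y, |x-y|, x*y, (x-y)^2)."""
    return [(a + b, abs(a - b), a * b, (a - b) ** 2) for a, b in zip(x, y)]

def head_score(x, y, w1, w2, w_out, bias):
    """Hypothetical head: two shared per-feature layers, then a weighted sum."""
    per_feature = []
    for combo in pair_features(x, y):
        # Layer 1: the same weights w1 mix the four values of every feature.
        hidden = [max(0.0, sum(w * c for w, c in zip(row, combo))) for row in w1]
        # Layer 2: collapse the hidden units into one value per feature.
        per_feature.append(sum(w * h for w, h in zip(w2, hidden)))
    # Weighted sum over features; w_out may be negative (compare point 2).
    logit = sum(w * v for w, v in zip(w_out, per_feature)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

# Random, untrained weights, purely to show the data flow.
rng = random.Random(0)
n_features, n_hidden = 3, 8
w1 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(n_hidden)]
w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [rng.uniform(-1, 1) for _ in range(n_features)]

x = [0.0, 5.0, 1.2]
y = [0.0, 4.5, 0.3]
score_xy = head_score(x, y, w1, w2, w_out, 0.0)
score_yx = head_score(y, x, w1, w2, w_out, 0.0)
print(score_xy == score_yx)  # all four combinations are symmetric in x and y
```

Under this reading, x+y and xy tell the shared layers how active a feature is, so a (0, 0) pair and a (5.0, 4.5) pair no longer look the same to the model even when their differences are both small. The four inputs are unchanged when x and y are swapped, so the score is symmetric by construction.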
=========
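I have a few questions:
1. For the original approach of using distance as the loss, what do the first and second problems raised by the author actually mean?
2. Why combine the two output vectors into four vectors?
And why does doing so solve the three problems above?
I'd appreciate any pointers. Thanks in advance~~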
--
※ Posted via: 批踢踢实业坊 (ptt.cc), from: 1.171.69.123 (Taiwan)
※ Article URL: https://webptt.com/cn.aspx?n=bbs/DataScience/M.1563818507.A.207.html
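1F: push sxy67230: The author gives an example in the comments, take a look. His example: 07/23 16:10
2F: → sxy67230: is a person with a rash on the nose still the same person as without it? 07/23 16:10
3F: → sxy67230: That said, I don't think this distance formula is absolute either. 07/23 16:10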
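4F: push sxy67230: Put simply, the author lays out his four distance formulas, reshapes them, 07/23 16:14
5F: → sxy67230: and then extracts the main distance features, so this really can penalize 07/23 16:14
6F: → sxy67230: the secondary features. 07/23 16:14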