Multimodal Contrastive Training for Visual Representation Learning

[Figure]
Parameterize the image encoder as $f_{iq}$.
[Figure]
The query feature is $q_{ii}$ and the key feature is $k_{ii}$.
Parameterize the textual encoder as $f_{cq}(\cdot; \Theta_q, \Phi_{cq})$ and the momentum textual encoder as $f_{ck}(\cdot; \Theta_k, \Phi_{ck})$.
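The notes name a momentum textual encoder but do not spell out how its weights are updated. A minimal sketch, assuming a MoCo-style exponential moving average in which the key-side parameters slowly track the query-side ones; the function name `momentum_update`, the flat parameter lists, and the coefficient `m=0.999` are illustrative assumptions, not taken from the paper:

```python
def momentum_update(key_params, query_params, m=0.999):
    """Return key-encoder parameters after one EMA step toward the query encoder.

    Assumption: parameters are given as flat lists of floats; a real model
    would apply the same rule tensor by tensor, without gradients.
    """
    if len(key_params) != len(query_params):
        raise ValueError("key and query parameter lists must have equal length")
    # theta_k <- m * theta_k + (1 - m) * theta_q
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


# Usage: with m close to 1 the key encoder changes only slightly per step.
updated = momentum_update([1.0, 0.0], [0.0, 1.0], m=0.9)
```

With `m` close to 1 the key encoder changes slowly, which in MoCo-style training is meant to keep the key features consistent across iterations.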

$c_j^{\dagger}$ and $c_j^{\star}$ are different augmented examples.
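To make the role of the two augmented examples concrete, here is a hedged sketch of an InfoNCE-style contrastive loss in which the query comes from one augmentation and the positive key from the other. This is a generic formulation, not necessarily the exact loss in the paper; the function name `info_nce`, the temperature value, and the assumption that features are already L2-normalized are all illustrative:

```python
import math


def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE loss for one query: -log softmax score of the positive key.

    Assumptions: all features are plain lists of floats and are already
    L2-normalized; temperature=0.07 is an illustrative choice.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Put the positive logit first, followed by the negatives.
    logits = [dot(query, positive_key) / temperature]
    logits += [dot(query, neg) / temperature for neg in negative_keys]

    # Numerically stable log-sum-exp over all logits.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


# Usage: a query aligned with its positive key gives a much smaller loss
# than a query aligned with a negative key.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

The loss shrinks as the query moves closer to its positive key than to the negatives, which is what pulls the two augmented views of the same example together.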
[Figure]

Complaints

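In the first figure, the letter subscripts are hidden by the black background, and the authors have not released their code. That is not up to the "standard" of CVPR.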