U2I Retrieval Survey

The main goal is to learn a better embedding distribution, one that is more uniform and more spread out, while also learning long-tail examples better [1]

  1. Replace explicit hard-negative mining with the temperature hyperparameter of the contrastive loss [2] (see the first sketch after this list).
  2. Do feature interaction earlier, before the final dot product [3] (see the bilinear sketch after this list).
  3. Introduce contrastive learning, e.g. running dropout twice to build two views of each example [4] (also covered in the first sketch).
  4. Introducing a GNN and then doing contrastive learning on top of it is not really practical here.
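
Items 1 and 3 can be illustrated together: run the same tower twice with dropout active to get two views of each example, then train with an in-batch InfoNCE loss whose temperature controls how strongly the gradient concentrates on the hardest in-batch negatives, which is the sense in which the temperature can stand in for explicit hard-negative mining [2][4]. A minimal PyTorch sketch; the tower, tensor shapes, and the tau value are illustrative assumptions, not taken from the cited papers:

```python
import torch
import torch.nn.functional as F

def info_nce(u: torch.Tensor, v: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """In-batch InfoNCE: row i of `v` is the positive for row i of `u`;
    every other row is a negative. A smaller tau sharpens the softmax, so
    gradient mass concentrates on the hardest (highest-similarity)
    negatives -- the temperature acts as an implicit hard-negative weight [2]."""
    u = F.normalize(u, dim=-1)
    v = F.normalize(v, dim=-1)
    logits = u @ v.t() / tau                       # [B, B] scaled cosine sims
    labels = torch.arange(u.size(0), device=u.device)
    return F.cross_entropy(logits, labels)

def dropout_twice(tower: torch.nn.Module, x: torch.Tensor):
    """SimCSE-style views: two stochastic forward passes through the same
    tower with dropout active yield two augmentations of each example [4]."""
    tower.train()                                  # keep dropout on
    return tower(x), tower(x)

# Usage sketch (`user_tower` and features `x` are assumed names):
#   z1, z2 = dropout_twice(user_tower, x)
#   loss = info_nce(z1, z2, tau=0.07)
```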
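
For item 2, the FiBiNET idea [3] is to compute pairwise bilinear interactions between field embeddings inside the network, rather than deferring all interaction to a final dot product between towers. A rough sketch of the shared-weight ("field-all") bilinear interaction; field count and shapes are assumptions:

```python
import itertools
import torch
import torch.nn as nn

class BilinearInteraction(nn.Module):
    """Pairwise bilinear interaction over field embeddings, in the spirit of
    FiBiNET [3]: p_ij = (e_i W) * e_j for every field pair i < j. This is the
    shared-W ('field-all') variant; the paper also describes per-field and
    per-pair variants."""

    def __init__(self, embed_dim: int):
        super().__init__()
        self.W = nn.Linear(embed_dim, embed_dim, bias=False)

    def forward(self, fields: torch.Tensor) -> torch.Tensor:
        # fields: [B, num_fields, d] -> concatenated pairs [B, num_pairs * d]
        pairs = [
            self.W(fields[:, i]) * fields[:, j]    # Hadamard product
            for i, j in itertools.combinations(range(fields.size(1)), 2)
        ]
        return torch.cat(pairs, dim=-1)
```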

Negative sampling [5]

  • The first insight is about the hard negative selection strategy. We found that using the hardest examples is not the best strategy: comparing sampling from different rank positions, sampling between ranks 101-500 achieved the best model recall [6] (see the mining sketch after this list).

  • Use impressed-but-not-clicked samples as negatives: https://arxiv.org/pdf/1909.00385.pdf
  • Debias: inverse propensity weighting (see the logQ-correction sketch after this list)
    • https://arxiv.org/pdf/2005.12964.pdf
      • uses a memory bank of recent items as negatives
      • A fun detail: this system likewise separately models the probability that an item is recommended under the previous system, but the paper proves that its contrastive loss (their Eq. 3) is equivalent to this inverse propensity weighting, while also being more stable.
      • Quoting the paper: "Sampled softmax in general outperforms other approximations such as NCE and negative sampling when the vocabulary is large."
      • The intent vector really is handy.
    • Causal inference in statistics, social, and biomedical sciences.
    • The central role of the propensity score in observational studies for causal effects
    • Sampling
  • Privileged Features Distillation at Taobao Recommendations: distills from the ranking model, which is more or less the same idea as score alignment, but the details of how it is done are worth a look (see the distillation sketch after this list).
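
On the rank-101-500 finding from Facebook's EBR paper [6]: one common reading is to mine "semi-hard" negatives by scoring the corpus with the current model, skipping the top 100 results (too likely to be relevant, or too hard), and sampling from ranks 101-500. A brute-force NumPy sketch; in practice an ANN index would replace the full matrix product, and the function and parameter names are mine:

```python
import numpy as np

def mine_semi_hard_negatives(
    user_embs: np.ndarray,       # [U, d] query-side embeddings
    item_embs: np.ndarray,       # [N, d] corpus embeddings
    lo: int = 100,               # skip top-100 results
    hi: int = 500,               # ranks 101..500 gave the best recall in [6]
    num_neg: int = 5,
    seed: int = 0,
) -> np.ndarray:
    """For each user, score the whole corpus with the current model, then
    sample negatives uniformly from rank positions lo+1..hi."""
    rng = np.random.default_rng(seed)
    scores = user_embs @ item_embs.T                    # [U, N]
    order = np.argsort(-scores, axis=1)                 # items by descending score
    band = order[:, lo:hi]                              # ranks 101..500
    cols = rng.integers(0, band.shape[1], size=(band.shape[0], num_neg))
    return np.take_along_axis(band, cols, axis=1)       # [U, num_neg] item ids
```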
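
The debias bullets (IPW, memory bank, sampled softmax) share one mechanic: when negatives are drawn non-uniformly, correct each logit by subtracting the log of that item's sampling probability before the softmax (the logQ correction used with sampled softmax). A sketch combining it with a memory bank of recent item embeddings; how log q is estimated is left out, and all names here are assumptions. CLRec's point, as noted above, is that a queue-based contrastive loss of this general form can be shown equivalent to inverse propensity weighting of the exposure distribution:

```python
import torch
import torch.nn.functional as F

def logq_corrected_loss(
    user_emb: torch.Tensor,      # [B, d] user embeddings
    item_emb: torch.Tensor,      # [B, d] positives, row-aligned with users
    bank_emb: torch.Tensor,      # [M, d] memory bank of recent item embeddings
    log_q_pos: torch.Tensor,     # [B] log sampling prob of each positive
    log_q_bank: torch.Tensor,    # [M] log sampling prob of each bank item
    tau: float = 0.07,
) -> torch.Tensor:
    """Sampled softmax with the logQ correction: logit_i <- s_i/tau - log q_i.
    Subtracting log q undoes the bias from popularity-skewed negatives."""
    pos = (user_emb * item_emb).sum(-1, keepdim=True) / tau - log_q_pos[:, None]
    neg = user_emb @ bank_emb.t() / tau - log_q_bank[None, :]     # [B, M]
    logits = torch.cat([pos, neg], dim=1)                         # [B, 1+M]
    labels = torch.zeros(user_emb.size(0), dtype=torch.long,
                         device=user_emb.device)                  # positive = col 0
    return F.cross_entropy(logits, labels)
```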
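
For the privileged-features distillation note: the teacher (the ranking model, which sees privileged features such as post-click signals) produces soft scores, and the retrieval student adds a distillation term to its loss, which is indeed close in spirit to plain score alignment. A minimal sketch; the temperature, weighting, and tensor names are assumptions:

```python
import torch
import torch.nn.functional as F

def distillation_loss(
    student_logits: torch.Tensor,   # [B, C] retrieval-model scores
    teacher_logits: torch.Tensor,   # [B, C] ranking-model scores (teacher
                                    # alone sees the privileged features)
    labels: torch.Tensor,           # [B] ground-truth item indices
    T: float = 2.0,                 # distillation temperature
    alpha: float = 0.5,             # weight on the distillation term
) -> torch.Tensor:
    """Hard-label CE plus KL(teacher || student) on temperature-softened
    scores; the teacher is detached so gradients stay in the student."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits.detach() / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return (1 - alpha) * ce + alpha * kd
```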

References:

  1. 对比学习视角:重新审视推荐系统的召回粗排模型 (A Contrastive Learning Perspective: Revisiting Retrieval and Pre-Ranking Models in Recommender Systems)

  2. Understanding the Behaviour of Contrastive Loss 

  3. FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction 

  4. Self-supervised Learning for Large-scale Item Recommendations 

  5. 负样本为王:评Facebook的向量化召回算法 (Negative Samples Are King: On Facebook's Embedding-Based Retrieval)

  6. Embedding-based Retrieval in Facebook Search 

Written on July 17, 2023