Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (EMNLP 2019)


Abstract

[Figure 1: SBERT architecture with the classification objective function, e.g. for training on SNLI. Figure 2: SBERT architecture at inference / with the regression objective, computing the cosine similarity of the two sentence embeddings.]

3. Model

Pooling Strategy

  1. CLS: Output of the [CLS] token
  2. MEAN: Mean of all output vectors (the paper's default configuration)
  3. MAX: Max-over-time of the output vectors (all three are sketched below)
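
A minimal PyTorch sketch of the three pooling strategies referenced above (the function name `pool` and the masking details are my own, not the paper's released code):

```python
import torch

def pool(token_embeddings, attention_mask, strategy="MEAN"):
    """Collapse BERT token outputs (batch, seq_len, dim) into one
    fixed-size sentence embedding per input sentence."""
    mask = attention_mask.unsqueeze(-1).float()        # (batch, seq_len, 1)
    if strategy == "CLS":
        return token_embeddings[:, 0]                  # output of [CLS] token
    if strategy == "MEAN":
        summed = (token_embeddings * mask).sum(dim=1)  # ignore padding tokens
        return summed / mask.sum(dim=1).clamp(min=1e-9)
    if strategy == "MAX":                              # max-over-time
        return token_embeddings.masked_fill(mask == 0, -1e9).max(dim=1).values
    raise ValueError(f"unknown strategy: {strategy}")
```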

Objective Functions

  1. Classification Objective Function

    \[o = \text{softmax}(W_t(u, v, |u-v|))\]
    • $(u, v, |u-v|)$ denotes the concatenation of $u$, $v$, and their element-wise difference $|u-v|$
    • $W_t \in \mathbb{R}^{3n \times k}$ is a trainable weight, where $n$ is the sentence embedding dimension and $k$ the number of labels
    • Optimize cross-entropy loss
    • Depicted in Figure 1
  2. Regression Objective Function

    \[o = \text{cosine\_sim}(u, v)\]
    • Cosine similarity between the two sentence embeddings $u$ and $v$
    • Optimize mean-squared-error loss
    • Depicted in Figure 2
  3. Triplet Objective Function

    • Suppose we have an anchor sentence $a$, a positive sentence $p$, and a negative sentence $n$
    • The triplet loss pulls $a$ and $p$ closer together while pushing $a$ and $n$ farther apart
    • The following loss function is minimized:
    \[\max{(||s_a - s_p|| - ||s_a - s_n|| + \epsilon, 0)}\]
    • Euclidean distance is used as the distance metric
    • $\epsilon$ is set to 1
    • A combined sketch of all three objective functions follows this list
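
A combined PyTorch sketch of the three objectives, as referenced above (tensor names, the embedding dimension, and the label count are illustrative assumptions; this is not the authors' training code):

```python
import torch
import torch.nn.functional as F

n, k = 768, 3                    # embedding dim and label count (illustrative)
W_t = torch.nn.Linear(3 * n, k)  # the trainable W_t in R^{3n x k} from above

def classification_loss(u, v, labels):
    # Concatenate (u, v, |u - v|), classify, optimize cross-entropy (Fig. 1).
    features = torch.cat([u, v, torch.abs(u - v)], dim=-1)
    return F.cross_entropy(W_t(features), labels)

def regression_loss(u, v, gold_scores):
    # Cosine similarity between u and v, trained with MSE loss (Fig. 2).
    return F.mse_loss(F.cosine_similarity(u, v, dim=-1), gold_scores)

def triplet_loss(s_a, s_p, s_n, eps=1.0):
    # Pull the anchor toward the positive, push it away from the negative;
    # Euclidean distance, margin eps = 1 as in the paper.
    d_pos = torch.norm(s_a - s_p, dim=-1)
    d_neg = torch.norm(s_a - s_n, dim=-1)
    return torch.clamp(d_pos - d_neg + eps, min=0).mean()
```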

3.1. Training Details

4. Evaluation - Semantic Textual Similarity

4.1. Unsupervised STS

[Table 1: Spearman rank correlation ρ between the cosine similarity of sentence embeddings and the gold labels on STS 2012-2016, the STS benchmark, and SICK-R]
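
As a usage sketch of this setup (the checkpoint name is the NLI-trained SBERT model released with the paper's sentence-transformers package; the sentence pairs and gold scores are made up for illustration):

```python
import numpy as np
from scipy.stats import spearmanr
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bert-base-nli-mean-tokens")

sents1 = ["A man is playing a guitar.",
          "A woman is slicing an onion.",
          "A plane is taking off."]
sents2 = ["Someone plays an instrument.",
          "A person is cutting a vegetable.",
          "The kids are playing soccer."]
gold = [4.2, 3.8, 0.5]   # hypothetical gold similarity labels (0-5 scale)

emb1 = model.encode(sents1)   # numpy array, shape (3, 768)
emb2 = model.encode(sents2)

# Cosine similarity per pair, then Spearman rank correlation vs. gold labels
cos = (emb1 * emb2).sum(axis=1) / (
    np.linalg.norm(emb1, axis=1) * np.linalg.norm(emb2, axis=1))
rho, _ = spearmanr(cos, gold)
print(f"Spearman rho: {rho:.3f}")
```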

4.2. Supervised STS

[Table 2: Spearman rank correlation on the STS benchmark test set, for models trained on STSb with and without prior NLI training]

5. Evaluation - SentEval

[Table 5: Accuracy of SBERT sentence embeddings on SentEval transfer tasks]

6. Ablation Study

[Table 6: Ablation over pooling strategies (MEAN, MAX, CLS) and concatenation modes for the classification objective]

Joohong Lee

Machine Learning Researcher